pfrater / arcpullr


Use maxRecordCount as idsplits #12

Closed: szego closed this 1 year ago

szego commented 1 year ago

This PR uses layer_info$maxRecordCount as the idsplits value, so that users do not have to work out the right value for idsplits themselves.

I ran into the same issue as #4 and found that queries return the expected results when calling get_spatial_layer(..., idsplits = maxRecordCount).

For example, arcpullr currently returns 150 records for this query, while the code on my branch returns all 166:

get_spatial_layer(
  "https://www.portlandmaps.com/arcgis/rest/services/Public/COP_OpenData_PlanningDevelopment/MapServer/207"
)

If you visit that layer URL, you'll see "MaxRecordCount: 150" listed there.
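To illustrate why batching by maxRecordCount recovers the missing records, here is a minimal sketch. It assumes (not confirmed from arcpullr's source) that idsplits is the number of object IDs sent per query request, which matches the objectIds dump later in this thread, where idsplits = 2000 put all 133 IDs into a single request. split_object_ids() is a hypothetical helper, not part of arcpullr.

```r
# Hypothetical helper (not arcpullr's API): split a vector of object IDs into
# batches of at most `chunk_size` IDs, one batch per query request.
split_object_ids <- function(ids, chunk_size) {
  stopifnot(is.numeric(chunk_size), length(chunk_size) == 1, chunk_size >= 1)
  # ceiling(position / chunk_size) is 1 for the first chunk_size IDs,
  # 2 for the next chunk_size IDs, and so on.
  split(ids, ceiling(seq_along(ids) / chunk_size))
}

# 166 features on a layer whose MaxRecordCount is 150:
ids <- 1:166
batches <- split_object_ids(ids, chunk_size = 150)
lengths(batches)  # two batches: 150 IDs and 16 IDs

# Each batch becomes the comma-separated objectIds parameter of one query,
# so no single response has to exceed the server's 150-record cap.
object_id_params <- vapply(batches, paste, character(1), collapse = ",")
```

With a single request for all 166 IDs, a server that caps each response at MaxRecordCount would only ever return the first 150 records.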

I've written the code so that users can still pass idsplits themselves. It will only be set to layer_info$maxRecordCount if idsplits is missing from the dots (...). This should preserve backward compatibility for anyone who currently passes idsplits to get_spatial_layer().
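The defaulting rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the code on the branch: resolve_idsplits() is a hypothetical name, and layer_info is assumed to be the parsed layer metadata containing maxRecordCount.

```r
# Hypothetical helper (not the PR's actual code): use the caller's idsplits
# if it was passed in `...`, otherwise fall back to the layer's maxRecordCount.
resolve_idsplits <- function(layer_info, ...) {
  dots <- list(...)
  # Test the names explicitly rather than using dots$idsplits, because `$`
  # on a list does partial name matching.
  if (!"idsplits" %in% names(dots)) {
    dots$idsplits <- layer_info$maxRecordCount
  }
  dots
}

# layer_info mimics parsed layer metadata; only maxRecordCount is used here.
layer_info <- list(maxRecordCount = 150)
resolve_idsplits(layer_info)$idsplits                # default from the layer
resolve_idsplits(layer_info, idsplits = 7)$idsplits  # caller's value is kept
```

Because an explicit idsplits always wins, existing calls that already pass idsplits behave exactly as before; the resolved list could then be forwarded with do.call().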

rdenham commented 1 year ago

A follow-up comment, in case anyone else comes across this.

I found that my problem persists even with this modification, so users might still need to specify idsplits. Here is my example:

get_layer_info("https://spatial-gis.information.qld.gov.au/arcgis/rest/services/Boundaries/AdminBoundariesFramework/FeatureServer/3")$maxRecordCount

shows that maxRecordCount is 2000, but

all.sf <- get_spatial_layer("https://spatial-gis.information.qld.gov.au/arcgis/rest/services/Boundaries/AdminBoundariesFramework/FeatureServer/3",
                            idsplits=2000)
nrow(all.sf)

returns 0 rows, which is annoying because there is no warning or message of any kind. However,

all.sf <- get_spatial_layer("https://spatial-gis.information.qld.gov.au/arcgis/rest/services/Boundaries/AdminBoundariesFramework/FeatureServer/3",
                            idsplits=7)
nrow(all.sf)

successfully fetches all 133 records.

I don't know how to work out the ideal number of records per request, or whether this is just a problem with my network, but I thought it might help others.

rdenham commented 1 year ago

After a bit more digging, it turns out that this query:

> query
$objectIds
[1] "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,106,107"

$outSR
[1] "4326"

$f
[1] "json"

gives the response:

> response
$error
$error$code
[1] 500

$error$message
[1] "Error performing query operation"

$error$details
list()

which, according to https://support.esri.com/en-us/knowledge-base/error-error-performing-query-operation-000011736, means the query is trying to return a result set larger than 64 MB.

I'll create a PR that at least catches this error, so users are aware of it and can reduce the idsplits value.
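A check along these lines could turn the silent 0-row result into an explicit error. It is a minimal sketch, not arcpullr's implementation: check_query_response() is a hypothetical name, and it assumes the query response has already been parsed from JSON into an R list shaped like the $error dump above.

```r
# Hypothetical check (not arcpullr's API): `response` is a query result already
# parsed from JSON into an R list; `n_ids` is how many object IDs were requested.
check_query_response <- function(response, n_ids) {
  # ArcGIS reports failures inside the JSON body as an `error` element
  # (code, message, details) instead of returning features.
  if (!is.null(response$error)) {
    stop(
      sprintf(
        "ArcGIS query for %d object IDs failed (code %s: %s). ",
        n_ids, response$error$code, response$error$message
      ),
      "The result set may be too large; try a smaller idsplits value.",
      call. = FALSE
    )
  }
  invisible(response)
}

# Example with a response shaped like the dump above:
response <- list(error = list(
  code = 500, message = "Error performing query operation", details = list()
))
try(check_query_response(response, n_ids = 133))
```

Failing loudly here seems preferable to returning an empty sf object, since the fix (a smaller idsplits) is something the user can act on immediately.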