pfrater / arcpullr


id_split chunks size too big #4

Closed rdenham closed 1 year ago

rdenham commented 1 year ago

Thanks for writing this very handy package. I think, though, that I've run into a problem retrieving spatial data: queries become too large if all records are requested in one query.

Looking at the code, I see that features are retrieved in chunks:

get_esri_features <- function(query_url, fields, where, token='', head, ...) {
  ids <- get_object_ids(query_url, where, token, ...)
  if(is.null(ids)){
    warning("No records match the search critera")
    return()
  }
  if (isTRUE(head)) {
    id_splits <- ids[1:5]
  } else if (head > 0 & head < 500) {
    id_splits <- ids[1:head]
  } else {
    id_splits <- split(ids, ceiling(seq_along(ids)/500))
  }
  results <- lapply(
    id_splits,
    get_esri_features_by_id,
    query_url,
    fields,
    token,
    ...
  )
  merged <- unlist(results, recursive=FALSE)
  return(merged)
}

I think that the default chunk size of 500 is too big for some of the things I'd like to retrieve. I tested it with smaller values, and it seemed to work. Note that you don't get any errors or warnings when the chunk is too big, just an empty result. I wouldn't be surprised if there were some server setting that limits the size of the returned query, but I'm not really familiar with ArcGIS Server.
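To make the effect of the chunk size concrete, here is a minimal sketch of the same split() idiom with the size pulled out as a parameter. The helper name chunk_ids and its chunk_size argument are hypothetical and not part of arcpullr; the 133 IDs simply mirror the number of features in the layer tested below.

```r
# Hypothetical helper (not part of arcpullr): split a vector of object IDs
# into batches of at most `chunk_size`, using the same split() idiom as
# get_esri_features.
chunk_ids <- function(ids, chunk_size = 500) {
  stopifnot(is.numeric(chunk_size), length(chunk_size) == 1, chunk_size >= 1)
  split(ids, ceiling(seq_along(ids) / chunk_size))
}

ids <- 1:133                 # stand-in for the IDs returned by the server
length(chunk_ids(ids, 500))  # 1: all 133 IDs go into a single request
length(chunk_ids(ids, 5))    # 27: 26 batches of five IDs and one of three
```

With the default of 500, every feature in a 133-feature layer is requested at once, which is presumably where the response becomes too large.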

Here is my test:

> subbioregions.sf <- get_spatial_layer("https://spatial-gis.information.qld.gov.au/arcgis/rest/services/Boundaries/AdminBoundariesFramework/MapServer/3",
+                                       idsplit=500)
> subbioregions.sf
Simple feature collection with 0 features and 0 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
[1] geoms
<0 rows> (or 0-length row.names)
> subbioregions.sf <- get_spatial_layer("https://spatial-gis.information.qld.gov.au/arcgis/rest/services/Boundaries/AdminBoundariesFramework/MapServer/3",
+                                       idsplit=5)
> subbioregions.sf
Simple feature collection with 133 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 137.9946 ymin: -29.1779 xmax: 153.5529 ymax: -9.067217
Geodetic CRS:  WGS 84
First 10 features:
   q_reg    q_reg_name q_sub            q_sub_name objectid                          geoms
1    BRB Brigalow Belt  11.1     Townsville Plains        1 MULTIPOLYGON (((146.615 -19...
2    BRB Brigalow Belt 11.10          Basalt Downs        2 MULTIPOLYGON (((147.6078 -2...
3    BRB Brigalow Belt 11.11   Isaac - Comet Downs        3 MULTIPOLYGON (((148.6707 -2...
4    BRB Brigalow Belt 11.12 Nebo - Connors Ranges        4 MULTIPOLYGON (((149.0068 -2...
5    BRB Brigalow Belt 11.13  South Drummond Basin        5 MULTIPOLYGON (((146.7731 -2...
6    BRB Brigalow Belt 11.14    Marlborough Plains        6 MULTIPOLYGON (((149.464 -22...
7    BRB Brigalow Belt 11.15    Claude River Downs        7 MULTIPOLYGON (((146.2638 -2...
8    BRB Brigalow Belt 11.16            Woorabinda        8 MULTIPOLYGON (((149.0985 -2...
9    BRB Brigalow Belt 11.17          Boomer Range        9 MULTIPOLYGON (((149.6715 -2...
10   BRB Brigalow Belt 11.18   Mount Morgan Ranges       10 MULTIPOLYGON (((149.8721 -2...

I added the argument idsplit in get_esri_features which then gets carried through by your ... mechanism.

Do you think this could be added to the code? Happy to make a PR if you'd like.

rdenham commented 1 year ago

I chose to add idsplit directly rather than relying on passing it via ..., so that the intention of ... remains unchanged (i.e., the help describes it as "Additional arguments to pass to the ArcGIS REST POST request"). I didn't want to change this.
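As a rough illustration of that design choice, the sketch below shows that a named idsplit argument is consumed by the batching function and never reaches the request function through .... Both fetch_in_batches and fake_fetch are hypothetical stand-ins, not arcpullr functions, and outFields is used only as an example of a request argument.

```r
# Stand-in for the per-batch request function (hypothetical): it reports
# which extra arguments arrived via `...` and returns the IDs it was given.
fake_fetch <- function(ids, ...) {
  list(ids = ids, extra = names(list(...)))
}

# Sketch: `idsplit` is a named formal argument, so it is matched here and is
# NOT forwarded through `...` to the request function.
fetch_in_batches <- function(ids, fetch, idsplit = 500, ...) {
  batches <- split(ids, ceiling(seq_along(ids) / idsplit))
  lapply(batches, fetch, ...)
}

res <- fetch_in_batches(1:12, fake_fetch, idsplit = 5, outFields = "*")
length(res)      # 3 batches: IDs 1-5, 6-10 and 11-12
res[[1]]$extra   # "outFields": only the request argument was forwarded
```

Keeping idsplit out of ... means the documented meaning of ... (arguments for the REST request) stays accurate.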

rdenham commented 1 year ago

Thanks for addressing this issue @pfrater

Would it now be possible to add some documentation for the idsplits argument?

If it's of any use, my attempt was:

#' @param idsplit Positive integer. Limits the number of records returned in
#' each request. To get the full set of records, multiple requests will be made
#' in batches of no more than \code{idsplit} in size. These will then be merged.
#' Setting this to a smaller value can be useful when requesting
#' fewer, complicated features, which might otherwise silently return an empty
#' layer.