yonghah / esri2sf

Scrape features from ArcGIS Server REST API and create simple features dataframe
Other
138 stars 37 forks source link

Modernize package w/ switch from {httr} to {httr2} + adopting {cli} for messages #49

Open elipousson opened 2 years ago

elipousson commented 2 years ago

As discussed previously, this is the issue to support discussion of a forthcoming pull request to switch the package over from {httr} to {httr2}. Simultaneous with making these changes, I also made several other significant (more or less) changes:

A few things to highlight:

I noticed a few issues that should be resolved before merging the pull request:

I expect there are a few other changes or bugs that I'm forgetting to mention but hopefully this is enough to get a review started.

elipousson commented 2 years ago

@jacpete No rush if you're busy but, if you have time to take a look at #50, I likely have some time next week to work on this update. Even if you just want to flag oustanding issues from your tests, I can start on those before attempting to add any new features!

elipousson commented 2 years ago

I've incorporated a bunch of minor fixes and a few exciting new features (including support for caching with the httr2::req_cache() function) into the PR #50 but I'm still running into a few weird issues where there are URLs that the existing version supports but the new version won't.

Here is one example: https://egisdata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0

It works fine with the current version of esri2sf() but returns a "HTTP 404 Not Found." error with the new version. getObjectIds works fine but it fails on getEsriFeaturesByIds and I can't figure out what the issue is. I tried wrapping the call with httr2::with_verbosity() which is helpful (a nice feature of the switch to httr2) but it didn't turn up anything obvious to my eyes.

Despite this challenge, the switch to using {httr2} and {cli} seems like it still a good idea. I'm not sure how to do a speed comparison with two different versions of the same package but it feels like it may be faster.

elipousson commented 2 years ago

I suspect the problem may be related to the length of the request URL when there are a large number of object IDs. I also think there may be a way of throttling the request that doesn't require the two-step process of getting object IDs and then getting the features although, if there is an easier solution, I'm all for it.

elipousson commented 2 years ago

The {osmdata} package just completed a migration from httr to httr2 so I'm hoping to review the changes in this related issue to see if there is anything they are doing for that package that is missing for {esri2sf} https://github.com/ropensci/osmdata/issues/272

elipousson commented 2 years ago

Wrapping the objectId and outFields parameters with the new I() flag in {httr2} to allow for skipping encoding for some parameters seems to have fixed a lot of the problems I ran into earlier this month. The version at the pull request now passes all the tests (excepting the one with a secret token that I can't run). It is soooo close at this point!

A few outstanding items: generateToken and generateOAuthToken are still using {httr} and the URL check helpers could be rewritten to allow the expanded range of URLs supported by esriIndex but we could also let httr2 handle the url checks since it returns fairly helpful error messages most of the time.

Unfortunately, he baltimorecity.gov URL linked above still isn't working when I run it and it is returning this error: LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 54. Curiously however, if I run the same URL with a manually defined objectIds parameter it works. Hope you can still take a look sometime soon, @jacpete.

Here is a reprex:

library(esri2sf)

esri2sf(
  "https://egisdata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0"
)
#> ✔ Downloading "Properties" from <https://egisdata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0>
#> Layer type: "Feature Layer"
#> Geometry type: "esriGeometryPolygon"
#> Service Coordinate Reference System: "EPSG:2248"
#> Output Coordinate Reference System: "EPSG:4326"
#> Error:
#> ! LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 54

esri2sf(
  "https://egisdata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0",
  objectIds = paste0(c(1:25), collapse = ",")
  )
#> ✔ Downloading "Properties" from <https://egisdata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0>
#> Layer type: "Feature Layer"
#> Geometry type: "esriGeometryPolygon"
#> Service Coordinate Reference System: "EPSG:2248"
#> Output Coordinate Reference System: "EPSG:4326"
#> Simple feature collection with 25 features and 38 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -76.65118 ymin: 39.30897 xmax: -76.64997 ymax: 39.30958
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>    OBJECTID BLOCKLOT BLOCK  LOT         FULLADDR BLDG_NO FRACTION SPAN_NUM
#> 1         1 0001 001 0001  001  2045 W NORTH AVE    2045                 0
#> 2         2 0001 002 0001  002  2043 W NORTH AVE    2043                 0
#> 3         3 0001 003 0001  003  2041 W NORTH AVE    2041                 0
#> 4         4 0001 004 0001  004  2039 W NORTH AVE    2039                 0
#> 5         5 0001 005 0001  005  2037 W NORTH AVE    2037                 0
#> 6         6 0001 006 0001  006  2035 W NORTH AVE    2035                 0
#> 7         7 0001 007 0001  007  2033 W NORTH AVE    2033                 0
#> 8         8 0001 008 0001  008  2031 W NORTH AVE    2031                 0
#> 9         9 0001 009 0001  009  2029 W NORTH AVE    2029                 0
#> 10       10 0001 010 0001  010  2027 W NORTH AVE    2027                 0
#>    STDIRPRE                   ST_NAME ST_TYPE ZIP_CODE TAXBASE FULLCASH
#> 1         W NORTH                        AVE     21217  161800        0
#> 2         W NORTH                        AVE     21217   49500        0
#> 3         W NORTH                        AVE     21217   11000        0
#> 4         W NORTH                        AVE     21217   11000        0
#> 5         W NORTH                        AVE     21217   23000        0
#> 6         W NORTH                        AVE     21217    8000        0
#> 7         W NORTH                        AVE     21217   23000        0
#> 8         W NORTH                        AVE     21217   11000        0
#> 9         W NORTH                        AVE     21217   23000        0
#> 10        W NORTH                        AVE     21217   23000        0
#>    PERMHOME NO_IMPRV ZONECODE SALEPRIC SALEDATE OWNER_ABBR
#> 1         N       NA    C-1       5000 11282005         NA
#> 2         N       NA    C-1       5000 11282005         NA
#> 3         N       NA    OR-1*        0 10171975         NA
#> 4         N       NA    OR-1*        0 11132014         NA
#> 5         N       NA    OR-1*    70000 06042021         NA
#> 6         N       NA    OR-1*    62000 10182021         NA
#> 7         N       NA    OR-1*     2120 06072006         NA
#> 8         N       NA    OR-1*     2000 12152021         NA
#> 9         N       NA    OR-1*    10000 03081995         NA
#> 10        N       NA    OR-1*    52500 08162021         NA
#>                              OWNER_1                           OWNER_2
#> 1  CHONG, YON UN                                                      
#> 2  CHONG, YON UN                                                      
#> 3  NEW YORK INC                                                       
#> 4  MUHAMMAD, ANNA AVON                                                
#> 5  THE WHITE CARR DEVELOPMENT GROUP, LLC                              
#> 6  WHITE CARR DEVELOPMENT GROUP LLC, THE                              
#> 7  NORTH AVENUE REVELATION PROJECT,  LLC.                             
#> 8  VERONA ENTERPRISE LLC                                              
#> 9  DOUGLAS, VERONA E                                                  
#> 10 JOHNSON, MONICA                                                    
#>                              OWNER_3 RESPAGCY                  PROPDESC
#> 1                                                                      
#> 2                                                                      
#> 3                                                                      
#> 4                                                                      
#> 5                                                                      
#> 6                                                                      
#> 7                                                                      
#> 8                                                                      
#> 9                                                                      
#> 10                                                                     
#>                                     NEIGHBOR YEAR_BUILD EXTD_ZIP DHCDUSE1
#> 1  EASTERWOOD                                      1920     1221     3052
#> 2  EASTERWOOD                                         0     1221     0000
#> 3  EASTERWOOD                                      1915     1221     1008
#> 4  EASTERWOOD                                      1920     1221     0000
#> 5  EASTERWOOD                                      1920     1221     5410
#> 6  EASTERWOOD                                      1915     1221     1007
#> 7  EASTERWOOD                                      1915     1221     1008
#> 8  EASTERWOOD                                      1920     1221     1007
#> 9  EASTERWOOD                                      1915     1221     1008
#> 10 EASTERWOOD                                      1915     1221     1008
#>                                                                                                                             SDATLINK
#> 1  http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=001 
#> 2  http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=002 
#> 3  http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=003 
#> 4  http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=004 
#> 5  http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=005 
#> 6  http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=006 
#> 7  http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=007 
#> 8  http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=008 
#> 9  http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=009 
#> 10 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=010 
#>                                                   BLOCKPLAT
#> 1  https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 2  https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 3  https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 4  https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 5  https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 6  https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 7  https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 8  https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 9  https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 10 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#>                                  MAILTOADD ProjDemo ReleaseDate   VBN_Issued
#> 1                  2045 W NORTH AVE, 21217       NA          NA           NA
#> 2                  2045 W NORTH AVE, 21217       NA          NA           NA
#> 3                         SUITE 201, 21208       NA          NA 1.327506e+12
#> 4  9726 MENDOZA RD RANDALLSTOWN, MD, 21133       NA          NA 1.555750e+12
#> 5          2601 PENNSYLVANIA AVENUE, 21217       NA          NA           NA
#> 6             2601 PENNSYLVANIA AVE, 21217       NA          NA           NA
#> 7    2311 OAK DRIVE IJAMSVILLE, MD., 21754       NA          NA 1.359728e+12
#> 8             5411 PARK HEIGHTS AVE, 21215       NA          NA 1.435767e+12
#> 9                  2029 W NORTH AVE, 21217       NA          NA           NA
#> 10        401 SPIKE CT FLORENCE, SC, 29505       NA          NA           NA
#>    Name Shape_Length Shape_Area                          geoms
#> 1    NA     200.1155   1359.934 MULTIPOLYGON (((-76.65113 3...
#> 2    NA     195.6736   1174.100 MULTIPOLYGON (((-76.65113 3...
#> 3    NA     196.0319   1188.322 MULTIPOLYGON (((-76.65101 3...
#> 4    NA     196.3390   1201.348 MULTIPOLYGON (((-76.65096 3...
#> 5    NA     194.0874   1106.553 MULTIPOLYGON (((-76.65091 3...
#> 6    NA     197.8996   1267.556 MULTIPOLYGON (((-76.65086 3...
#> 7    NA     194.7870   1136.311 MULTIPOLYGON (((-76.65088 3...
#> 8    NA     194.9372   1142.615 MULTIPOLYGON (((-76.65083 3...
#> 9    NA     193.6459   1088.861 MULTIPOLYGON (((-76.65072 3...
#> 10   NA     199.1617   1320.407 MULTIPOLYGON (((-76.65066 3...

Created on 2022-06-29 by the reprex package (v2.0.1)

elipousson commented 2 years ago

I think I have a semi-reasonable solution. I didn't solve the mystery of why these requests worked with {httr} and fail with {httr2} but it seems like the common solution is to set a maxRecords value that is well below the maxRecords value suggested by the layer information. I've added maxRecords to the examples in the README which both demonstrates how it works and is required to work with the sample URLs. I also implemented an error message that suggests trying maxRecords when a request fails at the end of getEsriFeatures (although I think there may be a few other issues that can prevent a request from completing so a broader list of recommendations would be preferable).

I added tests for esri2sf() and esriIndex(). I will also plan to add tests for esriInfo().

If this solution works, there are a couple of additional changes which would be needed if we wanted to complete a full migration from {httr} to {httr2}:

Help with the token functions would be welcome because I really haven't worked with authentication for ArcGIS REST API services before. This has been more work than I ever expected but it is exciting to see it is closer than ever!

elipousson commented 2 years ago

I finally found documentation of the limit that is causing the issue:

Users not directly interacting with administrative or REST APIs (for example, using scripts or having a web application firewall configured) may encounter query limitations. In instances where a query string exceeds 2,048 characters in length, the query should be submitted as a POST request.

httr2 has a method parameter that allows POST requests although I still need to figure out how that works.

elipousson commented 2 years ago

It took yet more trial and error but I fixed the issue with the too long URLs using httr2. This fixes the weird random errors but still leaves the removal of the remaining dependency for httr, updates to the URL checks (if needed), and improved documentation still to go.

elipousson commented 2 years ago

And – because nothing about the ArcGIS REST API is evidently ever easy – it turns out that this new approach works 100% fine with the sample servers and doesn't work at all with the Baltimore city or Maryland state GIS portal.

I think I found a post with someone experiencing a similar issue and highlights a possible solution of adding some additional details to the body of the POST request: https://community.esri.com/t5/arcgis-rest-api-questions/arcgis-server-rest-api-http-post-returning-html/td-p/478832

elipousson commented 2 years ago

The solution for this last issue was way easier than I expected: swapping httr2::req_body_json() for httr2::req_body_form() seems like it did the trick. I also added a partial implementation of the API for working with GeocodeServer URLs. This has been a round-about process but I think the major problems are finally taken care of.

elipousson commented 2 years ago

Any chance you're free this fall, @jacpete, to come up for air and review a new pull request for this issue?

I went a good bit beyond the original scope so my current fork includes a few new experimental functions (e.g. esrisearch() and esri2rast() and a rewrite of the URL validation functions to support content URLs for ArcGIS.com but I just dropped the dependency on {httr}. Some tests need to be updated (ESRI retired the sampleserver1.arcgisonline.com server). generateToken() may also need some testing and tweaks but I think I ported it over correctly.

If you're still tied up with your actual work, I can definitely understand but it would be great to finish this up and migrate the improvements back into the package.

elipousson commented 1 year ago

Hello @jacpete + @yonghah – I just wanted to give you a heads up that, since it seems like you don't have the time to manage contributions to esri2sf at this point, I'm planning to migrate my fork to a separate repository with a new name. I'll make sure I credit appropriately per the MIT license and explore changing function names to avoid a namespace conflict in the future. Let me know if you have any questions or suggestions!