Open elipousson opened 2 years ago
@jacpete No rush if you're busy but, if you have time to take a look at #50, I likely have some time next week to work on this update. Even if you just want to flag oustanding issues from your tests, I can start on those before attempting to add any new features!
I've incorporated a bunch of minor fixes and a few exciting new features (including support for caching with the httr2::req_cache() function) into the PR #50 but I'm still running into a few weird issues where there are URLs that the existing version supports but the new version won't.
Here is one example: https://egisdata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0
It works fine with the current version of esri2sf()
but returns a "HTTP 404 Not Found." error with the new version. getObjectIds works fine but it fails on getEsriFeaturesByIds and I can't figure out what the issue is. I tried wrapping the call with httr2::with_verbosity()
which is helpful (a nice feature of the switch to httr2) but it didn't turn up anything obvious to my eyes.
Despite this challenge, the switch to using {httr2} and {cli} seems like it still a good idea. I'm not sure how to do a speed comparison with two different versions of the same package but it feels like it may be faster.
I suspect the problem may be related to the length of the request URL when there are a large number of object IDs. I also think there may be a way of throttling the request that doesn't require the two-step process of getting object IDs and then getting the features although, if there is an easier solution, I'm all for it.
The {osmdata} package just completed a migration from httr to httr2 so I'm hoping to review the changes in this related issue to see if there is anything they are doing for that package that is missing for {esri2sf} https://github.com/ropensci/osmdata/issues/272
Wrapping the objectId and outFields parameters with the new I()
flag in {httr2} to allow for skipping encoding for some parameters seems to have fixed a lot of the problems I ran into earlier this month. The version at the pull request now passes all the tests (excepting the one with a secret token that I can't run). It is soooo close at this point!
A few outstanding items: generateToken
and generateOAuthToken
are still using {httr} and the URL check helpers could be rewritten to allow the expanded range of URLs supported by esriIndex
but we could also let httr2
handle the url checks since it returns fairly helpful error messages most of the time.
Unfortunately, he baltimorecity.gov URL linked above still isn't working when I run it and it is returning this error: LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 54
. Curiously however, if I run the same URL with a manually defined objectIds parameter it works. Hope you can still take a look sometime soon, @jacpete.
Here is a reprex:
library(esri2sf)
esri2sf(
"https://egisdata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0"
)
#> ✔ Downloading "Properties" from <https://egisdata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0>
#> Layer type: "Feature Layer"
#> Geometry type: "esriGeometryPolygon"
#> Service Coordinate Reference System: "EPSG:2248"
#> Output Coordinate Reference System: "EPSG:4326"
#> Error:
#> ! LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 54
esri2sf(
"https://egisdata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0",
objectIds = paste0(c(1:25), collapse = ",")
)
#> ✔ Downloading "Properties" from <https://egisdata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0>
#> Layer type: "Feature Layer"
#> Geometry type: "esriGeometryPolygon"
#> Service Coordinate Reference System: "EPSG:2248"
#> Output Coordinate Reference System: "EPSG:4326"
#> Simple feature collection with 25 features and 38 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -76.65118 ymin: 39.30897 xmax: -76.64997 ymax: 39.30958
#> Geodetic CRS: WGS 84
#> First 10 features:
#> OBJECTID BLOCKLOT BLOCK LOT FULLADDR BLDG_NO FRACTION SPAN_NUM
#> 1 1 0001 001 0001 001 2045 W NORTH AVE 2045 0
#> 2 2 0001 002 0001 002 2043 W NORTH AVE 2043 0
#> 3 3 0001 003 0001 003 2041 W NORTH AVE 2041 0
#> 4 4 0001 004 0001 004 2039 W NORTH AVE 2039 0
#> 5 5 0001 005 0001 005 2037 W NORTH AVE 2037 0
#> 6 6 0001 006 0001 006 2035 W NORTH AVE 2035 0
#> 7 7 0001 007 0001 007 2033 W NORTH AVE 2033 0
#> 8 8 0001 008 0001 008 2031 W NORTH AVE 2031 0
#> 9 9 0001 009 0001 009 2029 W NORTH AVE 2029 0
#> 10 10 0001 010 0001 010 2027 W NORTH AVE 2027 0
#> STDIRPRE ST_NAME ST_TYPE ZIP_CODE TAXBASE FULLCASH
#> 1 W NORTH AVE 21217 161800 0
#> 2 W NORTH AVE 21217 49500 0
#> 3 W NORTH AVE 21217 11000 0
#> 4 W NORTH AVE 21217 11000 0
#> 5 W NORTH AVE 21217 23000 0
#> 6 W NORTH AVE 21217 8000 0
#> 7 W NORTH AVE 21217 23000 0
#> 8 W NORTH AVE 21217 11000 0
#> 9 W NORTH AVE 21217 23000 0
#> 10 W NORTH AVE 21217 23000 0
#> PERMHOME NO_IMPRV ZONECODE SALEPRIC SALEDATE OWNER_ABBR
#> 1 N NA C-1 5000 11282005 NA
#> 2 N NA C-1 5000 11282005 NA
#> 3 N NA OR-1* 0 10171975 NA
#> 4 N NA OR-1* 0 11132014 NA
#> 5 N NA OR-1* 70000 06042021 NA
#> 6 N NA OR-1* 62000 10182021 NA
#> 7 N NA OR-1* 2120 06072006 NA
#> 8 N NA OR-1* 2000 12152021 NA
#> 9 N NA OR-1* 10000 03081995 NA
#> 10 N NA OR-1* 52500 08162021 NA
#> OWNER_1 OWNER_2
#> 1 CHONG, YON UN
#> 2 CHONG, YON UN
#> 3 NEW YORK INC
#> 4 MUHAMMAD, ANNA AVON
#> 5 THE WHITE CARR DEVELOPMENT GROUP, LLC
#> 6 WHITE CARR DEVELOPMENT GROUP LLC, THE
#> 7 NORTH AVENUE REVELATION PROJECT, LLC.
#> 8 VERONA ENTERPRISE LLC
#> 9 DOUGLAS, VERONA E
#> 10 JOHNSON, MONICA
#> OWNER_3 RESPAGCY PROPDESC
#> 1
#> 2
#> 3
#> 4
#> 5
#> 6
#> 7
#> 8
#> 9
#> 10
#> NEIGHBOR YEAR_BUILD EXTD_ZIP DHCDUSE1
#> 1 EASTERWOOD 1920 1221 3052
#> 2 EASTERWOOD 0 1221 0000
#> 3 EASTERWOOD 1915 1221 1008
#> 4 EASTERWOOD 1920 1221 0000
#> 5 EASTERWOOD 1920 1221 5410
#> 6 EASTERWOOD 1915 1221 1007
#> 7 EASTERWOOD 1915 1221 1008
#> 8 EASTERWOOD 1920 1221 1007
#> 9 EASTERWOOD 1915 1221 1008
#> 10 EASTERWOOD 1915 1221 1008
#> SDATLINK
#> 1 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=001
#> 2 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=002
#> 3 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=003
#> 4 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=004
#> 5 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=005
#> 6 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=006
#> 7 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=007
#> 8 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=008
#> 9 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=009
#> 10 http://sdat.dat.maryland.gov/realproperty/pages/viewdetails.aspx?County=03&SearchType=ACCT&Ward=15&SECTION=37&BLOCK=0001&LOT=010
#> BLOCKPLAT
#> 1 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 2 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 3 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 4 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 5 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 6 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 7 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 8 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 9 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> 10 https://gis.baltimorecity.gov/zoning/blockplats/0001.pdf
#> MAILTOADD ProjDemo ReleaseDate VBN_Issued
#> 1 2045 W NORTH AVE, 21217 NA NA NA
#> 2 2045 W NORTH AVE, 21217 NA NA NA
#> 3 SUITE 201, 21208 NA NA 1.327506e+12
#> 4 9726 MENDOZA RD RANDALLSTOWN, MD, 21133 NA NA 1.555750e+12
#> 5 2601 PENNSYLVANIA AVENUE, 21217 NA NA NA
#> 6 2601 PENNSYLVANIA AVE, 21217 NA NA NA
#> 7 2311 OAK DRIVE IJAMSVILLE, MD., 21754 NA NA 1.359728e+12
#> 8 5411 PARK HEIGHTS AVE, 21215 NA NA 1.435767e+12
#> 9 2029 W NORTH AVE, 21217 NA NA NA
#> 10 401 SPIKE CT FLORENCE, SC, 29505 NA NA NA
#> Name Shape_Length Shape_Area geoms
#> 1 NA 200.1155 1359.934 MULTIPOLYGON (((-76.65113 3...
#> 2 NA 195.6736 1174.100 MULTIPOLYGON (((-76.65113 3...
#> 3 NA 196.0319 1188.322 MULTIPOLYGON (((-76.65101 3...
#> 4 NA 196.3390 1201.348 MULTIPOLYGON (((-76.65096 3...
#> 5 NA 194.0874 1106.553 MULTIPOLYGON (((-76.65091 3...
#> 6 NA 197.8996 1267.556 MULTIPOLYGON (((-76.65086 3...
#> 7 NA 194.7870 1136.311 MULTIPOLYGON (((-76.65088 3...
#> 8 NA 194.9372 1142.615 MULTIPOLYGON (((-76.65083 3...
#> 9 NA 193.6459 1088.861 MULTIPOLYGON (((-76.65072 3...
#> 10 NA 199.1617 1320.407 MULTIPOLYGON (((-76.65066 3...
Created on 2022-06-29 by the reprex package (v2.0.1)
I think I have a semi-reasonable solution. I didn't solve the mystery of why these requests worked with {httr} and fail with {httr2} but it seems like the common solution is to set a maxRecords value that is well below the maxRecords value suggested by the layer information. I've added maxRecords to the examples in the README which both demonstrates how it works and is required to work with the sample URLs. I also implemented an error message that suggests trying maxRecords when a request fails at the end of getEsriFeatures (although I think there may be a few other issues that can prevent a request from completing so a broader list of recommendations would be preferable).
I added tests for esri2sf()
and esriIndex()
. I will also plan to add tests for esriInfo()
.
If this solution works, there are a couple of additional changes which would be needed if we wanted to complete a full migration from {httr} to {httr2}:
Help with the token functions would be welcome because I really haven't worked with authentication for ArcGIS REST API services before. This has been more work than I ever expected but it is exciting to see it is closer than ever!
I finally found documentation of the limit that is causing the issue:
Users not directly interacting with administrative or REST APIs (for example, using scripts or having a web application firewall configured) may encounter query limitations. In instances where a query string exceeds 2,048 characters in length, the query should be submitted as a POST request.
httr2 has a method parameter that allows POST requests although I still need to figure out how that works.
It took yet more trial and error but I fixed the issue with the too long URLs using httr2. This fixes the weird random errors but still leaves the removal of the remaining dependency for httr, updates to the URL checks (if needed), and improved documentation still to go.
And – because nothing about the ArcGIS REST API is evidently ever easy – it turns out that this new approach works 100% fine with the sample servers and doesn't work at all with the Baltimore city or Maryland state GIS portal.
I think I found a post with someone experiencing a similar issue and highlights a possible solution of adding some additional details to the body of the POST request: https://community.esri.com/t5/arcgis-rest-api-questions/arcgis-server-rest-api-http-post-returning-html/td-p/478832
The solution for this last issue was way easier than I expected: swapping httr2::req_body_json()
for httr2::req_body_form()
seems like it did the trick. I also added a partial implementation of the API for working with GeocodeServer URLs. This has been a round-about process but I think the major problems are finally taken care of.
Any chance you're free this fall, @jacpete, to come up for air and review a new pull request for this issue?
I went a good bit beyond the original scope so my current fork includes a few new experimental functions (e.g. esrisearch()
and esri2rast()
and a rewrite of the URL validation functions to support content URLs for ArcGIS.com but I just dropped the dependency on {httr}. Some tests need to be updated (ESRI retired the sampleserver1.arcgisonline.com
server). generateToken()
may also need some testing and tweaks but I think I ported it over correctly.
If you're still tied up with your actual work, I can definitely understand but it would be great to finish this up and migrate the improvements back into the package.
Hello @jacpete + @yonghah – I just wanted to give you a heads up that, since it seems like you don't have the time to manage contributions to esri2sf at this point, I'm planning to migrate my fork to a separate repository with a new name. I'll make sure I credit appropriately per the MIT license and explore changing function names to avoid a namespace conflict in the future. Let me know if you have any questions or suggestions!
As discussed previously, this is the issue to support discussion of a forthcoming pull request to switch the package over from {httr} to {httr2}. Simultaneous with making these changes, I also made several other significant (more or less) changes:
geometry
parameter toesri2sf()
to begin the process of allowing other types of spatial filters (not just bounding boxes)esriIndex()
function described in #39esriInfo()
function that includes a sitemap option that provides similar (but more limited) functionality as esriIndexA few things to highlight:
esri2df()
andesri2sf()
.esriInfo()
was very easy to create after I hadesriRequest()
set up.I noticed a few issues that should be resolved before merging the pull request:
httr2::req_perform()
may be helpful to get the package to work better with larger requests (I noticed it was failing with some larger requests that it makes just fine right now)I expect there are a few other changes or bugs that I'm forgetting to mention but hopefully this is enough to get a review started.