ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0

fetchOSD() auto-chunking #239

Closed dylanbeaudette closed 10 months ago

dylanbeaudette commented 2 years ago

The API used by fetchOSD() is GET-based, and is therefore limited in the number of series it can accept per request. Converting to POST is a start, but the function should auto-chunk as needed.

brownag commented 10 months ago

@dylanbeaudette Would you be opposed to implementing this inside fetchOSD() via makeChunks() or similar, rather than changing the SoilWeb API endpoint to accept POST and some sort of JSON payload?

If you would still rather do the API upgrade, is this something you are planning on doing sometime soon?

dylanbeaudette commented 10 months ago

I think automatic chunking within fetchOSD() makes the most sense and would do the least possible harm.

Automatic chunking at around 100 series names per request seems like a safe threshold to stay within GET constraints and limit the amount of data sent per request. I'll implement this.
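
For illustration only, here is a minimal sketch of the chunking idea in base R. It is not the code from the eventual commit: `chunk_series()` and `fetch_in_chunks()` are hypothetical helpers, and `fetch_one` stands in for whatever function performs a single GET request. The chunk size of 100 follows the threshold suggested above.

```r
# Hypothetical helper (not part of soilDB): split a character vector of
# series names into groups of at most `size` elements.
chunk_series <- function(series, size = 100) {
  if (length(series) == 0) return(list())
  # group index: 1 for elements 1..size, 2 for the next `size`, and so on
  group <- ceiling(seq_along(series) / size)
  unname(split(series, group))
}

# Hypothetical wrapper: call a single-request function once per chunk
# and return the per-chunk results as a list.
fetch_in_chunks <- function(series, fetch_one, size = 100) {
  chunks <- chunk_series(series, size)
  message(length(chunks), " requests for ", length(series), " total soil series")
  lapply(chunks, fetch_one)
}

# Stand-in for a real request: report how many names each chunk carries
demo <- fetch_in_chunks(paste0("series", 1:250), fetch_one = length)
```

With 250 names and a chunk size of 100, `demo` holds three results, one per chunk. In a real implementation the per-chunk results would still need to be combined into a single object before being returned.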

brownag commented 10 months ago

Closed by https://github.com/ncss-tech/soilDB/commit/17d4808a8bf5defe94f1f6439e9cdf62c30fc5e1

Seems to work well:

library(soilDB)
s <- SDA_query("SELECT DISTINCT TOP 300 compname FROM component 
                WHERE compkind = 'series' AND compname NOT LIKE '%like%' AND compname NOT LIKE '%unnamed%'")$compname
#> single result set, returning a data.frame
x <- fetchOSD(s)
#> 3 requests for 300 total soil series
length(x)
#> [1] 297

# No OSD for these deactivated series
s[!toupper(s) %in% x$id]
#> [1] "Visalia"        "Parker Springs" "Orofino"