ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0

get_SDA_property() Fails with too many mukeys? #228

Closed: MatthieuStigler closed this issue 2 years ago

MatthieuStigler commented 2 years ago

I tried to use get_SDA_property() with many mukeys (25,000) and got an error. There seems to be a small error in how the code handles this (see pull request https://github.com/ncss-tech/soilDB/pull/227), but more fundamentally, what is the issue here? Is it simply that I am requesting too many mukeys at a time? I see there is a warning for that case ("Query string is too long"), but I don't seem to be hitting it.

Thanks!!

library(readr)
library(soilDB)

mukey_df <- readr::read_csv("https://raw.githubusercontent.com/MatthieuStigler/Misc/master/dataset_example/mukeys_list.csv")
#> Rows: 25000 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (2): mukey, n
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

vars_cnts <- c("sandtotal_r","silttotal_r","claytotal_r", "awc_r")# "aws050wta")

res <- soilDB::get_SDA_property(property = vars_cnts,
                                method = "Dominant Component (Numeric)", 
                                mukeys = mukey_df$mukey,
                                top_depth = 25,
                                bottom_depth = 50) 
#> Error : lexical error: invalid char in json text.
#>                                        <!DOCTYPE html>  <html lang="en
#>                      (right here) ------^
#> Error: lexical error: invalid char in json text.
#>                                        <!DOCTYPE html>  <html lang="en
#>                      (right here) ------^
res
#> Error in eval(expr, envir, enclos): object 'res' not found

Created on 2021-12-21 by the reprex package (v2.0.1)

brownag commented 2 years ago

Thanks for reporting this and sorry for any inconvenience.

There are limits on both the input query string and the result. We can estimate whether your query string is too long, but we cannot know whether the result will be too long or too complex until the server attempts to process the request. In this case the query string is not too long. Instead, the server returned an error page (HTML rather than JSON), and the JSON parser fails on it with "lexical error: invalid char in json text".
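The failure mode described above can be illustrated with a small base-R sketch. It assumes the raw response body is available as a character string. The helper names `is_html_error_page()` and `check_sda_body()` are hypothetical and not part of soilDB. Checking for an HTML document before handing the text to a JSON parser turns the cryptic lexical error into a more informative message.

```r
# Hypothetical helper (not part of soilDB): does the response body look like
# an HTML page rather than JSON?
is_html_error_page <- function(body) {
  grepl("^\\s*(<!DOCTYPE html|<html)", body, ignore.case = TRUE)
}

# Hypothetical wrapper: stop with an informative message instead of passing
# HTML to a JSON parser; returns the body invisibly when it looks usable.
check_sda_body <- function(body) {
  if (is_html_error_page(body)) {
    stop("Server returned an HTML page instead of JSON; ",
         "the request or its result may be too large. ",
         "Try splitting the mukeys into smaller chunks.", call. = FALSE)
  }
  invisible(body)
}

# Start of the body seen in the reprex above, and a generic JSON string
html_body <- '<!DOCTYPE html>  <html lang="en">'
json_body <- '{"example": 1}'

is_html_error_page(html_body)  # TRUE
is_html_error_page(json_body)  # FALSE
```

This only detects the symptom. It cannot tell you which server-side limit was exceeded.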

If you set query_string=TRUE and paste the resulting query into the web form at https://sdmdataaccess.nrcs.usda.gov/Query.aspx, you can see the full message the server returns, which is not much more informative.

Generally, when folks want to query many mukeys/features at once, I tell them to split the input into smaller sets and iterate over the sets. Your code works fine with chunks of 10,000 mukeys but seems to fail somewhere above 15,000.

For example, using soilDB::makeChunks() with chunks of 5,000 mukeys:

library(readr)
library(soilDB)

mukey_df <- readr::read_csv("https://raw.githubusercontent.com/MatthieuStigler/Misc/master/dataset_example/mukeys_list.csv")
#> Rows: 25000 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (2): mukey, n
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

vars_cnts <- c("sandtotal_r","silttotal_r","claytotal_r", "awc_r")# "aws050wta")

chunks <- makeChunks(mukey_df$mukey, 5000)
res <- do.call('rbind', lapply(1:max(chunks), function(i)
       soilDB::get_SDA_property(property = vars_cnts,
                                method = "Dominant Component (Numeric)", 
                                mukeys = mukey_df$mukey[which(chunks == i)],
                                top_depth = 25,
                                bottom_depth = 50,
                                query_string = FALSE))) 
nrow(res)
#> [1] 24893
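The same chunk-and-iterate pattern can be written generically in base R. This is a sketch, not part of soilDB. `query_in_chunks()` and its `query_fun` argument are hypothetical names, and the mock function below stands in for a real request so the example runs without contacting Soil Data Access. Each chunk is wrapped in `tryCatch()`, so one failed request is recorded instead of discarding the chunks that succeeded.

```r
# Hypothetical helper: run query_fun() on successive chunks of ids and
# row-bind the results. Chunks that raise an error are skipped and their
# indices are stored in the "failed_chunks" attribute for a later retry.
query_in_chunks <- function(ids, chunk_size, query_fun) {
  # Consecutive ids go to chunks 1, 1, ..., 2, 2, ... of at most chunk_size
  chunk_id <- ceiling(seq_along(ids) / chunk_size)
  chunks <- split(ids, chunk_id)

  failed <- integer(0)
  results <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    res <- tryCatch(query_fun(chunks[[i]]), error = function(e) e)
    if (inherits(res, "error")) {
      message("Chunk ", i, " failed: ", conditionMessage(res))
      failed <- c(failed, i)
    } else {
      results[i] <- list(res)  # list() keeps the slot even if res is NULL
    }
  }

  out <- do.call(rbind, results)  # rbind() ignores the NULL (failed) slots
  if (is.null(out)) out <- data.frame()
  attr(out, "failed_chunks") <- failed
  out
}

# Mock query function (assumption, for illustration only): it errors when
# given more than 3 ids, mimicking a server-side size limit.
mock_query <- function(mukeys) {
  if (length(mukeys) > 3) stop("result too large")
  data.frame(mukey = mukeys, value = mukeys * 2)
}

res <- query_in_chunks(1:10, chunk_size = 3, query_fun = mock_query)
nrow(res)                   # 10
attr(res, "failed_chunks")  # integer(0)
```

With soilDB, `query_fun` could be something like `function(x) soilDB::get_SDA_property(property = vars_cnts, method = "Dominant Component (Numeric)", mukeys = x, top_depth = 25, bottom_depth = 50)`. Any failed chunks could then be retried with a smaller `chunk_size`.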
MatthieuStigler commented 2 years ago

Great, thanks for the detailed explanation, much appreciated! So it was indeed an issue of querying too many mukeys at once. Thanks also for showing a nice solution!