ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
79 stars 19 forks source link

`.create_wide_reason` failing when `dsn` passed to `get_SDA_interpretation` #316

Closed kevinwolz closed 7 months ago

kevinwolz commented 8 months ago

The new functionality to get NCCPI subrules using get_SDA_interpretation() is now working for me when it gets data from SDA. However, it is failing for me when using a local SSURGO database. I am using THIS local database that is all of SSURGO., which is generally working for me with other soilDB functions.

MYDIR <- ""
soil_db <- DBI::dbConnect(RSQLite::SQLite(), file.path(MYDIR, "All_SSURGO_10_06_2023.gpkg"))

nccpi_from_SDA <- get_SDA_interpretation(rulename    = "NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
method      = "Dominant Component",
mukeys      = mukeys,
wide_reason = TRUE,
dsn = soil_db) 

Error in strsplit(x[[n]], "; ") : non-character argument

This is happening because the object res that is passed to .create_wide_reason has a column reason_NCCPINationalCommodityCropProductivityIndexVer30 that is 100% NAs. When this happens in a data.frame, R by defaults classifies that column as logical. So, the strsplit(x[[n]], "; ") line in .create_wide_reason is not happy because it is expecting a character vector.

Changing this line to strsplit(as.character(x[[n]]), "; ") solves the problem.

That fix aside, the more important question is why is this column 100% NAs when using the local database, when it is working just fine in the SDA query? @brownag mentioned in https://github.com/ncss-tech/soilDB/discussions/312 that not all soilDB functions are yet functioning robustly with local databases. Perhaps this a specific example of this?

brownag commented 8 months ago

Thanks @kevinwolz for pointing this out.

get_SDA_interpretation() method="Dominant Condition" was supposed to work with SQLite... but I broke something as I was messing around with the final formatting that caused the wide_reason routine to fail silently (or loudly in some cases, as you found). It appears there was a mismatch in the way the subrule ratings were being formatted and concatenated in the T-SQL v.s. SQLite queries that caused the parsing to fail, and subrule ratings to come back Not Rated/NA. Sorry about that, should now be fixed in a495936f.

I haven't tested this yet on the full set, but here is a small self-contained example that previously did not work properly for wide_reason=TRUE, and now works as expected:

library(soilDB)
destdir <- "~/FY24/SQLiteTest/"
soil_db <- file.path(destdir, "ca041.sqlite")
downloadSSURGO(areasymbols = "CA041", destdir = destdir)
createSSURGO(soil_db, exdir = destdir)

x <- get_SDA_interpretation(
    rulename    = "NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
    method      = "Dominant Component",
    areasymbols = "CA041",
    wide_reason = TRUE,
    dsn = soil_db
  )

Based on this issue I am going to add to the extended soilDB unit tests some things that help me to properly ensure the SQLite functionality is working, so will not close until that is implemented.

I have also added the as.character() bit, thanks for the explanation there... I run up against this kind of type mismatch a lot chasing down the edge cases. Now it will not have a hard error as you encountered at first, but in theory the fixed concatenation should all but ensure that a character anyway.

brownag commented 8 months ago

OK, @kevinwolz there is a much more general issue with the SSURGO Portal geopackage for CONUS. I thought it was odd you were getting all NA for the reason and my little test didnt have that issue... it turns out the above thing I fixed was a related, but different, issue than yours

My code above works with the createSSURGO() derived SQLite file, but it turns out the default SSURGO Portal behavior is using a reduced version of the cointerp table. The default is only the RV ratings and classes whereas the interplrc (the RV "reason" field) is missing, along with the low and high ratings. That is what we are using to come up with the "reason" field that gets parsed in the "wide reason" routine.

Note below that "Include Interpretation Sub Rules" is unchecked by default, and must have been unchecked for the big .gpkg I linked you to. image

Will follow up as I find more. For now this may mean that SDA is the only convenient "all of SSURGO" source to use, unless you want to download everything and re-create the database with subrules included.

EDIT: rather than rebuilding the whole thing, probably you could drop the cointerp table, then insert from the full cointerp CSV source, to create a hybrid.

kevinwolz commented 8 months ago

This makes me feel a lot better! I have been unable to get the subrules out of any of these local databases I have been playing with, and I thought it was me!

Thanks for reporting this.

EDIT: rather than rebuilding the whole thing, probably you could drop the cointerp table, then insert from the full cointerp CSV source, to create a hybrid.

Unfortunately, my database built from the CSV files also fails in the same way when trying to get the NCCPI sub-rules. Those must have been exported with the same box unchecked.

brownag commented 8 months ago

This makes me feel a lot better! I have been unable to get the subrules out of any of these local databases I have been playing with, and I thought it was me! [...] Unfortunately, my database built from the CSV files also fails in the same way when trying to get the NCCPI sub-rules. Those must have been exported with the same box unchecked.

My apologies for suggesting them in this context of getting full interpretation details. It is relatively rare folks need greater rule depth or reasons for interpretive maps of most interpretations, which is probably why this simplification has been promoted for these large prebuilt databases.

It seems the only ways to get them locally would be to 1) export cointerp from gSSURGO File Geodatabase (at least the ArcMap version of the gSSURGO tools produces a complete table); 2) build from SDA; or 3) download all the surveys from Web Soil Survey and feed them into SSURGO Portal.

It is about ~6.9M records for all SSURGO NCCPI v3.0 alone, of which ~5.7m have ruledepth>0, so even for a single interp a non-trivial amount of data to obtain.

kevinwolz commented 8 months ago

No worries, it is clear that I'm heading down a fringe path here!

In this case, I think option 2, building from SDA will work just fine for me. This is a much more limited use case for me than my broader needs for a local database.

brownag commented 7 months ago

I think this issue is resolved to the degree that it can be.

  1. If using a createSSURGO() SQLite source for dsn get_SDA_interpretation(wide_reason=TRUE) works.
  2. If using a SSURGO Portal SQLite source, the database needs to have been created with "Include Interpretation Sub-rules" checked in order for any subrule data to be available for "widening".
  3. The default "all SSURGO" geopackage available for download on Box for FY2024 does not include subrule ratings.
kevinwolz commented 6 months ago

@brownag FYI it seems like the FY2024 CSV file export was done with the subrules box checked, so get_SDA_interpretation(wide_reason = TRUE) is now working for that data source!