ncss-tech / SoilKnowledgeBase

Unified descriptions and structures for USDA-NRCS / National Cooperative Soil Survey / Soil and Plant Science Division standards.
https://ncss-tech.github.io/SoilKnowledgeBase/

add "Additional Data" section to OSD parsing code / output #38

Closed dylanbeaudette closed 2 years ago

dylanbeaudette commented 2 years ago

This is currently lumped into the "REMARKS" section.

brownag commented 2 years ago

That was by design, I think. Additional Data is not a required section in the NSSH and is only included on an as-needed basis. We could probably parse it out, but it gets called out in a variety of ways. Will look into it.

dylanbeaudette commented 2 years ago

OK, thanks. I need to talk some more with SRSS about their use of and expectations for this field.

brownag commented 2 years ago

So, about 1/3 (n=8639) of the OSDs have a readily parse-able ADDITIONAL DATA section--definitely worth trying to separate it out. About 150 OSDs have no remarks -- only additional data.

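```r
# Local copy of the parsed OSD JSON files (path is relative to the working directory)
OSD_directory <- "../SoilKnowledgeBase/inst/extdata/OSD/"
# Series names are the JSON file names without the extension
s <- gsub("\\.json$", "", basename(list.files(OSD_directory, recursive = TRUE)))

res <- soilDB::get_OSD(series = s, base_url = OSD_directory, result = "json")

# Count OSDs whose REMARKS text contains an "additional data:" label
table(grepl("additional data:", res$REMARKS, ignore.case = TRUE), useNA = "ifany")
#> FALSE  TRUE 
#> 15768  8639 
```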
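
For illustration only, here is a minimal base R sketch of how an "ADDITIONAL DATA:" block could be separated from the remarks text and how "additional data only" records could be counted. It is not the logic used in SoilKnowledgeBase: the helper `split_additional_data()`, the `ADDITIONAL_DATA` column name, and the three sample strings are all hypothetical, and it assumes the label appears literally as "additional data:" (real OSDs call the section out in more varied ways).

```r
# Hypothetical helper (not part of SoilKnowledgeBase or soilDB):
# split a REMARKS string at the first "additional data:" label.
split_additional_data <- function(remarks) {
  # Position of the first label match, case-insensitive (-1 when absent)
  m <- regexpr("additional data:", remarks, ignore.case = TRUE)
  found <- !is.na(m) & m > 0
  # Text before the label stays in REMARKS
  before <- ifelse(found, substr(remarks, 1, m - 1), remarks)
  # Text after the label becomes ADDITIONAL_DATA (NA when no label is found)
  after <- ifelse(
    found,
    substr(remarks, m + attr(m, "match.length"), nchar(remarks)),
    NA_character_
  )
  data.frame(
    REMARKS = trimws(before),
    ADDITIONAL_DATA = trimws(after),
    stringsAsFactors = FALSE
  )
}

# Made-up example strings, not taken from real OSDs
x <- c(
  "Ochric epipedon 0 to 10 cm. ADDITIONAL DATA: Lab pedon sampled.",
  "Additional data: Lab pedon only.",
  "Argillic horizon 25 to 100 cm."
)
out <- split_additional_data(x)

# Records with no remarks left once the additional data is removed
n_only_additional <- sum(out$REMARKS == "" & !is.na(out$ADDITIONAL_DATA))
```

Splitting at the first match keeps the sketch simple. A more robust parser would need to handle alternate labels and an Additional Data block that precedes the remarks.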

I'll be messing around with some of my core logic around the last sections / end of file handling in this PR: https://github.com/ncss-tech/SoilKnowledgeBase/pull/40

brownag commented 2 years ago

An Additional Data section has been added. Tabular Series Data, where present, is treated as a type of additional data.