ncss-tech / SoilKnowledgeBase

Unified descriptions and structures for USDA-NRCS / National Cooperative Soil Survey / Soil and Plant Science Division standards.
https://ncss-tech.github.io/SoilKnowledgeBase/

osd_to_json: rare incorrect labeling in output #10

Closed brownag closed 3 years ago

brownag commented 3 years ago

Sections are occasionally labeled incorrectly when groups appear in a bad order and/or are missing. The algorithm is likely getting confused somehow, since out-of-order sections are handled correctly a very high percentage of the time.

It may be that the only way to fix the edgiest of cases is with some sort of post-processing of content versus the parsed results -- but I suspect that when I look, it will turn out to be something wrong with the implementation.

An example is the ZADE series OSD JSON where everything after Geographic Setting is out of order [offset by one]

https://github.com/ncss-tech/SoilKnowledgeBase/blob/f2dfb547c70f12fff3ae7ba0f1ce76f2f4a98706/inst/extdata/OSD/Z/ZADE.json#L26-L54

If we pull up the OSD nothing pops out as being immediately wrong with it... until you see that GEOGRAPHICALLY ASSOCIATED SOILS is missing. They use a long list-form Competing section -- I suppose instead? -- pretty nice.

https://github.com/ncss-tech/OSDRegistry/blob/main/OSD/Z/ZADE.txt
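
The post-processing idea mentioned above (checking content against the parsed results) could look roughly like the sketch below. It is not code from SoilKnowledgeBase, which is an R package. The flat `label -> text` dictionary, the function name `find_mislabeled`, and the placeholder section text are all assumptions made for illustration; the real JSON layout may differ.

```python
def find_mislabeled(parsed):
    """Return labels whose text does not begin with the matching header.

    `parsed` is a hypothetical dict mapping a section label to the raw
    text assigned to it; None means the section was not found.
    """
    bad = []
    for label, content in parsed.items():
        if content is None:
            continue  # a missing section is not a labeling error
        if not content.strip().upper().startswith(label.upper() + ":"):
            bad.append(label)
    return bad


# Placeholder data imitating an offset-by-one result after GEOGRAPHIC SETTING.
parsed = {
    "GEOGRAPHIC SETTING": "GEOGRAPHIC SETTING: placeholder text",
    "GEOGRAPHICALLY ASSOCIATED SOILS": "DRAINAGE AND PERMEABILITY: placeholder text",
    "DRAINAGE AND PERMEABILITY": "USE AND VEGETATION: placeholder text",
}

print(find_mislabeled(parsed))
```

A check like this only flags the problem; it does not say which label the text should have had, so it is better suited to catching regressions than to repairing output.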

brownag commented 3 years ago

The code at one point concatenated a list of numeric index vectors -- some of which could be zero length...

With 2ea007a the vector is properly buffered with NA -- which makes the subsequent steps work right -- I think.
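
To illustrate why zero-length index vectors cause an offset-by-one, here is a minimal Python sketch of the failure mode and of the padding fix. It is an analogy, not the package's R code: `None` stands in for R's `NA`, and the header list, function names, and sample lines are assumptions chosen for the example.

```python
from itertools import chain

# Hypothetical subset of section headers, in their expected order.
HEADERS = [
    "GEOGRAPHIC SETTING",
    "GEOGRAPHICALLY ASSOCIATED SOILS",
    "DRAINAGE AND PERMEABILITY",
    "USE AND VEGETATION",
]


def find_header_lines(lines, headers):
    """For each header, list the indices of lines that start with it."""
    return [
        [i for i, line in enumerate(lines) if line.startswith(h + ":")]
        for h in headers
    ]


def label_naive(hits, headers):
    """Concatenate the index lists; empty lists vanish, so labels shift."""
    flat = list(chain.from_iterable(hits))
    return dict(zip(headers, flat))


def label_padded(hits, headers):
    """Keep one slot per header, using None when nothing matched."""
    padded = [h[0] if h else None for h in hits]
    return dict(zip(headers, padded))


# Placeholder document with GEOGRAPHICALLY ASSOCIATED SOILS missing.
lines = [
    "GEOGRAPHIC SETTING: placeholder",
    "DRAINAGE AND PERMEABILITY: placeholder",
    "USE AND VEGETATION: placeholder",
]

hits = find_header_lines(lines, HEADERS)
print(label_naive(hits, HEADERS))   # missing header takes the next section's line
print(label_padded(hits, HEADERS))  # missing header maps to None, rest stay aligned
```

In the naive version every label after the missing section points at the wrong line and the last label is dropped entirely, which matches the symptom described for ZADE. Padding keeps the result the same length as the header list, so positions and labels stay paired.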

Going to chance it and run the refresh-extdata Action and see what changes... in a branch.

brownag commented 3 years ago

image