ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
79 stars 19 forks source link

get_chorizon_from_SDA: concatenate many:1 texcl, lieutex within RV chtexturegrp #353

Closed brownag closed 1 month ago

brownag commented 2 months ago

This PR will resolve at least one case that can lead to issues like #348.

We already limit horizon texture group information for get_chorizon_from_SDA() to just representative, but about 2% of the time the RV group has multiple chtexture records and multiple texcl or lieutex associated with it. At least for the case of stratified textures this is permissible.

This PR concatenates texcl and lieutex columns in the result with a comma to make the result 1:1 with the RV chtexturegrp record. Thus it will be less likely to artificially duplicate horizons / cause some profiles in result SPC to be invalid.

In general, it is not correct to have multiple RVs marked--the point of the RV is to provide a single representative record. Component horizons with multiple RV texture groups (not "allowed" but does exist in rare cases) are not handled by this new aggregation step and still will result in duplication--as is the case with most cases where multiple RVs are marked.

This change means that when running fetchSDA() or similar with NASISDomainsAsFactor(TRUE) NA values will result for records that have concatenated texcl, rather than one of the 21 factor levels specified in aqp::SoilTextureLevels().

While this is a change in behavior, these days the automatic factor level setting is not default and probably not that widely used. My preference is that we empower users to assign logical factor settings to their data themselves, and deal with any missing/duplicate values prior to that. This may mean a vignette covering get_NASIS_column_metadata(), NASISChoiceList(), uncode() and similar methods may be worth developing.