ncss-tech / SoilTaxonomy

A System of Soil Classification for Making and Interpreting Soil Surveys
https://ncss-tech.github.io/SoilTaxonomy/
GNU General Public License v3.0

parseFamily #9

Closed dylanbeaudette closed 1 year ago

dylanbeaudette commented 5 years ago

EDIT by AGB:

Dylan's concept drawing from #8: image

dylanbeaudette commented 2 years ago

A minor addition to current functionality: family differentiae split into both:

This would make it possible to access, by name, the elements used for any given taxon (a ragged, named list), or to smash multiple taxa into a single data.frame that is padded with NA in unused cases.
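The two output shapes described above can be sketched in base R. This is only an illustration: the taxon labels, element names, and values are made up, and none of it is the actual output of parse_family().

```r
# Hypothetical parsed differentiae for two taxa
# (labels, element names, and values are illustrative assumptions)
ragged <- list(
  taxon_a = list(particle_size = "fine-loamy", mineralogy = "mixed", temperature = "mesic"),
  taxon_b = list(mineralogy = "mixed", temperature = "thermic")
)

# Ragged, named list: access an element by name for a single taxon
ragged$taxon_a$particle_size

# Flat form: one row per taxon, unused differentiae padded with NA
all_names <- unique(unlist(lapply(ragged, names)))
rows <- lapply(ragged, function(x) {
  out <- setNames(rep(NA_character_, length(all_names)), all_names)
  out[names(x)] <- unlist(x)
  as.data.frame(as.list(out), stringsAsFactors = FALSE)
})
padded <- do.call(rbind, rows)
```

The list form keeps only what applies to each taxon, while the data.frame form trades that compactness for a fixed set of columns that is easier to tabulate or join.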

brownag commented 2 years ago

I think I get what you mean. I'll implement a couple of things that come to mind that will need to be there to support this. Notably, the family class info parsed from the keys should get linked to the appropriate NASIS domains.

brownag commented 2 years ago

I started sketching this out in parseFamily.R, but commented out/temporarily removed the bits that add new dependencies or rely on soilDB 2.7.3+. I was going to submit a release today, but I am gonna hold off until early next week, probably after doing some more noodling.

brownag commented 2 years ago

I've made several updates and merged the changes that lower the soilDB dependency into the master branch.

Caught some bugs, found edge cases and cleaned up the output tables using this example code as a starting point:

library(soilDB)
library(SoilTaxonomy)
sc <- get_soilseries_from_NASIS()
scx <- subset(sc, sc$soilseriesstatus == "established")
system.time({x <- parse_family(scx$taxclname)})

Several new tests have been added for complicated family-level taxonomies.

I am still thinking about how to deal with the "child" type family classes where it is potentially valid to have more than one comma-separated class (e.g. taxminalogy and taxfamother). Examples include "shallow, ortstein".

Currently this is handled, but it puts non-standard values into standard column names. I may want to have a list column with individual elements under the official name, and append "_concat" or similar to the name of the "flat" column data.

Also still unhandled/TBD are the mineralogies associated with "strongly contrasting" family classes, e.g. "amorphic over isotic". However, our choice lists currently do have the strongly contrasting particle-size classes (PSCs) in them (for the combinations that are defined on p. 322 of the Keys).

brownag commented 2 years ago

An update: strongly contrasting particle size classes are in the domain choice lists, but the associated possible combinations of e.g. mineralogy class are not. Which makes sense: the latter is a combinatorial explosion, while the former is constrained to a (fairly large) list of specific conditions.

Regarding concatenation of family "other" class, mineralogy, etc.: I am going to add a new argument, flat=TRUE, that by default will use the standard NASIS physical column names with concatenated choice list items. When flat=FALSE, a list column will be returned rather than the concatenated result. This will include parsing combination classes concatenated with " over ", such that the results map 1:1 with choice lists. The implied vertical order follows element order, i.e. the first element is over the second.
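A rough base R sketch of the difference between the concatenated ("flat") value and a list column, using the two example strings from this thread. The helper split_classes() and the column names are hypothetical and are not part of SoilTaxonomy; the real parse_family() behavior may differ.

```r
# Hypothetical helper (an assumption, not a SoilTaxonomy function): split a
# family class string into individual choice-list items
split_classes <- function(x) {
  # split on commas first, then on " over " for combination classes
  parts <- unlist(strsplit(x, ",", fixed = TRUE))
  parts <- unlist(strsplit(parts, " over ", fixed = TRUE))
  trimws(parts)
}

classes <- c("shallow, ortstein", "amorphic over isotic")

# Analogue of flat = TRUE: one concatenated string per taxon
d <- data.frame(flat = classes, stringsAsFactors = FALSE)

# Analogue of flat = FALSE: a list column with one element per item;
# for " over " classes the first element lies over the second
d$items <- lapply(classes, split_classes)
```

With the list column, each element can be matched directly against a choice list, whereas the flat column preserves the string as it appears in the family name.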

dylanbeaudette commented 2 years ago

Nice. I like this approach. I'd like to build the SoilWeb seriesTree application from these new tools, vs. the current approach.

brownag commented 1 year ago

All items in this issue have been resolved.

As part of https://github.com/ncss-tech/SoilTaxonomy/issues/38 parse_family() may be refactored and functionality split to support higher taxonomic classes as input, returning an analogous data.frame output.