ncss-tech / SoilTaxonomy

A System of Soil Classification for Making and Interpreting Soil Surveys
https://ncss-tech.github.io/SoilTaxonomy/
GNU General Public License v3.0

data.tree wrapper functions #43

Open brownag opened 1 year ago

brownag commented 1 year ago

I've noticed it's not easy to get the trees back out in a clean text-based format, and it would be good to extend @dylanbeaudette's old examples that were in the readme.


Here is a quick sample (a modified version of the second old example) based on the 13th edition keys and an order->subgroup path string.

library(SoilTaxonomy)
library(data.tree)
data("ST_higher_taxa_codes_13th", package = "SoilTaxonomy")
# create ST-style dataset from higher taxa codes
ST13 <- getTaxonAtLevel(ST_higher_taxa_codes_13th$taxon,
                        level = c("order", "suborder", "greatgroup", "subgroup"))
ST13 <- ST13[order(ST_higher_taxa_codes_13th$code),]
ST13 <- ST13[complete.cases(ST13),]
ST13$root <- "Soil Taxonomy (13th Edition)"
ST13$pathString <- with(ST13, paste0(root, "/", order, "/", suborder, "/", greatgroup, "/", subgroup))
n <- as.Node(ST13)
print(n, limit = NULL)

Ideas for output:

dylanbeaudette commented 1 year ago

Good ideas. I remember struggling to balance intuitive vs. compact displays of these data. The hierarchy is "wide" and "shallow", so many of the standard methods for displaying trees break down. It might be best to display each soil order in its own tree. Added some slightly updated examples to misc/.
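A minimal sketch of the "one tree per soil order" idea, assuming the tree was built with data.tree as in the first comment. The `toy` data frame and its `id` column are hypothetical stand-ins for the full `ST13` table; only `as.Node()`, `$children`, `$name` and `print()` from data.tree are used.

```r
library(data.tree)

# Hypothetical stand-in for the ST13 table built earlier in the thread
toy <- data.frame(
  pathString = c(
    "Soil Taxonomy/alfisols/xeralfs/rhodoxeralfs",
    "Soil Taxonomy/alfisols/xeralfs/palexeralfs",
    "Soil Taxonomy/entisols/orthents/xerorthents"
  ),
  id = 1:3,  # placeholder attribute column, not real taxon codes
  stringsAsFactors = FALSE
)

n <- as.Node(toy)

# Each child of the root node is one soil order
order_names <- vapply(n$children, function(ch) ch$name, character(1))

# Print every order as its own, much narrower, tree
for (ord in n$children) {
  print(ord, limit = NULL)
  cat("\n")
}
```

Printing the children one by one keeps each listing short, at the cost of losing the shared root line.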

I wonder if the author of the data.tree package would be open to alternative tree listing styles.

dylanbeaudette commented 1 year ago

Since I've had such good luck with the maintainer of data.tree in the past, I tried posting some questions / ideas over there:

https://github.com/gluc/data.tree/issues/167

brownag commented 1 year ago

Cool, I have some ideas on that that won't require changes to data.tree.

To remove the row numbers, I am thinking I can make a subclass of the {data.tree} Node class and dispatch a custom S3 print method on it. Most simply, the print method could cat() out the levelName contents (no row numbers), e.g.:


taxonTree <- function(...) {
  # ...
  attr(n, "class") <- c("SoilTaxonNode", attr(n, "class"))
  invisible(n)
}

#' @export
print.SoilTaxonNode <- function(x, ...) {
  # print the tree without rownames
  res <- as.data.frame(x)
  cat(res$levelName, sep = "\n")
}

dylanbeaudette commented 1 year ago

Getting a little closer to the output from fs::dir_tree() with:

taxonTree(c('palexeralfs', 'rhodoxeralfs'), special.chars = c("\u2502", "\u2514", "\u2500 "))

However, we can't get the exact output without using an additional character (tree.R):

"h" = "\u2500",                   # horizontal
"v" = "\u2502",                   # vertical
"l" = "\u2514",
"j" = "\u251C"

Not sure, but this might require changes in data.tree.
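To illustrate why a fourth character is needed, here is a small base-R sketch that renders a nested list in the fs::dir_tree() style. It is not the code from fs or data.tree; `render_tree()` and the `taxa` list are hypothetical names made up for this example. The tee character marks a child that has siblings after it, the corner marks the last child, and the vertical rule carries an open branch down through deeper levels.

```r
# Render a nested named list as a directory-style tree (illustrative only).
# chars: tee (non-last child), vertical, corner (last child), horizontal
render_tree <- function(x, label = "root",
                        chars = c("\u251c", "\u2502", "\u2514", "\u2500")) {
  walk <- function(node, prefix) {
    out <- character(0)
    nms <- names(node)
    for (i in seq_along(node)) {
      last <- i == length(node)
      branch <- if (last) chars[3] else chars[1]
      out <- c(out, paste0(prefix, branch, chars[4], " ", nms[i]))
      # under a non-last child the vertical rule continues; under the last it stops
      child_prefix <- paste0(prefix, if (last) " " else chars[2], "   ")
      out <- c(out, walk(node[[i]], child_prefix))
    }
    out
  }
  c(label, walk(x, ""))
}

# Hypothetical miniature hierarchy
taxa <- list(
  alfisols = list(
    xeralfs = list(
      rhodoxeralfs = list("typic rhodoxeralfs" = list()),
      palexeralfs = list()
    )
  )
)

tree_lines <- render_tree(taxa, label = "Soil Taxonomy")
cat(tree_lines, sep = "\n")
```

With only vertical, corner and horizontal characters, the non-last child (rhodoxeralfs here) would have to reuse the corner or the vertical rule, which is the difference from the fs output noted above.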

brownag commented 1 year ago

I don't think this particular request requires changes to data.tree. Just a minor change to the print method.

Now this works well. Thanks for the suggestion to emulate fs::dir_tree(); I originally was not really going for a direct clone:

library(SoilTaxonomy)
taxonTree(c('palexeralfs', 'rhodoxeralfs'), special.chars = c("\u251c","\u2502", "\u2514", "\u2500 "))
#> Loading required namespace: data.tree
#> Soil Taxonomy                           
#>  └─ alfisols                            
#>      └─ xeralfs                         
#>          ├─ rhodoxeralfs                
#>          │   ├─ lithic rhodoxeralfs     
#>          │   ├─ vertic rhodoxeralfs     
#>          │   ├─ petrocalcic rhodoxeralfs
#>          │   ├─ calcic rhodoxeralfs     
#>          │   ├─ inceptic rhodoxeralfs   
#>          │   └─ typic rhodoxeralfs      
#>          └─ palexeralfs                 
#>              ├─ vertic palexeralfs      
#>              ├─ aquandic palexeralfs    
#>              ├─ andic palexeralfs       
#>              ├─ vitrandic palexeralfs   
#>              ├─ fragiaquic palexeralfs  
#>              ├─ aquic palexeralfs       
#>              ├─ petrocalcic palexeralfs 
#>              ├─ lamellic palexeralfs    
#>              ├─ psammentic palexeralfs  
#>              ├─ arenic palexeralfs      
#>              ├─ natric palexeralfs      
#>              ├─ fragic palexeralfs      
#>              ├─ calcic palexeralfs      
#>              ├─ plinthic palexeralfs    
#>              ├─ ultic palexeralfs       
#>              ├─ haplic palexeralfs      
#>              ├─ mollic palexeralfs      
#>              └─ typic palexeralfs

dylanbeaudette commented 1 year ago

Very cool, thanks. I kind of like this incantation:

taxonTree(c('xerorthents', 'rhodoxeralfs', 'endoaqualfs'), special.chars = c("\u251c","\u2502", "\u2570", "\u2500 "))

brownag commented 1 year ago

It might be nice to pick a Unicode output we like as the default.

I was thinking ASCII might be a better default, but the package does use UTF-8 encoding per the DESCRIPTION, so there's no reason we couldn't have that. I like the above suggestion.

To finish up this issue I will also need to abstract out the contents of the print() method, so that the transformed result can be captured and written out as CSV and/or HTML.
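One way this separation could look, as a rough sketch rather than the package's eventual design: a helper returns the formatted lines, the print method only cat()s them, and the same lines can be written to a file. `tree_lines()`, the `toy` data and the output layouts are assumptions made for illustration; the HTML step does no escaping, which real taxon names do not need but arbitrary labels would.

```r
library(data.tree)

# Hypothetical stand-in data; `id` is a placeholder attribute column
toy <- data.frame(
  pathString = c(
    "Soil Taxonomy/alfisols/xeralfs/rhodoxeralfs",
    "Soil Taxonomy/alfisols/xeralfs/palexeralfs"
  ),
  id = 1:2,
  stringsAsFactors = FALSE
)
n <- as.Node(toy)
class(n) <- c("SoilTaxonNode", class(n))

# Hypothetical helper: the formatted tree as a character vector, nothing printed
tree_lines <- function(x, ...) {
  as.character(as.data.frame(x, ...)$levelName)
}

# The print method becomes a thin wrapper around the helper
print.SoilTaxonNode <- function(x, ...) {
  cat(tree_lines(x, ...), sep = "\n")
  invisible(x)
}

print(n)

# The same lines reused for file output
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(tree = tree_lines(n)), csv_file, row.names = FALSE)

html_file <- tempfile(fileext = ".html")
writeLines(c("<pre>", tree_lines(n), "</pre>"), html_file)
```

Keeping the character replacement inside the helper (rather than inside print) would let the CSV and HTML writers share whatever special.chars handling the print method uses.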