teixeirak closed this issue 5 years ago.
More broadly than the comment above, I'm finding the tables difficult to work with from a "user" standpoint, and in some cases I think they may be missing important information. Here's my running list (including a modification of the one above):
[x] The equations table needs a field indicating the taxa on which each equation was developed. This is (partially) given in the sitespecies table (`equation_taxa`), but on looking at the chojnacky_2014_ugbe equations, I noticed that it includes only the taxon to which the species in question belongs, not full information about the taxa used to fit the equation. For example, it (1) doesn't differentiate between "Abies < 0.35 spg" (eq. 4872ed) and "Abies ≥ 0.35 spg" (eq. 74dd65), and (2) doesn't indicate that taxa other than the target taxon were included in the allometry ("Fabaceae/ Juglandaceae, other"; eq. 7f7777). This will be for Erika to decide, but here's my recommendation: (1) move `equation_taxa` to the equations table, and modify it to indicate when multiple taxa were included (e.g., "Fabaceae/ Juglandaceae"); (2) add a field such as `original_equation_id` to the equations table so that equations can be easily matched to those in the original publication (for chojnacky_2014_ugbe, this would be, e.g., "Abies ≥ 0.35 spg"); (3) add a field in sitespecies briefly describing the type of trees for which the equation was developed (e.g., "Abies ≥ 0.35 spg", "Fabaceae/ Juglandaceae, other", "Pinus saplings"). The purpose of this field is to let users evaluate how closely a given species is matched to an equation without having to merge tables and examine several fields.
[x] Related to the above (and partly solved if we follow my recommendations) is the issue of the sitespecies table indicating the appropriate DBH range. We'd originally planned that this table would have its own minDBH and maxDBH fields, indicating the DBH range over which the equation would be applied. This range may differ from the min and max in the equations table when (1) there's more than one option for a given DBH or (2) we decide that the best option is to extrapolate beyond the min/max of the equation we have. We've now been writing/planning the code to automate this (e.g., issue #17), so maybe those fields are no longer needed (although they could be useful for communicating to the user what's being applied). If we implement my recommendation 3 in the item above, that would at least give a sense of which equations are being applied (e.g., "Pinus saplings" and "Pinus >5cm").
[x] The current `warning` field (equations table) currently holds notes/warnings to ourselves. I thought the purpose of that field would be to generate warnings to the user when potentially untrustworthy equations are used. Should we rename it `notes`? Or do we need a field warning about the equations themselves?
[x] Related to the above, it seems that the sitespecies table would be an appropriate place for a `warning` field. Warnings would be generated when there's no truly appropriate equation for a certain species. An example warning (currently in notes) might be "using allometries for Pseudotsuga menziesii as generic small conifer proxy for Pinus longaeva at utah". One would not generate such a warning when using the same equation for small Pseudotsuga menziesii.
[x] I noticed that some `ref_id` values differ from the convention we were planning to follow (e.g., means_22_NA, whereas the convention is [last name of first author_publication year_first letter of first four words in title]). What are these intended to capture? My guess is a particular equation within a publication, in which case implementing my suggestion 2 in the top comment (add an `original_equation_id` field to the equations table) would let you specify the equation number within a publication while describing the source according to the convention we had planned.
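Recommendations (1) and (2) in the first checklist item can be sketched with a toy equations table in base R. The three equation IDs and their labels come from the chojnacky_2014_ugbe examples above; storing multi-taxon fits as "/"-separated strings follows the "Fabaceae/ Juglandaceae" example, and the helpers `split_taxa()` and `equations_for_taxon()` are hypothetical, not part of allodb.

```r
# Hypothetical equations table after recommendations (1) and (2).
# ASCII ">=" stands in for the greater-than-or-equal sign used in the paper.
equations <- data.frame(
  equation_id = c("4872ed", "74dd65", "7f7777"),
  ref_id = "chojnacky_2014_ugbe",
  equation_taxa = c("Abies", "Abies", "Fabaceae/ Juglandaceae"),
  original_equation_id = c("Abies < 0.35 spg", "Abies >= 0.35 spg",
                           "Fabaceae/ Juglandaceae, other"),
  stringsAsFactors = FALSE
)

# Split each equation_taxa string on "/" and trim spaces, one vector per row.
split_taxa <- function(equation_taxa) {
  lapply(strsplit(equation_taxa, "/", fixed = TRUE), trimws)
}

# Flag equations that were fitted on more than one taxon.
equations$multi_taxa <- lengths(split_taxa(equations$equation_taxa)) > 1

# Return the equation_ids whose fitting taxa include `taxon`.
equations_for_taxon <- function(equations, taxon) {
  hits <- vapply(split_taxa(equations$equation_taxa),
                 function(taxa) taxon %in% taxa, logical(1))
  equations$equation_id[hits]
}

equations_for_taxon(equations, "Juglandaceae")  # "7f7777"
equations_for_taxon(equations, "Abies")         # "4872ed" "74dd65"
```

The two Abies equations share the same `equation_taxa`; only `original_equation_id` tells them apart, which is the gap recommendation (2) is meant to close.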
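The second and fourth checklist items (a DBH range per sitespecies row, plus a user-facing `warning` field there) could work together roughly as below. This is a minimal sketch assuming hypothetical sitespecies columns `dbh_min_cm`, `dbh_max_cm`, `equation_description`, and `warning`, hypothetical equation IDs and DBH limits, and one possible policy: the first matching range wins, otherwise the equation with the nearest range is extrapolated. It is not the automated selection code being developed in issue #17.

```r
# Hypothetical sitespecies rows: IDs, DBH limits, and descriptions are
# illustrative only. The warning text is the example quoted in the issue.
sitespecies <- data.frame(
  site = c("site_a", "site_a", "utah"),
  species = c("Pinus sp.", "Pinus sp.", "Pinus longaeva"),
  equation_id = c("eq_pinus_small", "eq_pinus_large", "eq_psme_small"),
  equation_description = c("Pinus saplings", "Pinus >5cm",
                           "Pseudotsuga menziesii, small trees"),
  dbh_min_cm = c(0, 5, 0),
  dbh_max_cm = c(5, 60, 10),
  warning = c(NA, NA, paste("using allometries for Pseudotsuga menziesii as",
                            "generic small conifer proxy for Pinus longaeva at utah")),
  stringsAsFactors = FALSE
)

# Pick the equation for one stem and warn the user when
#   (a) the DBH lies outside every candidate range (extrapolation), or
#   (b) the chosen sitespecies row carries a warning (e.g. a proxy species).
pick_equation <- function(sitespecies, site_name, species_name, dbh) {
  cand <- sitespecies[sitespecies$site == site_name &
                        sitespecies$species == species_name, ]
  if (nrow(cand) == 0) {
    stop("no equation assigned to ", species_name, " at ", site_name)
  }
  inside <- dbh >= cand$dbh_min_cm & dbh <= cand$dbh_max_cm
  if (any(inside)) {
    # Overlapping ranges: take the first match (an arbitrary tie-break).
    chosen <- cand[which(inside)[1], ]
  } else {
    # Distance from dbh to each range; positive because dbh is outside all of them.
    gap <- pmax(cand$dbh_min_cm - dbh, dbh - cand$dbh_max_cm)
    chosen <- cand[which.min(gap), ]
    warning(sprintf("DBH %.1f cm is outside [%g, %g] cm for '%s'; extrapolating.",
                    dbh, chosen$dbh_min_cm, chosen$dbh_max_cm,
                    chosen$equation_description), call. = FALSE)
  }
  if (!is.na(chosen$warning)) warning(chosen$warning, call. = FALSE)
  chosen$equation_id
}

pick_equation(sitespecies, "site_a", "Pinus sp.", 3)  # "eq_pinus_small"
```

Calling it with a DBH of 80 cm for "Pinus sp." returns "eq_pinus_large" with an extrapolation warning, and any call for Pinus longaeva at utah surfaces the proxy warning, so the user sees both kinds of caveat without inspecting the tables.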
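The `ref_id` convention in the last checklist item is mechanical enough to generate and check in code. A minimal sketch, assuming the first author's last name contains only ASCII letters and that "words" are whitespace-separated tokens (so a hyphenated word counts once); `make_ref_id()` and `is_conventional_ref_id()` are hypothetical helpers. The example title is that of Chojnacky et al. (2014), and it reproduces the existing chojnacky_2014_ugbe ID.

```r
# Build a ref_id following the convention described in the issue:
# <first author's last name>_<year>_<first letters of the first four title words>
make_ref_id <- function(first_author_last_name, year, title) {
  words <- strsplit(trimws(title), "\\s+")[[1]]
  initials <- paste(substr(tolower(head(words, 4)), 1, 1), collapse = "")
  paste(tolower(first_author_last_name), year, initials, sep = "_")
}

# TRUE when ref_id has the shape lastname_YYYY_abcd (lowercase ASCII letters).
is_conventional_ref_id <- function(ref_id) {
  grepl("^[a-z]+_[0-9]{4}_[a-z]{4}$", ref_id)
}

make_ref_id("Chojnacky", 2014,
            "Updated generalized biomass equations for North American tree species")
is_conventional_ref_id(c("chojnacky_2014_ugbe", "means_22_NA"))  # TRUE FALSE
```

Running the checker over the whole `ref_id` column would list every ID, like means_22_NA, that needs either renaming or an `original_equation_id` to carry the equation-level detail.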
Sorry for jumping into your conversation. I just thought it would be useful to clarify that the individual tables are a low-level detail that users shouldn't need to know about. The individual tables are developer-oriented (i.e., they make our life easier by keeping the data normalized). Users can access all the information through a higher-level interface such as `master()` or whatever wrapper you think is useful. Just name the table you want users to have, and we can create it on the fly from the low-level tables we maintain and provide an evocative wrapper. This approach follows how databases maintain tables and views (more info at https://github.com/forestgeo/allodb/issues/78#issuecomment-473403301).
```r
library(allodb)
library(tidyverse)
glimpse(
  allodb::master()
)
#> Joining `equations` and `sitespecies` by 'equation_id'; then `sites_info` by 'site'.
#> Observations: 769
#> Variables: 43
#> $ ref_id <chr> "jenkins_2004_cdod", "jen...
#> $ equation_id <chr> "2060ea", "2060ea", "a4d8...
#> $ equation_allometry <chr> "10^(1.1891+1.419*(log10(...
#> $ equation_form <chr> "10^(a+b*(log10(dbh^c)))"...
#> $ dependent_variable_biomass_component <chr> "Total aboveground biomas...
#> $ independent_variable <chr> "DBH", "DBH", "DBH", "DBH...
#> $ allometry_specificity <chr> "Species", "Species", "Sp...
#> $ geographic_area <chr> "Ohio, USA", "Ohio, USA",...
#> $ dbh_min_cm <chr> "0.21", "0.21", "0.19", "...
#> $ dbh_max_cm <chr> "5.73", "5.73", "3.86", "...
#> $ sample_size <chr> NA, NA, NA, NA, NA, NA, N...
#> $ dbh_units_original <chr> "cm", "cm", "cm", "cm", "...
#> $ biomass_units_original <chr> "g", "g", "g", "g", "g", ...
#> $ allometry_development_method <chr> "harvest", "harvest", "ha...
#> $ regression_model <chr> NA, NA, NA, NA, NA, NA, N...
#> $ other_equations_tested <chr> NA, NA, NA, NA, NA, NA, N...
#> $ log_biomass <chr> NA, NA, NA, NA, NA, NA, N...
#> $ bias_corrected <chr> "1", "1", "1", "1", "1", ...
#> $ bias_correction_factor <chr> "1.056", "1.056", "1.016"...
#> $ notes_fitting_model <chr> NA, NA, NA, NA, NA, NA, N...
#> $ original_data_availability <chr> NA, NA, NA, NA, NA, NA, N...
#> $ warning <chr> NA, NA, NA, NA, NA, NA, N...
#> $ site <chr> "lilly dicky", "tyson", "...
#> $ family <chr> "Sapindaceae", "Sapindace...
#> $ species <chr> "Acer rubrum", "Acer rubr...
#> $ species_code <chr> "316", "acerub", "318", "...
#> $ life_form <chr> "Tree", "Tree", "Tree", "...
#> $ equation_group <chr> "Expert", "Expert", "Expe...
#> $ equation_taxa <chr> "Acer rubrum", "Acer rubr...
#> $ notes_on_species <chr> NA, NA, NA, NA, NA, NA, N...
#> $ wsg_id <chr> NA, NA, NA, NA, NA, NA, N...
#> $ wsg_specificity <chr> NA, NA, NA, NA, NA, NA, N...
#> $ id <chr> NA, NA, NA, NA, NA, "34",...
#> $ Site <chr> NA, NA, NA, NA, NA, "SCBI...
#> $ lat <chr> NA, NA, NA, NA, NA, "38.8...
#> $ long <chr> NA, NA, NA, NA, NA, "-78....
#> $ UTM_Zone <chr> NA, NA, NA, NA, NA, "17",...
#> $ UTM_X <chr> NA, NA, NA, NA, NA, "7475...
#> $ UTM_Y <chr> NA, NA, NA, NA, NA, "4308...
#> $ intertropical <chr> NA, NA, NA, NA, NA, "Othe...
#> $ size.ha <chr> NA, NA, NA, NA, NA, NA, N...
#> $ E <chr> NA, NA, NA, NA, NA, "1.57...
#> $ wsg.site.name <chr> NA, NA, NA, NA, NA, NA, "...
```
To quickly see the information you care about, use either the high-level `master()` function or join whatever tables you are interested in:
```r
library(allodb)
library(tidyverse)
equations %>% left_join(sitespecies)
#> Joining, by = "equation_id"
#> # A tibble: 769 x 32
#> ref_id equation_id equation_allome~ equation_form dependent_varia~
#> <chr> <chr> <chr> <chr> <chr>
#> 1 jenki~ 2060ea 10^(1.1891+1.41~ 10^(a+b*(log~ Total abovegrou~
#> 2 jenki~ 2060ea 10^(1.1891+1.41~ 10^(a+b*(log~ Total abovegrou~
#> 3 jenki~ a4d879 10^(1.2315+1.63~ 10^(a+b*(log~ Total abovegrou~
#> 4 jenki~ a4d879 10^(1.2315+1.63~ 10^(a+b*(log~ Total abovegrou~
#> 5 jenki~ c59e03 exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#> 6 jenki~ c59e03 exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#> 7 jenki~ c59e03 exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#> 8 jenki~ c59e03 exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#> 9 jenki~ c59e03 exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#> 10 jenki~ c59e03 exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#> # ... with 759 more rows, and 27 more variables:
#> # independent_variable <chr>, allometry_specificity <chr>,
#> # geographic_area <chr>, dbh_min_cm <chr>, dbh_max_cm <chr>,
#> # sample_size <chr>, dbh_units_original <chr>,
#> # biomass_units_original <chr>, allometry_development_method <chr>,
#> # regression_model <chr>, other_equations_tested <chr>,
#> # log_biomass <chr>, bias_corrected <chr>, bias_correction_factor <chr>,
#> # notes_fitting_model <chr>, original_data_availability <chr>,
#> # warning <chr>, site <chr>, family <chr>, species <chr>,
#> # species_code <chr>, life_form <chr>, equation_group <chr>,
#> # equation_taxa <chr>, notes_on_species <chr>, wsg_id <chr>,
#> # wsg_specificity <chr>
```
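The tables-versus-views approach described in this comment can be sketched in base R: the normalized tables remain the stored source of truth, and a user-facing "view" is a function that rebuilds the denormalized table on every call. `master_view()`, `species_view()`, and the toy tables are hypothetical; the join order mirrors the message printed by `allodb::master()` above (equations and sitespecies by `equation_id`, then `sites_info` by `site`), but this is not allodb's implementation. `merge(..., all.x = TRUE)` plays the role of dplyr's `left_join()`, except that it sorts rows by the key.

```r
# Toy normalized tables (hypothetical values, for illustration only).
equations <- data.frame(
  equation_id = c("e1", "e2"),
  ref_id = c("ref_a", "ref_b"),
  stringsAsFactors = FALSE
)
sitespecies <- data.frame(
  equation_id = c("e1", "e1", "e2"),
  site = c("site_a", "site_b", "site_a"),
  species = c("Acer rubrum", "Acer rubrum", "species_x"),
  stringsAsFactors = FALSE
)
sites_info <- data.frame(site = "site_a", lat = 0, stringsAsFactors = FALSE)

# A "view": nothing extra is stored. The denormalized table is rebuilt from
# the low-level tables on each call, so it cannot drift out of sync with them.
master_view <- function(equations, sitespecies, sites_info) {
  joined <- merge(equations, sitespecies, by = "equation_id", all.x = TRUE)
  merge(joined, sites_info, by = "site", all.x = TRUE)
}

# A narrower, named view for users who only need species-to-equation lookups.
species_view <- function(equations, sitespecies, sites_info) {
  master_view(equations, sitespecies, sites_info)[
    c("species", "site", "equation_id", "ref_id")]
}

species_view(equations, sitespecies, sites_info)
```

Adding another user-facing table then means writing another small function on top of `master_view()`, not maintaining another stored table.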
If you don't want to use R and prefer to explore the database online, we can certainly build a simple Shiny app that joins the tables in the background while the user simply points and clicks. Something along the lines of https://shiny.rstudio.com/gallery/datatables-demo.html, but where the check boxes refer not to columns to show but to tables to join.
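The core of such an app is small: whichever tables the user ticks are joined in a fixed order. A minimal base-R sketch of that step, with hypothetical names (`join_selected()`, the `keys` list, and the toy tables). In a Shiny app it could be called from a render function with the value of a `checkboxGroupInput()`; that wiring is an assumption about the app's design, not an existing allodb feature.

```r
# Toy versions of the three tables (hypothetical values).
tables <- list(
  equations   = data.frame(equation_id = c("e1", "e2"),
                           ref_id = c("ref_a", "ref_b"),
                           stringsAsFactors = FALSE),
  sitespecies = data.frame(equation_id = c("e1", "e2"),
                           site = c("site_a", "site_b"),
                           stringsAsFactors = FALSE),
  sites_info  = data.frame(site = "site_a", lat = 0,
                           stringsAsFactors = FALSE)
)
# Column linking each table to the tables listed before it.
keys <- list(sitespecies = "equation_id", sites_info = "site")

# Left-join the user-selected tables, always in the order of `tables`.
join_selected <- function(tables, selected, keys) {
  selected <- intersect(names(tables), selected)
  if (length(selected) == 0) stop("select at least one table")
  Reduce(function(acc, name) {
    key <- keys[[name]]
    if (!all(key %in% names(acc))) {
      stop("cannot join '", name, "': also select a table providing '", key, "'")
    }
    merge(acc, tables[[name]], by = key, all.x = TRUE)
  }, selected[-1], init = tables[[selected[1]]])
}

join_selected(tables, c("equations", "sitespecies"), keys)
```

Fixing the join order in `tables`, rather than following the click order, keeps the result the same regardless of how the user ticks the boxes.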
Non-urgent question for @gonzalezeb: can we move `equation_taxa` to the equations table? I recognize that it's handy to have in sitespecies, but it's also necessary to interpret the equations table, and it's a property of the equation.
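Whether `equation_taxa` really is a property of the equation can be checked before moving it: it must depend only on `equation_id`, i.e., every sitespecies row sharing an `equation_id` must carry the same `equation_taxa`. A minimal base-R sketch with hypothetical helper names and toy rows. The 2060ea/Acer rubrum pairing matches the `master()` output above; the two conflicting 7f7777 rows are invented to mimic the problem described in the first checklist item, where sitespecies records each species' own taxon rather than the taxa the equation was fitted on.

```r
# Toy rows: 2060ea is consistent; the 7f7777 rows are hypothetical.
sitespecies <- data.frame(
  equation_id = c("2060ea", "2060ea", "7f7777", "7f7777"),
  equation_taxa = c("Acer rubrum", "Acer rubrum", "Fabaceae", "Juglandaceae"),
  stringsAsFactors = FALSE
)
equations <- data.frame(equation_id = c("2060ea", "7f7777"),
                        stringsAsFactors = FALSE)

# equation_ids whose sitespecies rows disagree on equation_taxa.
# An empty result means equation_taxa depends only on equation_id.
conflicting_equation_taxa <- function(sitespecies) {
  n_taxa <- tapply(sitespecies$equation_taxa, sitespecies$equation_id,
                   function(x) length(unique(x)))
  names(n_taxa)[n_taxa > 1]
}

# Move equation_taxa from sitespecies to equations, refusing if that
# would silently drop information.
move_equation_taxa <- function(equations, sitespecies) {
  bad <- conflicting_equation_taxa(sitespecies)
  if (length(bad) > 0) {
    stop("equation_taxa varies within equation_id: ", paste(bad, collapse = ", "))
  }
  lookup <- unique(sitespecies[c("equation_id", "equation_taxa")])
  list(
    equations = merge(equations, lookup, by = "equation_id", all.x = TRUE),
    sitespecies = sitespecies[setdiff(names(sitespecies), "equation_taxa")]
  )
}

conflicting_equation_taxa(sitespecies)  # "7f7777"
```

Any equation_id flagged this way needs its `equation_taxa` rewritten to describe the fitting taxa (e.g., "Fabaceae/ Juglandaceae") before the column can move.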