ropensci / allodb

An R package for biomass estimation at extratropical forest plots.
https://docs.ropensci.org/allodb/
GNU General Public License v3.0
36 stars 11 forks source link

Table arrangement and conventions #85

Closed teixeirak closed 5 years ago

teixeirak commented 5 years ago

Non-urgent question for @gonzalezeb - Can we move equation_taxa be in the equations table? I recognize that its handy to have in sitespecies, but its also necessary to interpret the equations table, and its a property of the equation.

teixeirak commented 5 years ago

More broadly than the comment above, I'm finding the tables difficult to work with from a "user" standpoint, and in some cases I think they may be missing important information. Here's my running list (including modification of the one above):

maurolepore commented 5 years ago

Sorry for jumping in your conversation. Just thought it'd be useful to clarify that the individual tables are a low-level detail that users shouldn't need to know about. The individual tables are developer-oriented (i.e. to make our life easier by maintaining normalized data). Users can access all the information with a higher level interface such as the master() or whatever wrapper you think is useful. Just name the table you want users to have and we can create it on the fly from the low-level tables we maintain and provide an evocative wrapper. This approach follows how databases maintain tables and Views (more info at https://github.com/forestgeo/allodb/issues/78#issuecomment-473403301).

library(allodb)
library(tidyverse)

glimpse(
  allodb::master()
)
#> Joining `equations` and `sitespecies` by 'equation_id'; then `sites_info` by 'site'.
#> Observations: 769
#> Variables: 43
#> $ ref_id                               <chr> "jenkins_2004_cdod", "jen...
#> $ equation_id                          <chr> "2060ea", "2060ea", "a4d8...
#> $ equation_allometry                   <chr> "10^(1.1891+1.419*(log10(...
#> $ equation_form                        <chr> "10^(a+b*(log10(dbh^c)))"...
#> $ dependent_variable_biomass_component <chr> "Total aboveground biomas...
#> $ independent_variable                 <chr> "DBH", "DBH", "DBH", "DBH...
#> $ allometry_specificity                <chr> "Species", "Species", "Sp...
#> $ geographic_area                      <chr> "Ohio, USA", "Ohio, USA",...
#> $ dbh_min_cm                           <chr> "0.21", "0.21", "0.19", "...
#> $ dbh_max_cm                           <chr> "5.73", "5.73", "3.86", "...
#> $ sample_size                          <chr> NA, NA, NA, NA, NA, NA, N...
#> $ dbh_units_original                   <chr> "cm", "cm", "cm", "cm", "...
#> $ biomass_units_original               <chr> "g", "g", "g", "g", "g", ...
#> $ allometry_development_method         <chr> "harvest", "harvest", "ha...
#> $ regression_model                     <chr> NA, NA, NA, NA, NA, NA, N...
#> $ other_equations_tested               <chr> NA, NA, NA, NA, NA, NA, N...
#> $ log_biomass                          <chr> NA, NA, NA, NA, NA, NA, N...
#> $ bias_corrected                       <chr> "1", "1", "1", "1", "1", ...
#> $ bias_correction_factor               <chr> "1.056", "1.056", "1.016"...
#> $ notes_fitting_model                  <chr> NA, NA, NA, NA, NA, NA, N...
#> $ original_data_availability           <chr> NA, NA, NA, NA, NA, NA, N...
#> $ warning                              <chr> NA, NA, NA, NA, NA, NA, N...
#> $ site                                 <chr> "lilly dicky", "tyson", "...
#> $ family                               <chr> "Sapindaceae", "Sapindace...
#> $ species                              <chr> "Acer rubrum", "Acer rubr...
#> $ species_code                         <chr> "316", "acerub", "318", "...
#> $ life_form                            <chr> "Tree", "Tree", "Tree", "...
#> $ equation_group                       <chr> "Expert", "Expert", "Expe...
#> $ equation_taxa                        <chr> "Acer rubrum", "Acer rubr...
#> $ notes_on_species                     <chr> NA, NA, NA, NA, NA, NA, N...
#> $ wsg_id                               <chr> NA, NA, NA, NA, NA, NA, N...
#> $ wsg_specificity                      <chr> NA, NA, NA, NA, NA, NA, N...
#> $ id                                   <chr> NA, NA, NA, NA, NA, "34",...
#> $ Site                                 <chr> NA, NA, NA, NA, NA, "SCBI...
#> $ lat                                  <chr> NA, NA, NA, NA, NA, "38.8...
#> $ long                                 <chr> NA, NA, NA, NA, NA, "-78....
#> $ UTM_Zone                             <chr> NA, NA, NA, NA, NA, "17",...
#> $ UTM_X                                <chr> NA, NA, NA, NA, NA, "7475...
#> $ UTM_Y                                <chr> NA, NA, NA, NA, NA, "4308...
#> $ intertropical                        <chr> NA, NA, NA, NA, NA, "Othe...
#> $ size.ha                              <chr> NA, NA, NA, NA, NA, NA, N...
#> $ E                                    <chr> NA, NA, NA, NA, NA, "1.57...
#> $ wsg.site.name                        <chr> NA, NA, NA, NA, NA, NA, "...

As per how you can now quickly see the information you care about, use either the high level master() function or join whatever tables you are interested in:

library(allodb)
library(tidyverse)

equations %>% left_join(sitespecies)
#> Joining, by = "equation_id"
#> # A tibble: 769 x 32
#>    ref_id equation_id equation_allome~ equation_form dependent_varia~
#>    <chr>  <chr>       <chr>            <chr>         <chr>           
#>  1 jenki~ 2060ea      10^(1.1891+1.41~ 10^(a+b*(log~ Total abovegrou~
#>  2 jenki~ 2060ea      10^(1.1891+1.41~ 10^(a+b*(log~ Total abovegrou~
#>  3 jenki~ a4d879      10^(1.2315+1.63~ 10^(a+b*(log~ Total abovegrou~
#>  4 jenki~ a4d879      10^(1.2315+1.63~ 10^(a+b*(log~ Total abovegrou~
#>  5 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#>  6 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#>  7 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#>  8 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#>  9 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#> 10 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#> # ... with 759 more rows, and 27 more variables:
#> #   independent_variable <chr>, allometry_specificity <chr>,
#> #   geographic_area <chr>, dbh_min_cm <chr>, dbh_max_cm <chr>,
#> #   sample_size <chr>, dbh_units_original <chr>,
#> #   biomass_units_original <chr>, allometry_development_method <chr>,
#> #   regression_model <chr>, other_equations_tested <chr>,
#> #   log_biomass <chr>, bias_corrected <chr>, bias_correction_factor <chr>,
#> #   notes_fitting_model <chr>, original_data_availability <chr>,
#> #   warning <chr>, site <chr>, family <chr>, species <chr>,
#> #   species_code <chr>, life_form <chr>, equation_group <chr>,
#> #   equation_taxa <chr>, notes_on_species <chr>, wsg_id <chr>,
#> #   wsg_specificity <chr>

If you don't want to use R, and instead you prefer to explore the database online, we can certainly build a simple shiny app that joins the tables in the background while the user simple points and clicks. Something along the lines of https://shiny.rstudio.com/gallery/datatables-demo.html but where the check boxes don't refer to column to show but to tables to join.