ropensci / allodb

An R package for biomass estimation at extratropical forest plots.
https://docs.ropensci.org/allodb/
GNU General Public License v3.0
36 stars 11 forks source link

Units problem to tackle #42

Closed gonzalezeb closed 5 years ago

gonzalezeb commented 6 years ago

An issue (more of a reminder to myself) to tackle is to convert biomass units from original equations to the final output unit we want allodb to give (kg, Mg). That conversion factor (convert from inches, mm to cm, etc) should be incorporated in the equation. Of special attention is the DBH used to built the original allometry. We have two options:

  1. leave it as it and the site will have to convert
  2. or better, alert the site that their input dbh should be in cm and we "rewrite" all equations to reflect that
maurolepore commented 6 years ago

Great to know you are thinking about this issue. I note some things also as a reminder for further discussion.

I think it's crucial to ensure that the output of the equations is always in the same unit. That allows summarizing the resulting biomass across rows. For example, to sum the biomass of all species in a census the biomass of all species must be in the same unit.

Does any equation convert units? Assuming that no equation converts units, the output unit will only depend on the input unit. So the important question is this: Do different equations expect input in different units?

If all equations expect input in the same unit, then we can ask users to convert their data to whatever units we want. Users can convert their entire DBH column with, for example, with measurements::conv_unit().

The problem, I think, is if different equations expect different units. We can't ask users to convert different rows of the DBH column in different way. That's too hard to do and error prone. To deal with this problem I see two options:

  1. Use not the original equations but a version of them that expect input in the same unit.
  2. Use the original equations, then automatically convert the known temporary-output unit to a consistent final-output unit. What users get is only the final, consistent unit.
gonzalezeb commented 6 years ago

I believe option one is the best solution: modify the original equation so the output is a standard metric (cm, kg). We already have dbh and biomass units from original pub listed in allodb_master (although not all of them need conversion) so it should be simple to do it programmatically (I will try first!).
We then can request users that their input DBH should be in cm.

Shortly I will add few more temperate sites to allodb_master so we can start working on that.

teixeirak commented 6 years ago

Actually, I think its better tow go with option two, which I see as simpler and less error-prone. I'd like to see a workflow like this:

  1. user inputs diameter data in cm (or maybe mm, in which case it gets converted to cm).
  2. based on the input (i.e., diameter) units of the original equation, DBH is multiplied by the appropriate conversion factor (e.g., all equations using inches get DBH*0.39)
  3. biomass is calculated according to original equation
  4. based on output units of original equation, biomass is multiplied by the appropriate conversion factor
teixeirak commented 6 years ago

The motivations for this option are:

  1. avoid potential errors in converting the equations
  2. avoid situations where original equation may have reduced accuracy under converted units
  3. make it easy to go back and compare equations against original publications
maurolepore commented 6 years ago

Thanks for the feedback. In the light of @teixeirak suggestion (use solution 2 of my previous comment) you may disregard this comment. But I write it anyway because I had already devloped the example and it illustrates a useful function: dplyr::case_when().

RE: @gonzalezeb's https://github.com/forestgeo/allodb/issues/42#issuecomment-423271429, the implementation would look something like this:


library(glue)
library(dplyr)

# Dummy database showing that different equations expect different units.
db <- tribble(
  ~equations, ~units,
   "2 * dbh",   "mm",
   "5 * dbh",    "m"
)

db
#> # A tibble: 2 x 2
#>   equations units
#>   <chr>     <chr>
#> 1 2 * dbh   mm   
#> 2 5 * dbh   m

# Implementing solution 1 of my previous comment: The original equations remain the same
# but we create a copy that converts the output of all equations to the same unit -- based
# on the expected input-unit.
db_edited <- db %>% 
  mutate(equations_edited_cm = case_when(
      units == "mm" ~ glue("{equations} / 10"),
      units == "m"  ~ glue("{equations} * 10")
    )
  )

db_edited
#> # A tibble: 2 x 3
#>   equations units equations_edited_cm
#>   <chr>     <chr> <S3: glue>         
#> 1 2 * dbh   mm    2 * dbh / 10       
#> 2 5 * dbh   m     5 * dbh * 10
maurolepore commented 6 years ago

And this illustrates the implementation of solution 2 (my previous last comment above), by which we first calculate biomass with the original function and then convert the result to a consistent unit.

library(dplyr)

db <- tribble(
  ~equations, ~units,
   "2 * dbh",   "mm",
   "5 * dbh",    "m"
)
db
#> # A tibble: 2 x 2
#>   equations units
#>   <chr>     <chr>
#> 1 2 * dbh   mm   
#> 2 5 * dbh   m

# Little helper function to hide unimportant details
evaluate_column <- function(column, dbh = dbh) {
  dummy_dbh <- 10
  eval(parse(text = db[[column]]), envir = list(dbh = dummy_dbh))
}

# Implementation
mutate(db,
  biomass_asis = evaluate_column("equations"),
  biomass_cm = case_when(
    units == "mm" ~ biomass_asis / 10,
    units ==  "m" ~ biomass_asis * 10
  )
)
#> # A tibble: 2 x 4
#>   equations units biomass_asis biomass_cm
#>   <chr>     <chr>        <dbl>      <dbl>
#> 1 2 * dbh   mm              50          5
#> 2 5 * dbh   m               50        500

Created on 2018-09-20 by the reprex package (v0.2.1)

maurolepore commented 6 years ago

Summary

This summarizes all comments above and a meeting with @gonzalezeb.

Issues:

  1. The units may be different of the dbh values from different users.

  2. Right now, different equations in allodb expect dbh values in different units.

  3. Output should be always in the same unit. This allows summarizing the resulting biomass across rows. For example, to sum the biomass of all species in a census the biomass of all species must be in the same unit.

Possible solution

  1. Ask users to provide dbh values in a specific a unit, say, in centimeters: dbh_cm. Users can convert units with e.g. measurements::conv_unit(). This would keep the code focused, simple, and less prone to bugs.

  2. For each equation, the code can use a conversion factor to convert the input (dbh_cm) from cm to the unit that is specific for that equation: dbh_eq . This conversion factor should be stored in the equations table in a column, say, cf_dbh.

  3. To output biomass in consistent units, the code must finally convert the biomass from the equation-specific unit to "cm" (for consistency with the input we expect). This conversion factor should also be stored in the equations table in the column, say, cf_bmss.

Example

(The actual numbers make no sense)

USER DATA      EQUATIONS TABLE
---------      -------------------------------------
 dbh_cm        cf_dbh          equation      cf_bmss
  <dbl>         <dbl>             <chr>        <dbl>
---------      -------------------------------------
    55           0.39   "888 + (dbh_eq)"        2.54
    66              1   "999 + (dbh_eq)"           1
...

# Each row:
# * INTERNAL COMPUTATION BEFORE APLYING THE EQUAITON
dbh_cm * cf_dbh = dbh_eq

# * INTERNAL COMPUTATION AFTER APLYING THE EQUAITON
biomass_eq * cf_bmss = biomass_cm
teixeirak commented 6 years ago

This sounds good to me.

maurolepore commented 6 years ago

I just came across the quantities package, which may be handy for dealing with units: https://github.com/r-quantities/quantities/blob/master/README.md

maurolepore commented 5 years ago

@gonzalezeb , please see if this is getting close to what you think the code should do.

# EXPLORATION
# Here I just expresses my thoughts in code

library(tidyverse)

# * Good: The columns giving units have no misisng values
# * Unexpected: Some units are not of distance (e.g. cm) but of area (e.g. cm^2)
allodb::equations %>%
  select(matches("unit")) %>%
  unique()
#> # A tibble: 11 x 2
#>    dbh_units_original biomass_units_original
#>    <chr>              <chr>                 
#>  1 cm                 g                     
#>  2 in                 g                     
#>  3 cm                 kg                    
#>  4 in                 lb                    
#>  5 mm                 kg                    
#>  6 in^2               lb                    
#>  7 in                 kg                    
#>  8 cm^2               g                     
#>  9 cm                 Mg                    
#> 10 cm                 gm                    
#> 11 m                  t

# * Conversion factors not in the table. I'll do the conversion myself
allodb::equations %>%
  select(matches("unit|convert|correct|factor")) %>%
  head(3)
#> # A tibble: 3 x 4
#>   dbh_units_original biomass_units_orig~ bias_corrected bias_correction_fa~
#>   <chr>              <chr>               <chr>          <chr>              
#> 1 cm                 g                   1              1.056              
#> 2 cm                 g                   1              1.016              
#> 3 cm                 g                   1              included in model

# QUESTIONS
# 1. What should we do when `dbh_original_units` are in units of area, e.g.
# [cm^2]? For example, if the user inputs `DBH` in [cm], and the value of
# `dbh_units_original` is `cm^2`, Should the code simply square the input?
# 
# 2. Wouldn't that square `DBH` twice if `equation_form` is `a * (DBH^2)^b`?
allodb::equations %>%
  dplyr::select(matches("dbh_unit"), matches("equation_form")) %>%
  dplyr::filter(grepl("2", dbh_units_original))
#> # A tibble: 4 x 2
#>   dbh_units_original equation_form
#>   <chr>              <chr>        
#> 1 in^2               a*(DBH^2)^b  
#> 2 in^2               a*(DBH^2)^b  
#> 3 in^2               a*(DBH^2)^b  
#> 4 cm^2               a+b*BA

# EXAMPLE
# 3. Is this what you would do?

# User inputs dbh in [cm] (i.e. we arbitrary request [cm])
(cns <- tibble(dbh_cm = c(100, 200, 300)))
#> # A tibble: 3 x 1
#>   dbh_cm
#>    <dbl>
#> 1    100
#> 2    200
#> 3    300

# Here `dbh_unit_original` indicates that the dbh input to the equation should
# be in [in^2] and the output will be in [lb] (same as [lbs], right?)
eqn <- allodb::equations %>%
  dplyr::select(matches("dbh_unit|biomass_unit|equation_allometry")) %>% 
  dplyr::filter(dbh_units_original == "in^2")
eqn
#> # A tibble: 3 x 3
#>   equation_allometry      dbh_units_original biomass_units_original
#>   <chr>                   <chr>              <chr>                 
#> 1 3.08355*(dbh^2)^1.14915 in^2               lb                    
#> 2 2.17565*(dbh^2)^1.2481  in^2               lb                    
#> 3 2.56795*(dbh^2)^1.18685 in^2               lb

# So we convert the input from [cm] to [in^2] and the output to from [lbs] to
# [g](for example)
cm_to_inch_squared <- function(x) {
  measurements::conv_unit(x, from = "cm", to = "inch")^2
}
lbs_to_g <- function(x) {
  measurements::conv_unit(x, from = "lbs", to = "g")
}

eqn %>% 
  mutate(
    dbh = cm_to_inch_squared(cns$dbh_cm),
    # We evaluate dbh in the original units
    biomass_original = eval(parse(text = equation_allometry), envir = eqn),
    # We convert the original output units to the units we choose (e.g. [g])
    biomass = lbs_to_g(biomass_original)
  )
#> # A tibble: 3 x 6
#>   equation_allome~ dbh_units_origi~ biomass_units_o~    dbh
#>   <chr>            <chr>            <chr>             <dbl>
#> 1 3.08355*(dbh^2)~ in^2             lb                1550.
#> 2 2.17565*(dbh^2)~ in^2             lb                6200.
#> 3 2.56795*(dbh^2)~ in^2             lb               13950.
#> # ... with 2 more variables: biomass_original <dbl>, biomass <dbl>

Created on 2019-02-25 by the reprex package (v0.2.1)

maurolepore commented 5 years ago

allo_evaluate() now warns that we still don't handle units.

You can see this in action at https://forestgeo.github.io/fgeo.biomass/ (search for allo_evaluate()).

teixeirak commented 5 years ago

The units issue is one that should be solvable in a a few minutes. Do you need any guidance on this?

maurolepore commented 5 years ago

Do you need any guidance on this?

I can certainly move this forward but to be sure the result is correct I need @gonzalezeb or @teixeirak to answer these questions:

https://github.com/forestgeo/allodb/issues/42#issuecomment-467088166

image

teixeirak commented 5 years ago

If inputs are in units of area (cm2), we can either (1) calculate cm^2 before passing to the equation or (2) modify the equation so that the squaring takes place in the equation. The second option would be slightly tidier, but either works. I think the answer depends on how much work it would be to go back and change those equations. We (@gonzalezeb ) may need to check all of them anyway, as the example you gave where DHB is squared twice is suspicious. Of course, when the input is basal area (BA), that's pi*(DBH/2)^2.

maurolepore commented 5 years ago

where DHB is squared twice is suspicious

This is what I’m mainly asking. (I’m not concerned about the implementation. I’d go with the approach I showed in the example and we can refactor it anytime later.)

maurolepore commented 5 years ago

when the input is basal area (BA), that's pi*(DBH/2)^2

Do you prefer me to replace this via code in fgeo.biomass, or for you to change it in allodb? That is, wherever it now says “BA” replace it with “pi*(DBH/2)^2”?

teixeirak commented 5 years ago

I'd go ahead and do it in allodb (unless @gonzalezeb has any objection). Be sure to change the input units when you do that.

maurolepore commented 5 years ago

@gonzalezeb,

I assume that [t] means ton, right? Which of the following "_ton" alternatives do you think is the right one? I have no idea what "short_ton" or "long_ton" means and unfortunetaly ?measurements::conv_unit_options doesn't give details on this.

measurements::conv_unit_options$mass
#>  [1] "Da"         "fg"         "pg"         "ng"         "ug"        
#>  [6] "mg"         "g"          "kg"         "Mg"         "Gg"        
#> [11] "Tg"         "Pg"         "carat"      "metric_ton" "oz"        
#> [16] "lbs"        "short_ton"  "long_ton"   "stone"

Created on 2019-03-06 by the reprex package (v0.2.1)

gonzalezeb commented 5 years ago

I though I changed all t to to metric _ton already.

maurolepore commented 5 years ago

@gonzalezeb, what unit is [gm]?

sort(unique(allodb::equations$biomass_units_original))
#> [1] "g"  "gm" "kg" "lb" "Mg" "t"

Created on 2019-03-06 by the reprex package (v0.2.1)

And which of these do you think is the right replacement?

measurements::conv_unit_options$mass
#>  [1] "Da"         "fg"         "pg"         "ng"         "ug"        
#>  [6] "mg"         "g"          "kg"         "Mg"         "Gg"        
#> [11] "Tg"         "Pg"         "carat"      "metric_ton" "oz"        
#> [16] "lbs"        "short_ton"  "long_ton"   "stone"

Created on 2019-03-06 by the reprex package (v0.2.1)

maurolepore commented 5 years ago

I though I changed all t to to metric _ton already

Likely I just need to update pull and reinstall allodb

maurolepore commented 5 years ago

FYI, I'm now working on replacing the units in allodb with the units that the measurements package knows of. This is to avoid manually inserting conversion factors and instead reuse available tools (i.e. measurements). I think this is the safest approach. You may or may not want to change that in allodb.

library(fgeo.biomass)

unique(
  allodb::equations$dbh_units_original
)
#> [1] "cm"   "in"   "mm"   "in^2" "cm^2" "m"

unique(
  fixme_units(allodb::equations$dbh_units_original)
)
#> [1] "cm"    "inch"  "mm"    "inch2" "cm2"   "m"

unique(
  fixme_units(allodb::equations$biomass_units_original)
)
#> [1] "g"                   "kg"                  "FIXME: Unknown unit"
#> [4] "Mg"

# All valid unites
valid_units()
#> $acceleration
#>  [1] "mm_per_sec2"   "cm_per_sec2"   "m_per_sec2"    "km_per_sec2"  
#>  [5] "grav"          "inch_per_sec2" "ft_per_sec2"   "mi_per_sec2"  
#>  [9] "kph_per_sec"   "mph_per_sec"  
#> 
#> $angle
#> [1] "degree" "radian" "grad"   "arcmin" "arcsec" "turn"  
#> 
#> $area
#>  [1] "nm2"      "um2"      "mm2"      "cm2"      "m2"       "hectare" 
#>  [7] "km2"      "inch2"    "ft2"      "yd2"      "acre"     "mi2"     
#> [13] "naut_mi2"
#> 
#> $coordinate
#> [1] "dec_deg"     "deg_dec_min" "deg_min_sec"
#> 
#> $count
#> [1] "fmol" "pmol" "nmol" "umol" "mmol" "mol" 
#> 
#> $duration
#>  [1] "nsec" "usec" "msec" "sec"  "min"  "hr"   "day"  "wk"   "mon"  "yr"  
#> [11] "dec"  "cen"  "mil"  "Ma"  
#> 
#> $energy
#> [1] "J"    "kJ"   "erg"  "cal"  "Cal"  "Wsec" "kWh"  "MWh"  "BTU" 
#> 
#> $flow
#>  [1] "ml_per_sec"  "ml_per_min"  "ml_per_hr"   "l_per_sec"   "l_per_min"  
#>  [6] "l_per_hr"    "m3_per_sec"  "m3_per_min"  "m3_per_hr"   "Sv"         
#> [11] "gal_per_sec" "gal_per_min" "gal_per_hr"  "ft3_per_sec" "ft3_per_min"
#> [16] "ft3_per_hr" 
#> 
#> $length
#>  [1] "angstrom" "nm"       "um"       "mm"       "cm"       "dm"      
#>  [7] "m"        "km"       "inch"     "ft"       "yd"       "fathom"  
#> [13] "mi"       "naut_mi"  "au"       "light_yr" "parsec"   "point"   
#> 
#> $mass
#>  [1] "Da"         "fg"         "pg"         "ng"         "ug"        
#>  [6] "mg"         "g"          "kg"         "Mg"         "Gg"        
#> [11] "Tg"         "Pg"         "carat"      "metric_ton" "oz"        
#> [16] "lbs"        "short_ton"  "long_ton"   "stone"     
#> 
#> $power
#>  [1] "uW"          "mW"          "W"           "kW"          "MW"         
#>  [6] "GW"          "erg_per_sec" "cal_per_sec" "cal_per_hr"  "Cal_per_sec"
#> [11] "Cal_per_hr"  "BTU_per_sec" "BTU_per_hr"  "hp"         
#> 
#> $pressure
#>  [1] "uatm"  "atm"   "Pa"    "hPa"   "kPa"   "torr"  "mmHg"  "inHg" 
#>  [9] "cmH2O" "inH2O" "mbar"  "bar"   "dbar"  "psi"  
#> 
#> $speed
#>  [1] "mm_per_sec"   "cm_per_sec"   "m_per_sec"    "km_per_sec"  
#>  [5] "inch_per_sec" "ft_per_sec"   "kph"          "km_per_hr"   
#>  [9] "mph"          "mi_per_hr"    "km_per_day"   "mi_per_day"  
#> [13] "knot"         "mach"         "light"       
#> 
#> $temperature
#> [1] "C" "F" "K" "R"
#> 
#> $volume
#>  [1] "ul"        "ml"        "dl"        "l"         "cm3"      
#>  [6] "dm3"       "m3"        "km3"       "us_tsp"    "us_tbsp"  
#> [11] "us_oz"     "us_cup"    "us_pint"   "us_quart"  "us_gal"   
#> [16] "inch3"     "ft3"       "mi3"       "imp_tsp"   "imp_tbsp" 
#> [21] "imp_oz"    "imp_cup"   "imp_pint"  "imp_quart" "imp_gal"

Created on 2019-03-06 by the reprex package (v0.2.1)

gonzalezeb commented 5 years ago

Sure, I was going to do it manually in the csv file but you already got it.

gonzalezeb commented 5 years ago

--We (@gonzalezeb ) may need to check all of them anyway, as the example you gave where DHB is squared twice is suspicious

Maybe I am misreading the papers where I am taking equations. For example, for Russell_2005 (equation Id=c94845) it defines: The regression equation for Point Reyes National Seashore was biomass (grams) = 51.68 +0.02 basal area (centimeters2) with an adjusted R2 = 0.53 and P= 0.01

Similar case for the Jenkins (jenkins_2004_cdod) equations.

maurolepore commented 5 years ago

@gonzalezeb, did you see this one? https://github.com/forestgeo/allodb/issues/42#issuecomment-470284990

gonzalezeb commented 5 years ago

I don't see 'gm' but in any case it should be 'g' for grams (I see 5 biomass units in the equation table: g, kg, lbs, Mg, metric_ton)

teixeirak commented 5 years ago

Note: Mg = metric_ton. (I tend to use Mg because it avoids any potential confusion with U.S. tons.)

maurolepore commented 5 years ago

@gonzalezeb ,

I'm reinstalling allodb to ensure I have the latest commits. I find 'gm' in equation_id == b30744. Do you see the same?

remotes::install_github("forestgeo/allodb")
#> Skipping install of 'allodb' from a github remote, the SHA1 (6929cc95) has not changed since last install.
#>   Use `force = TRUE` to force installation

library(tidyverse)

allodb::equations %>% 
  filter(biomass_units_original == "gm") %>% 
  select(equation_id, matches("unit"))
#> # A tibble: 1 x 3
#>   equation_id dbh_units_original biomass_units_original
#>   <chr>       <chr>              <chr>                 
#> 1 b30744      cm                 gm

Created on 2019-03-06 by the reprex package (v0.2.1)

maurolepore commented 5 years ago

@gonzalezeb,

We discussed that the code should expect users to provide dbh in [cm]. There are a few equations that seem to need squared inches. How should we handle this? That is, where should the code take the second dimension to convert from a unit of lengh to a unit of area?

library(tidyverse)

allodb::equations %>% 
  filter(dbh_units_original == "in^2") %>% 
  select(dbh_units_original, matches("equ"))
#> # A tibble: 3 x 5
#>   dbh_units_origi~ equation_allome~ equation_id equation_form
#>   <chr>            <chr>            <chr>       <chr>        
#> 1 in^2             3.08355*(dbh^2)~ f3e023      a*(DBH^2)^b  
#> 2 in^2             2.17565*(dbh^2)~ 8eca60      a*(DBH^2)^b  
#> 3 in^2             2.56795*(dbh^2)~ e8ebc2      a*(DBH^2)^b  
#> # ... with 1 more variable: other_equations_tested <chr>

For now I'm just dropping the rows where dbh can't be converted to the unit expected by the equation (notice the last warning "#> Warning: Dropping 828 rows where units can't be converted").

library(fgeo.biomass)

census <- fgeo.biomass::scbi_tree1
species <- fgeo.biomass::scbi_species
census_species <- add_species(census, species, site = "scbi")
#> `sp` now stores Latin species names

allo_find(census_species)
#> Warning: Dropping 58 equations that can't be evaluated.
#> Identify failing equations with `fixme_pull_failing_eqn(allodb::master())`
#> Joining, by = c("sp", "site")
#> Warning:   The input and output datasets have different number of rows:
#>   * Input: 40283.
#>   * Output: 30229.
#> Warning: Dropping 828 rows where units can't be converted
#> # A tibble: 29,401 x 11
#>    eqn_type rowid site  sp      dbh equation_id eqn   eqn_source
#>    <chr>    <int> <chr> <chr> <dbl> <chr>       <chr> <chr>     
#>  1 species      4 scbi  nyss~  53.1 8da09d      1.54~ default   
#>  2 species     21 scbi  liri~  91.4 34fe5a      1.02~ default   
#>  3 species     29 scbi  acer~ 326.  7c72ed      exp(~ default   
#>  4 species     38 scbi  frax~  42.8 0edaff      0.16~ default   
#>  5 species     72 scbi  acer~ 289.  7c72ed      exp(~ default   
#>  6 species     77 scbi  quer~ 251.  07dba7      1.56~ default   
#>  7 species     79 scbi  tili~ 187.  3f99ba      1.44~ default   
#>  8 species     79 scbi  tili~ 475   76d19b      0.00~ default   
#>  9 species     84 scbi  frax~ 170.  0edaff      0.16~ default   
#> 10 species     89 scbi  fagu~  10.7 74186d      2.03~ default   
#> # ... with 29,391 more rows, and 3 more variables:
#> #   anatomic_relevance <chr>, dbh_unit <chr>, bms_unit <chr>

Created on 2019-03-06 by the reprex package (v0.2.1)

gonzalezeb commented 5 years ago

I'm reinstalling allodb to ensure I have the latest commits. I find 'gm' in equation_id == b30744. Do you see the same?

No, I see 'g'. Committed in be8160f, line 144 (jan 16, 2019)

gonzalezeb commented 5 years ago

https://github.com/forestgeo/allodb/issues/42#issuecomment-470317291. This was my fault, I just checked the original reference and dbh unit should be inch. Same with this https://github.com/forestgeo/allodb/issues/42#issuecomment-470288242. dbh unit shoudl be cm. I fixed the equation table, committed here 33f8411

maurolepore commented 5 years ago

RE: https://github.com/forestgeo/allodb/issues/42#issuecomment-470349164

Great, thanks! No things are the way they ought to be, i.e. I no longer see 'gm' in equation_id == b30744.

I was out of sync because I hadn't updated data/. To update data/ I need to run data-raw/data.R after you update any .csv file. Also I need to update the strored references in tests/testthat/. This extra step is a little annoying but It's a good way to force myself to slow down a bit and see precisely what's changed. For now, I'll leave the workflow untouched. Maybe I'll rethink later.

remotes::install_github("forestgeo/allodb")
#> Skipping install of 'allodb' from a github remote, the SHA1 (3fe53456) has not changed since last install.
#>   Use `force = TRUE` to force installation

library(tidyverse)

allodb::equations %>% 
  filter(biomass_units_original == "gm") %>% 
  select(equation_id, matches("unit"))
#> # A tibble: 0 x 3
#> # ... with 3 variables: equation_id <chr>, dbh_units_original <chr>,
#> #   biomass_units_original <chr>

Created on 2019-03-07 by the reprex package (v0.2.1)

maurolepore commented 5 years ago

Awesome!

Now that I'm in sync with allodb I confirm that I no longer see [in^2] or [cm^].

library(tidyverse)

allodb::equations %>% 
  select(matches("unit")) %>% 
  unique()
#> # A tibble: 8 x 2
#>   dbh_units_original biomass_units_original
#>   <chr>              <chr>                 
#> 1 cm                 g                     
#> 2 inch               g                     
#> 3 cm                 kg                    
#> 4 inch               lbs                   
#> 5 mm                 kg                    
#> 6 inch               kg                    
#> 7 cm                 Mg                    
#> 8 cm                 metric_ton

Created on 2019-03-07 by the reprex package (v0.2.1)


Following Krista's https://github.com/forestgeo/allodb/issues/42#issuecomment-470299961, should [metric_ton] be replaced by [Mg]?

maurolepore commented 5 years ago

This comment by Krista suggests that the output biomass should be in [kg]. @gonzalezeb and @teixeirak can you confirm?

Right now the output is in [g].

gonzalezeb commented 5 years ago

yes, output should be in [kg]

maurolepore commented 5 years ago

allo_evaluate() now output biomass in [kg].

library(tidyverse)
library(fgeo.biomass)

census <- fgeo.biomass::scbi_tree1 %>% dplyr::sample_n(30)
species <- fgeo.biomass::scbi_species
with_equations <- census %>% 
  add_species(species, "scbi") %>% 
  allo_find()
#> `sp` now stores Latin species names
#> Joining, by = c("sp", "site")
#> Warning: The input and output datasets have different number of rows:
#> * Input: 30.
#> * Output: 21.
#> Converting `dbh` based on `dbh_unit`.

# Notice this message
allo_evaluate(with_equations)
#> Assuming `dbh` units in [cm] (to convert units see `?measurements::conv_unit()`).
#> `biomass` values are given in [kg].
#> # A tibble: 21 x 12
#>    eqn_type rowid site  sp      dbh equation_id eqn   eqn_source
#>    <chr>    <int> <chr> <chr> <dbl> <chr>       <chr> <chr>     
#>  1 species      7 scbi  frax~  77.3 973c05      2.36~ default   
#>  2 species     26 scbi  liri~  60.6 34fe5a      1.02~ default   
#>  3 genus        9 scbi  cary~  50.1 9c4cc9      10^(~ default   
#>  4 genus       10 scbi  cary~  85.8 9c4cc9      10^(~ default   
#>  5 genus       27 scbi  ulmu~  38.0 455839      2.04~ default   
#>  6 mixed_h~     2 scbi  asim~  26.3 ae65ed      exp(~ default   
#>  7 mixed_h~     5 scbi  asim~  18.6 ae65ed      exp(~ default   
#>  8 mixed_h~    11 scbi  asim~  14.4 ae65ed      exp(~ default   
#>  9 mixed_h~    14 scbi  carp~  18.7 ae65ed      exp(~ default   
#> 10 mixed_h~    18 scbi  asim~  13.6 ae65ed      exp(~ default   
#> # ... with 11 more rows, and 4 more variables: anatomic_relevance <chr>,
#> #   dbh_unit <chr>, bms_unit <chr>, biomass <dbl>

Created on 2019-03-12 by the reprex package (v0.2.1)

gonzalezeb commented 5 years ago

I think the units issue is solved now but I just noticed that in the example right above it says: Assuming dbh units in [cm] but scbi.tree1dbh in mm. Maybe we need to specify in fgeo.biomass right before Finding the best available allometric-equations that anyone using the package need to ensure that dbh units should be in cm (and give the function to do it).. what do you think?

maurolepore commented 5 years ago

... Assuming dbh units in [cm] but scbi.tree1 dbh in mm.

Yeah, that's what I imagined as all other ForestGEO sites seem to also record dbh in [mm]. Considering that fgeo.biomass is focuses on ForestGEO sites, What is the reason you prefer to expect [cm]?

teixeirak commented 5 years ago

I think that fgeo.biomass should ensure that all units are converted to cm before passing to allodb. That is, input to allodb function should be in cm. Allodb can then convert cm to whatever units are used in the equation.

maurolepore commented 5 years ago

input ... should be in cm

Why expect input data in [cm] instead of [mm]?

I'm sure you have a good reason but I would love to know what it is. I ask because ForestGEO tables record dbh in [mm]. If we expect [mm] most users will be ready to go. If we expect [cm] most users will need to first convert from [cm] to [mm].

... can then convert cm to whatever units are used in the equation.

This is already implemented (although in fgeo.biomass -- not in allodb)

gonzalezeb commented 5 years ago

It is true, most forestgeo sites record their dbh in mm, but some in cm. I was only inclined to use cm because it is more universal.

teixeirak commented 5 years ago

I agree. I suppose it doesn't matter. What's essential is that anyone using the package needs to input the units of their DBH measurements.

maurolepore commented 5 years ago

Okay, allo_find() gains the argument dbh_unit. For now it defaults to [mm] to favor ForestGEO users. If you still prefer [cm] I can change the default.

We can also default to NULL and force all users to give a value. It's a bit annoying but certainly makes users think more carefully about what they are doing. Let me know if you prefer this option.

library(fgeo.biomass)

census <- dplyr::sample_n(fgeo.biomass::scbi_tree1, 30)
species <- fgeo.biomass::scbi_species

census_species <- census %>% 
  add_species(species, site = "scbi")
#> Adding `site`.
#> Overwriting `sp`; it now stores Latin species names.
#> Adding `rowid`.

with_equations_mm <- allo_find(census_species)
#> Assuming `dbh` data in [mm].
#> Joining, by = c("sp", "site")
#> Converting `dbh` based on `dbh_unit`.

with_equations_cm <- allo_find(census_species, dbh_unit = "cm")
#> Assuming `dbh` data in [cm].
#> Joining, by = c("sp", "site")
#> Converting `dbh` based on `dbh_unit`.

Created on 2019-03-13 by the reprex package (v0.2.1)

teixeirak commented 5 years ago

I doubt this would be the best place in the code to do it, but it is essential that users enter dbh units at some point. Alternatively, it would be easy to detect automatically which is used (minimum DBH values = 10 (mm) or 1 (cm).

maurolepore commented 5 years ago

I like the idea of guessing the units (with a message).

https://github.com/forestgeo/fgeo.biomass/issues/20

teixeirak commented 5 years ago

Yes, let’s do it. That will be less error prone. However, I’d add another check in case someone feeds in a census of a subset of trees >10cm (for example, scbi mortality census)— Min dbh < 1.1 & Max dbh < 500—> cm Min dbh > 9 & Max dbh > 500—> mm Max dbh is tricky- there are probably a few exceptions. It would help to check max DBHs at dome of the sites to refine this. Lutz et al 2018 may have appropriate info. Smallest max DBH will be at Mpala, Scotty Creek, or Palamanui. I believe the largest is at Sinharaja.

maurolepore commented 5 years ago

See forestgeo/fgeo.biomass#20