Closed gonzalezeb closed 5 years ago
Great to know you are thinking about this issue. I note some things also as a reminder for further discussion.
I think it's crucial to ensure that the output of the equations is always in the same unit. That allows summarizing the resulting biomass across rows. For example, to sum the biomass of all species in a census the biomass of all species must be in the same unit.
Does any equation convert units? Assuming that no equation converts units, the output unit will only depend on the input unit. So the important question is this: Do different equations expect input in different units?
If all equations expect input in the same unit, then we can ask users to convert their data to whatever units we want. Users can convert their entire DBH
column with, for example, with measurements::conv_unit()
.
The problem, I think, is if different equations expect different units. We can't ask users to convert different rows of the DBH
column in different way. That's too hard to do and error prone. To deal with this problem I see two options:
I believe option one is the best solution: modify the original equation so the output is a standard metric (cm, kg). We already have dbh and biomass units from original pub listed in allodb_master (although not all of them need conversion) so it should be simple to do it programmatically (I will try first!).
We then can request users that their input DBH should be in cm.
Shortly I will add few more temperate sites to allodb_master so we can start working on that.
Actually, I think its better tow go with option two, which I see as simpler and less error-prone. I'd like to see a workflow like this:
The motivations for this option are:
Thanks for the feedback. In the light of @teixeirak suggestion (use solution 2 of my previous comment) you may disregard this comment. But I write it anyway because I had already devloped the example and it illustrates a useful function: dplyr::case_when()
.
RE: @gonzalezeb's https://github.com/forestgeo/allodb/issues/42#issuecomment-423271429, the implementation would look something like this:
library(glue)
library(dplyr)
# Dummy database showing that different equations expect different units.
db <- tribble(
~equations, ~units,
"2 * dbh", "mm",
"5 * dbh", "m"
)
db
#> # A tibble: 2 x 2
#> equations units
#> <chr> <chr>
#> 1 2 * dbh mm
#> 2 5 * dbh m
# Implementing solution 1 of my previous comment: The original equations remain the same
# but we create a copy that converts the output of all equations to the same unit -- based
# on the expected input-unit.
db_edited <- db %>%
mutate(equations_edited_cm = case_when(
units == "mm" ~ glue("{equations} / 10"),
units == "m" ~ glue("{equations} * 10")
)
)
db_edited
#> # A tibble: 2 x 3
#> equations units equations_edited_cm
#> <chr> <chr> <S3: glue>
#> 1 2 * dbh mm 2 * dbh / 10
#> 2 5 * dbh m 5 * dbh * 10
And this illustrates the implementation of solution 2 (my previous last comment above), by which we first calculate biomass with the original function and then convert the result to a consistent unit.
library(dplyr)
db <- tribble(
~equations, ~units,
"2 * dbh", "mm",
"5 * dbh", "m"
)
db
#> # A tibble: 2 x 2
#> equations units
#> <chr> <chr>
#> 1 2 * dbh mm
#> 2 5 * dbh m
# Little helper function to hide unimportant details
evaluate_column <- function(column, dbh = dbh) {
dummy_dbh <- 10
eval(parse(text = db[[column]]), envir = list(dbh = dummy_dbh))
}
# Implementation
mutate(db,
biomass_asis = evaluate_column("equations"),
biomass_cm = case_when(
units == "mm" ~ biomass_asis / 10,
units == "m" ~ biomass_asis * 10
)
)
#> # A tibble: 2 x 4
#> equations units biomass_asis biomass_cm
#> <chr> <chr> <dbl> <dbl>
#> 1 2 * dbh mm 50 5
#> 2 5 * dbh m 50 500
Created on 2018-09-20 by the reprex package (v0.2.1)
This summarizes all comments above and a meeting with @gonzalezeb.
The units may be different of the dbh
values from different users.
Right now, different equations in allodb expect dbh
values in different units.
Output should be always in the same unit. This allows summarizing the resulting biomass across rows. For example, to sum the biomass of all species in a census the biomass of all species must be in the same unit.
Ask users to provide dbh
values in a specific a unit, say, in centimeters: dbh_cm
. Users can convert units with e.g. measurements::conv_unit()
. This would keep the code focused, simple, and less prone to bugs.
For each equation, the code can use a conversion factor to convert the input (dbh_cm
) from cm to the unit that is specific for that equation: dbh_eq
. This conversion factor should be stored in the equations
table in a column, say, cf_dbh
.
To output biomass in consistent units, the code must finally convert the biomass from the equation-specific unit to "cm" (for consistency with the input we expect). This conversion factor should also be stored in the equations
table in the column, say, cf_bmss
.
(The actual numbers make no sense)
USER DATA EQUATIONS TABLE
--------- -------------------------------------
dbh_cm cf_dbh equation cf_bmss
<dbl> <dbl> <chr> <dbl>
--------- -------------------------------------
55 0.39 "888 + (dbh_eq)" 2.54
66 1 "999 + (dbh_eq)" 1
...
# Each row:
# * INTERNAL COMPUTATION BEFORE APLYING THE EQUAITON
dbh_cm * cf_dbh = dbh_eq
# * INTERNAL COMPUTATION AFTER APLYING THE EQUAITON
biomass_eq * cf_bmss = biomass_cm
This sounds good to me.
I just came across the quantities package, which may be handy for dealing with units: https://github.com/r-quantities/quantities/blob/master/README.md
@gonzalezeb , please see if this is getting close to what you think the code should do.
# EXPLORATION
# Here I just expresses my thoughts in code
library(tidyverse)
# * Good: The columns giving units have no misisng values
# * Unexpected: Some units are not of distance (e.g. cm) but of area (e.g. cm^2)
allodb::equations %>%
select(matches("unit")) %>%
unique()
#> # A tibble: 11 x 2
#> dbh_units_original biomass_units_original
#> <chr> <chr>
#> 1 cm g
#> 2 in g
#> 3 cm kg
#> 4 in lb
#> 5 mm kg
#> 6 in^2 lb
#> 7 in kg
#> 8 cm^2 g
#> 9 cm Mg
#> 10 cm gm
#> 11 m t
# * Conversion factors not in the table. I'll do the conversion myself
allodb::equations %>%
select(matches("unit|convert|correct|factor")) %>%
head(3)
#> # A tibble: 3 x 4
#> dbh_units_original biomass_units_orig~ bias_corrected bias_correction_fa~
#> <chr> <chr> <chr> <chr>
#> 1 cm g 1 1.056
#> 2 cm g 1 1.016
#> 3 cm g 1 included in model
# QUESTIONS
# 1. What should we do when `dbh_original_units` are in units of area, e.g.
# [cm^2]? For example, if the user inputs `DBH` in [cm], and the value of
# `dbh_units_original` is `cm^2`, Should the code simply square the input?
#
# 2. Wouldn't that square `DBH` twice if `equation_form` is `a * (DBH^2)^b`?
allodb::equations %>%
dplyr::select(matches("dbh_unit"), matches("equation_form")) %>%
dplyr::filter(grepl("2", dbh_units_original))
#> # A tibble: 4 x 2
#> dbh_units_original equation_form
#> <chr> <chr>
#> 1 in^2 a*(DBH^2)^b
#> 2 in^2 a*(DBH^2)^b
#> 3 in^2 a*(DBH^2)^b
#> 4 cm^2 a+b*BA
# EXAMPLE
# 3. Is this what you would do?
# User inputs dbh in [cm] (i.e. we arbitrary request [cm])
(cns <- tibble(dbh_cm = c(100, 200, 300)))
#> # A tibble: 3 x 1
#> dbh_cm
#> <dbl>
#> 1 100
#> 2 200
#> 3 300
# Here `dbh_unit_original` indicates that the dbh input to the equation should
# be in [in^2] and the output will be in [lb] (same as [lbs], right?)
eqn <- allodb::equations %>%
dplyr::select(matches("dbh_unit|biomass_unit|equation_allometry")) %>%
dplyr::filter(dbh_units_original == "in^2")
eqn
#> # A tibble: 3 x 3
#> equation_allometry dbh_units_original biomass_units_original
#> <chr> <chr> <chr>
#> 1 3.08355*(dbh^2)^1.14915 in^2 lb
#> 2 2.17565*(dbh^2)^1.2481 in^2 lb
#> 3 2.56795*(dbh^2)^1.18685 in^2 lb
# So we convert the input from [cm] to [in^2] and the output to from [lbs] to
# [g](for example)
cm_to_inch_squared <- function(x) {
measurements::conv_unit(x, from = "cm", to = "inch")^2
}
lbs_to_g <- function(x) {
measurements::conv_unit(x, from = "lbs", to = "g")
}
eqn %>%
mutate(
dbh = cm_to_inch_squared(cns$dbh_cm),
# We evaluate dbh in the original units
biomass_original = eval(parse(text = equation_allometry), envir = eqn),
# We convert the original output units to the units we choose (e.g. [g])
biomass = lbs_to_g(biomass_original)
)
#> # A tibble: 3 x 6
#> equation_allome~ dbh_units_origi~ biomass_units_o~ dbh
#> <chr> <chr> <chr> <dbl>
#> 1 3.08355*(dbh^2)~ in^2 lb 1550.
#> 2 2.17565*(dbh^2)~ in^2 lb 6200.
#> 3 2.56795*(dbh^2)~ in^2 lb 13950.
#> # ... with 2 more variables: biomass_original <dbl>, biomass <dbl>
Created on 2019-02-25 by the reprex package (v0.2.1)
allo_evaluate()
now warns that we still don't handle units.
You can see this in action at https://forestgeo.github.io/fgeo.biomass/ (search for allo_evaluate()
).
The units issue is one that should be solvable in a a few minutes. Do you need any guidance on this?
Do you need any guidance on this?
I can certainly move this forward but to be sure the result is correct I need @gonzalezeb or @teixeirak to answer these questions:
https://github.com/forestgeo/allodb/issues/42#issuecomment-467088166
If inputs are in units of area (cm2), we can either (1) calculate cm^2 before passing to the equation or (2) modify the equation so that the squaring takes place in the equation. The second option would be slightly tidier, but either works. I think the answer depends on how much work it would be to go back and change those equations. We (@gonzalezeb ) may need to check all of them anyway, as the example you gave where DHB is squared twice is suspicious. Of course, when the input is basal area (BA), that's pi*(DBH/2)^2.
where DHB is squared twice is suspicious
This is what I’m mainly asking. (I’m not concerned about the implementation. I’d go with the approach I showed in the example and we can refactor it anytime later.)
when the input is basal area (BA), that's pi*(DBH/2)^2
Do you prefer me to replace this via code in fgeo.biomass, or for you to change it in allodb? That is, wherever it now says “BA” replace it with “pi*(DBH/2)^2”?
I'd go ahead and do it in allodb (unless @gonzalezeb has any objection). Be sure to change the input units when you do that.
@gonzalezeb,
I assume that [t] means ton, right? Which of the following "?measurements::conv_unit_options
doesn't give details on this.
measurements::conv_unit_options$mass
#> [1] "Da" "fg" "pg" "ng" "ug"
#> [6] "mg" "g" "kg" "Mg" "Gg"
#> [11] "Tg" "Pg" "carat" "metric_ton" "oz"
#> [16] "lbs" "short_ton" "long_ton" "stone"
Created on 2019-03-06 by the reprex package (v0.2.1)
I though I changed all t to to metric _ton already.
@gonzalezeb, what unit is [gm]?
sort(unique(allodb::equations$biomass_units_original))
#> [1] "g" "gm" "kg" "lb" "Mg" "t"
Created on 2019-03-06 by the reprex package (v0.2.1)
And which of these do you think is the right replacement?
measurements::conv_unit_options$mass
#> [1] "Da" "fg" "pg" "ng" "ug"
#> [6] "mg" "g" "kg" "Mg" "Gg"
#> [11] "Tg" "Pg" "carat" "metric_ton" "oz"
#> [16] "lbs" "short_ton" "long_ton" "stone"
Created on 2019-03-06 by the reprex package (v0.2.1)
I though I changed all t to to metric _ton already
Likely I just need to update pull and reinstall allodb
FYI, I'm now working on replacing the units in allodb with the units that the measurements package knows of. This is to avoid manually inserting conversion factors and instead reuse available tools (i.e. measurements). I think this is the safest approach. You may or may not want to change that in allodb.
library(fgeo.biomass)
unique(
allodb::equations$dbh_units_original
)
#> [1] "cm" "in" "mm" "in^2" "cm^2" "m"
unique(
fixme_units(allodb::equations$dbh_units_original)
)
#> [1] "cm" "inch" "mm" "inch2" "cm2" "m"
unique(
fixme_units(allodb::equations$biomass_units_original)
)
#> [1] "g" "kg" "FIXME: Unknown unit"
#> [4] "Mg"
# All valid unites
valid_units()
#> $acceleration
#> [1] "mm_per_sec2" "cm_per_sec2" "m_per_sec2" "km_per_sec2"
#> [5] "grav" "inch_per_sec2" "ft_per_sec2" "mi_per_sec2"
#> [9] "kph_per_sec" "mph_per_sec"
#>
#> $angle
#> [1] "degree" "radian" "grad" "arcmin" "arcsec" "turn"
#>
#> $area
#> [1] "nm2" "um2" "mm2" "cm2" "m2" "hectare"
#> [7] "km2" "inch2" "ft2" "yd2" "acre" "mi2"
#> [13] "naut_mi2"
#>
#> $coordinate
#> [1] "dec_deg" "deg_dec_min" "deg_min_sec"
#>
#> $count
#> [1] "fmol" "pmol" "nmol" "umol" "mmol" "mol"
#>
#> $duration
#> [1] "nsec" "usec" "msec" "sec" "min" "hr" "day" "wk" "mon" "yr"
#> [11] "dec" "cen" "mil" "Ma"
#>
#> $energy
#> [1] "J" "kJ" "erg" "cal" "Cal" "Wsec" "kWh" "MWh" "BTU"
#>
#> $flow
#> [1] "ml_per_sec" "ml_per_min" "ml_per_hr" "l_per_sec" "l_per_min"
#> [6] "l_per_hr" "m3_per_sec" "m3_per_min" "m3_per_hr" "Sv"
#> [11] "gal_per_sec" "gal_per_min" "gal_per_hr" "ft3_per_sec" "ft3_per_min"
#> [16] "ft3_per_hr"
#>
#> $length
#> [1] "angstrom" "nm" "um" "mm" "cm" "dm"
#> [7] "m" "km" "inch" "ft" "yd" "fathom"
#> [13] "mi" "naut_mi" "au" "light_yr" "parsec" "point"
#>
#> $mass
#> [1] "Da" "fg" "pg" "ng" "ug"
#> [6] "mg" "g" "kg" "Mg" "Gg"
#> [11] "Tg" "Pg" "carat" "metric_ton" "oz"
#> [16] "lbs" "short_ton" "long_ton" "stone"
#>
#> $power
#> [1] "uW" "mW" "W" "kW" "MW"
#> [6] "GW" "erg_per_sec" "cal_per_sec" "cal_per_hr" "Cal_per_sec"
#> [11] "Cal_per_hr" "BTU_per_sec" "BTU_per_hr" "hp"
#>
#> $pressure
#> [1] "uatm" "atm" "Pa" "hPa" "kPa" "torr" "mmHg" "inHg"
#> [9] "cmH2O" "inH2O" "mbar" "bar" "dbar" "psi"
#>
#> $speed
#> [1] "mm_per_sec" "cm_per_sec" "m_per_sec" "km_per_sec"
#> [5] "inch_per_sec" "ft_per_sec" "kph" "km_per_hr"
#> [9] "mph" "mi_per_hr" "km_per_day" "mi_per_day"
#> [13] "knot" "mach" "light"
#>
#> $temperature
#> [1] "C" "F" "K" "R"
#>
#> $volume
#> [1] "ul" "ml" "dl" "l" "cm3"
#> [6] "dm3" "m3" "km3" "us_tsp" "us_tbsp"
#> [11] "us_oz" "us_cup" "us_pint" "us_quart" "us_gal"
#> [16] "inch3" "ft3" "mi3" "imp_tsp" "imp_tbsp"
#> [21] "imp_oz" "imp_cup" "imp_pint" "imp_quart" "imp_gal"
Created on 2019-03-06 by the reprex package (v0.2.1)
Sure, I was going to do it manually in the csv file but you already got it.
--We (@gonzalezeb ) may need to check all of them anyway, as the example you gave where DHB is squared twice is suspicious
Maybe I am misreading the papers where I am taking equations. For example, for Russell_2005 (equation Id=c94845) it defines: The regression equation for Point Reyes National Seashore was biomass (grams) = 51.68 +0.02 basal area (centimeters2) with an adjusted R2 = 0.53 and P= 0.01
Similar case for the Jenkins (jenkins_2004_cdod) equations.
@gonzalezeb, did you see this one? https://github.com/forestgeo/allodb/issues/42#issuecomment-470284990
I don't see 'gm' but in any case it should be 'g' for grams (I see 5 biomass units in the equation table: g, kg, lbs, Mg, metric_ton)
Note: Mg = metric_ton. (I tend to use Mg because it avoids any potential confusion with U.S. tons.)
@gonzalezeb ,
I'm reinstalling allodb to ensure I have the latest commits. I find 'gm' in equation_id == b30744
. Do you see the same?
remotes::install_github("forestgeo/allodb")
#> Skipping install of 'allodb' from a github remote, the SHA1 (6929cc95) has not changed since last install.
#> Use `force = TRUE` to force installation
library(tidyverse)
allodb::equations %>%
filter(biomass_units_original == "gm") %>%
select(equation_id, matches("unit"))
#> # A tibble: 1 x 3
#> equation_id dbh_units_original biomass_units_original
#> <chr> <chr> <chr>
#> 1 b30744 cm gm
Created on 2019-03-06 by the reprex package (v0.2.1)
@gonzalezeb,
We discussed that the code should expect users to provide dbh
in [cm]. There are a few equations that seem to need squared inches. How should we handle this? That is, where should the code take the second dimension to convert from a unit of lengh to a unit of area?
library(tidyverse)
allodb::equations %>%
filter(dbh_units_original == "in^2") %>%
select(dbh_units_original, matches("equ"))
#> # A tibble: 3 x 5
#> dbh_units_origi~ equation_allome~ equation_id equation_form
#> <chr> <chr> <chr> <chr>
#> 1 in^2 3.08355*(dbh^2)~ f3e023 a*(DBH^2)^b
#> 2 in^2 2.17565*(dbh^2)~ 8eca60 a*(DBH^2)^b
#> 3 in^2 2.56795*(dbh^2)~ e8ebc2 a*(DBH^2)^b
#> # ... with 1 more variable: other_equations_tested <chr>
For now I'm just dropping the rows where dbh
can't be converted to the unit expected by the equation (notice the last warning "#> Warning: Dropping 828 rows where units can't be converted").
library(fgeo.biomass)
census <- fgeo.biomass::scbi_tree1
species <- fgeo.biomass::scbi_species
census_species <- add_species(census, species, site = "scbi")
#> `sp` now stores Latin species names
allo_find(census_species)
#> Warning: Dropping 58 equations that can't be evaluated.
#> Identify failing equations with `fixme_pull_failing_eqn(allodb::master())`
#> Joining, by = c("sp", "site")
#> Warning: The input and output datasets have different number of rows:
#> * Input: 40283.
#> * Output: 30229.
#> Warning: Dropping 828 rows where units can't be converted
#> # A tibble: 29,401 x 11
#> eqn_type rowid site sp dbh equation_id eqn eqn_source
#> <chr> <int> <chr> <chr> <dbl> <chr> <chr> <chr>
#> 1 species 4 scbi nyss~ 53.1 8da09d 1.54~ default
#> 2 species 21 scbi liri~ 91.4 34fe5a 1.02~ default
#> 3 species 29 scbi acer~ 326. 7c72ed exp(~ default
#> 4 species 38 scbi frax~ 42.8 0edaff 0.16~ default
#> 5 species 72 scbi acer~ 289. 7c72ed exp(~ default
#> 6 species 77 scbi quer~ 251. 07dba7 1.56~ default
#> 7 species 79 scbi tili~ 187. 3f99ba 1.44~ default
#> 8 species 79 scbi tili~ 475 76d19b 0.00~ default
#> 9 species 84 scbi frax~ 170. 0edaff 0.16~ default
#> 10 species 89 scbi fagu~ 10.7 74186d 2.03~ default
#> # ... with 29,391 more rows, and 3 more variables:
#> # anatomic_relevance <chr>, dbh_unit <chr>, bms_unit <chr>
Created on 2019-03-06 by the reprex package (v0.2.1)
I'm reinstalling allodb to ensure I have the latest commits. I find 'gm' in
equation_id == b30744
. Do you see the same?No, I see 'g'. Committed in be8160f, line 144 (jan 16, 2019)
https://github.com/forestgeo/allodb/issues/42#issuecomment-470317291. This was my fault, I just checked the original reference and dbh unit should be inch. Same with this https://github.com/forestgeo/allodb/issues/42#issuecomment-470288242. dbh unit shoudl be cm. I fixed the equation table, committed here 33f8411
RE: https://github.com/forestgeo/allodb/issues/42#issuecomment-470349164
Great, thanks! No things are the way they ought to be, i.e. I no longer see 'gm' in equation_id == b30744
.
I was out of sync because I hadn't updated data/. To update data/ I need to run data-raw/data.R after you update any .csv file. Also I need to update the strored references in tests/testthat/. This extra step is a little annoying but It's a good way to force myself to slow down a bit and see precisely what's changed. For now, I'll leave the workflow untouched. Maybe I'll rethink later.
remotes::install_github("forestgeo/allodb")
#> Skipping install of 'allodb' from a github remote, the SHA1 (3fe53456) has not changed since last install.
#> Use `force = TRUE` to force installation
library(tidyverse)
allodb::equations %>%
filter(biomass_units_original == "gm") %>%
select(equation_id, matches("unit"))
#> # A tibble: 0 x 3
#> # ... with 3 variables: equation_id <chr>, dbh_units_original <chr>,
#> # biomass_units_original <chr>
Created on 2019-03-07 by the reprex package (v0.2.1)
Awesome!
Now that I'm in sync with allodb I confirm that I no longer see [in^2] or [cm^].
library(tidyverse)
allodb::equations %>%
select(matches("unit")) %>%
unique()
#> # A tibble: 8 x 2
#> dbh_units_original biomass_units_original
#> <chr> <chr>
#> 1 cm g
#> 2 inch g
#> 3 cm kg
#> 4 inch lbs
#> 5 mm kg
#> 6 inch kg
#> 7 cm Mg
#> 8 cm metric_ton
Created on 2019-03-07 by the reprex package (v0.2.1)
Following Krista's https://github.com/forestgeo/allodb/issues/42#issuecomment-470299961, should [metric_ton] be replaced by [Mg]?
This comment by Krista suggests that the output biomass should be in [kg]. @gonzalezeb and @teixeirak can you confirm?
Right now the output is in [g].
yes, output should be in [kg]
allo_evaluate()
now output biomass in [kg].
library(tidyverse)
library(fgeo.biomass)
census <- fgeo.biomass::scbi_tree1 %>% dplyr::sample_n(30)
species <- fgeo.biomass::scbi_species
with_equations <- census %>%
add_species(species, "scbi") %>%
allo_find()
#> `sp` now stores Latin species names
#> Joining, by = c("sp", "site")
#> Warning: The input and output datasets have different number of rows:
#> * Input: 30.
#> * Output: 21.
#> Converting `dbh` based on `dbh_unit`.
# Notice this message
allo_evaluate(with_equations)
#> Assuming `dbh` units in [cm] (to convert units see `?measurements::conv_unit()`).
#> `biomass` values are given in [kg].
#> # A tibble: 21 x 12
#> eqn_type rowid site sp dbh equation_id eqn eqn_source
#> <chr> <int> <chr> <chr> <dbl> <chr> <chr> <chr>
#> 1 species 7 scbi frax~ 77.3 973c05 2.36~ default
#> 2 species 26 scbi liri~ 60.6 34fe5a 1.02~ default
#> 3 genus 9 scbi cary~ 50.1 9c4cc9 10^(~ default
#> 4 genus 10 scbi cary~ 85.8 9c4cc9 10^(~ default
#> 5 genus 27 scbi ulmu~ 38.0 455839 2.04~ default
#> 6 mixed_h~ 2 scbi asim~ 26.3 ae65ed exp(~ default
#> 7 mixed_h~ 5 scbi asim~ 18.6 ae65ed exp(~ default
#> 8 mixed_h~ 11 scbi asim~ 14.4 ae65ed exp(~ default
#> 9 mixed_h~ 14 scbi carp~ 18.7 ae65ed exp(~ default
#> 10 mixed_h~ 18 scbi asim~ 13.6 ae65ed exp(~ default
#> # ... with 11 more rows, and 4 more variables: anatomic_relevance <chr>,
#> # dbh_unit <chr>, bms_unit <chr>, biomass <dbl>
Created on 2019-03-12 by the reprex package (v0.2.1)
I think the units issue is solved now but I just noticed that in the example right above it says: Assuming dbh
units in [cm] but scbi.tree1
dbh in mm.
Maybe we need to specify in fgeo.biomass right before Finding the best available allometric-equations that anyone using the package need to ensure that dbh units should be in cm (and give the function to do it)..
what do you think?
... Assuming dbh units in [cm] but scbi.tree1 dbh in mm.
Yeah, that's what I imagined as all other ForestGEO sites seem to also record dbh in [mm]. Considering that fgeo.biomass is focuses on ForestGEO sites, What is the reason you prefer to expect [cm]?
I think that fgeo.biomass should ensure that all units are converted to cm before passing to allodb. That is, input to allodb function should be in cm. Allodb can then convert cm to whatever units are used in the equation.
input ... should be in cm
Why expect input data in [cm] instead of [mm]?
I'm sure you have a good reason but I would love to know what it is. I ask because ForestGEO tables record dbh
in [mm]. If we expect [mm] most users will be ready to go. If we expect [cm] most users will need to first convert from [cm] to [mm].
... can then convert cm to whatever units are used in the equation.
This is already implemented (although in fgeo.biomass -- not in allodb)
It is true, most forestgeo sites record their dbh in mm, but some in cm. I was only inclined to use cm because it is more universal.
I agree. I suppose it doesn't matter. What's essential is that anyone using the package needs to input the units of their DBH measurements.
Okay, allo_find()
gains the argument dbh_unit
. For now it defaults to [mm] to favor ForestGEO users. If you still prefer [cm] I can change the default.
We can also default to NULL
and force all users to give a value. It's a bit annoying but certainly makes users think more carefully about what they are doing. Let me know if you prefer this option.
library(fgeo.biomass)
census <- dplyr::sample_n(fgeo.biomass::scbi_tree1, 30)
species <- fgeo.biomass::scbi_species
census_species <- census %>%
add_species(species, site = "scbi")
#> Adding `site`.
#> Overwriting `sp`; it now stores Latin species names.
#> Adding `rowid`.
with_equations_mm <- allo_find(census_species)
#> Assuming `dbh` data in [mm].
#> Joining, by = c("sp", "site")
#> Converting `dbh` based on `dbh_unit`.
with_equations_cm <- allo_find(census_species, dbh_unit = "cm")
#> Assuming `dbh` data in [cm].
#> Joining, by = c("sp", "site")
#> Converting `dbh` based on `dbh_unit`.
Created on 2019-03-13 by the reprex package (v0.2.1)
I doubt this would be the best place in the code to do it, but it is essential that users enter dbh units at some point. Alternatively, it would be easy to detect automatically which is used (minimum DBH values = 10 (mm) or 1 (cm).
I like the idea of guessing the units (with a message).
Yes, let’s do it. That will be less error prone. However, I’d add another check in case someone feeds in a census of a subset of trees >10cm (for example, scbi mortality census)— Min dbh < 1.1 & Max dbh < 500—> cm Min dbh > 9 & Max dbh > 500—> mm Max dbh is tricky- there are probably a few exceptions. It would help to check max DBHs at dome of the sites to refine this. Lutz et al 2018 may have appropriate info. Smallest max DBH will be at Mpala, Scotty Creek, or Palamanui. I believe the largest is at Sinharaja.
See forestgeo/fgeo.biomass#20
An issue (more of a reminder to myself) to tackle is to convert biomass units from original equations to the final output unit we want allodb to give (kg, Mg). That conversion factor (convert from inches, mm to cm, etc) should be incorporated in the equation. Of special attention is the DBH used to built the original allometry. We have two options: