ropensci / allodb

An R package for biomass estimation at extratropical forest plots.
https://docs.ropensci.org/allodb/
GNU General Public License v3.0

Create function to plot predicted biomass against DBH, by species #73

Closed teixeirak closed 3 years ago

teixeirak commented 5 years ago

@maurolepore,

In order to check current calculations, and for future users to be able to visualize what allodb is giving in terms of predicted biomass, we'll want to make the following plot type:

- x-axis: DBH (cm)
- y-axis: biomass (kg)
- one colored line or series of points for each species at a site, spanning the range of sizes observed at the site

maurolepore commented 5 years ago

Thanks!

spanning the range of sizes observed at the site

How would you cut the dbh range for each species? Here are some alternatives:

dbh_species1 <- c(11.1, 11.3, 15, 25.8, 25.9, 27, 30.1, 33)

ggplot2::cut_number(dbh_species1, 3)
#> [1] [11.1,18.6] [11.1,18.6] [11.1,18.6] (18.6,26.6] (18.6,26.6] (26.6,33]  
#> [7] (26.6,33]   (26.6,33]  
#> Levels: [11.1,18.6] (18.6,26.6] (26.6,33]

ggplot2::cut_width(dbh_species1, 3)
#> [1] [10.5,13.5] [10.5,13.5] (13.5,16.5] (25.5,28.5] (25.5,28.5] (25.5,28.5]
#> [7] (28.5,31.5] (31.5,34.5]
#> 8 Levels: [10.5,13.5] (13.5,16.5] (16.5,19.5] (19.5,22.5] ... (31.5,34.5]

ggplot2::cut_interval(dbh_species1, 3)
#> [1] [11.1,18.4] [11.1,18.4] [11.1,18.4] (25.7,33]   (25.7,33]   (25.7,33]  
#> [7] (25.7,33]   (25.7,33]  
#> Levels: [11.1,18.4] (18.4,25.7] (25.7,33]

Created on 2019-03-12 by the reprex package (v0.2.1)

teixeirak commented 5 years ago

We don't want to bin sizes-- just display a continuous linear function of biomass as a function of DBH. It could be a scatter plot or a line plot of the equation. I'm including an example of how I'd expect this to look for SCBI, with the modification that I'd like a separate color for each species + legend.

scbi_Allometries
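A minimal base-R sketch of the plot requested above: one continuous biomass curve per species, drawn over the range of DBH observed for that species. The species names, coefficients, and DBH ranges are hypothetical placeholders, not allodb equations or SCBI data, and the equation strings are assumed to use `dbh` in cm and return kg.

```r
# Hypothetical equations (coefficients are illustrative, not taken from allodb).
equations <- c(
  species_a = "0.10 * dbh^2.4",
  species_b = "0.08 * dbh^2.5"
)

# Hypothetical range of DBH (cm) observed for each species at the site.
dbh_limits <- list(species_a = c(1, 60), species_b = c(5, 90))

# Evaluate one equation string on an evenly spaced grid spanning its limits.
predict_curve <- function(equation, limits, n = 50) {
  dbh <- seq(limits[1], limits[2], length.out = n)
  data.frame(dbh = dbh, biomass = eval(parse(text = equation), list(dbh = dbh)))
}

curves <- Map(predict_curve, equations, dbh_limits)

# Set up empty axes wide enough for every curve, then add one line per species.
xlim <- range(unlist(dbh_limits))
ylim <- range(unlist(lapply(curves, `[[`, "biomass")))
plot(xlim, ylim, type = "n", xlab = "DBH (cm)", ylab = "Biomass (kg)")
for (i in seq_along(curves)) {
  lines(curves[[i]]$dbh, curves[[i]]$biomass, col = i)
}
legend("topleft", legend = names(curves), col = seq_along(curves), lty = 1)
```

Evaluating the equation on a grid, rather than at the observed stems only, gives a smooth line; passing the observed `dbh` values instead would give the scatter-plot variant.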

maurolepore commented 5 years ago

I'm tagging this as high priority because it's a good way to engage reviewers, which is crucial in the development process.

maurolepore commented 5 years ago

Here are two plots similar to what you describe, except that here I'm not using species names but species codes. Notice that agb is the one that comes with the dataset -- not the one we calculate. That'll come a little later.

ALTERNATIVE 1

image

ALTERNATIVE 2

image

image

image

teixeirak commented 5 years ago

I'd actually like to have both. The first is valuable for picking out anomalies, the second for seeing what's going on with each species.

teixeirak commented 5 years ago

Another note on this— option 1 should be the first one that users would see. Option 2 is if they want to dig in more.

maurolepore commented 5 years ago

@teixeirak and @gonzalezeb,

Here is an article exploring dbh vs. biomass (compared to dbh vs. agb). Please review and comment if this makes sense.

https://github.com/forestgeo/fgeo.biomass/blob/master/.buildignore/dbh-vs-biomass.md

Notice that we still lack some features that should produce better results. For example, for each row in a census dataset, the code currently sums the biomass for all equations associated with it (based on site and species). This should be correct except when the multiple equations refer not to different parts of a tree but to trees of different diameters. We still don't handle dbh-specific equations.

Also, some species have no value and I'll be exploring why.
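To illustrate the caveat about summing, here is a small sketch with a hypothetical equation table. The column names `dbh_min` and `dbh_max`, the species labels, and all coefficients are assumptions for illustration only. One species has two equations for different tree parts (summing is right); the other has two equations for different size classes (summing double-counts).

```r
# Hypothetical equation table; coefficients and column names are illustrative.
eqns <- data.frame(
  species  = c("sp_parts", "sp_parts", "sp_sizes", "sp_sizes"),
  equation = c("0.05 * dbh^2.5", "0.02 * dbh^2.2",
               "0.10 * dbh^2.3", "0.07 * dbh^2.5"),
  dbh_min  = c(0, 0, 0, 30),
  dbh_max  = c(Inf, Inf, 30, Inf),
  stringsAsFactors = FALSE
)

# Evaluate one equation string at a given dbh.
evaluate <- function(equation, dbh) {
  eval(parse(text = equation), list(dbh = dbh))
}

# Sum biomass over a species' equations, optionally keeping only the
# equations whose DBH limits contain the stem's dbh.
sum_biomass <- function(species, dbh, eqns, respect_range = FALSE) {
  rows <- eqns[eqns$species == species, ]
  if (respect_range) {
    rows <- rows[dbh >= rows$dbh_min & dbh < rows$dbh_max, ]
  }
  sum(vapply(rows$equation, evaluate, numeric(1), dbh = dbh))
}

sum_biomass("sp_sizes", 40, eqns)                        # adds both size classes
sum_biomass("sp_sizes", 40, eqns, respect_range = TRUE)  # large-tree equation only
```

For `sp_parts` both calls return the same value, because both part equations cover every diameter.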

maurolepore commented 5 years ago

BTW,

teixeirak commented 5 years ago

To clarify, by biomass do you mean total aboveground biomass from allodb? Whereas 'agb' is what's currently in the SCBI data table (calculated based off tropical allometry)?

teixeirak commented 5 years ago

@gonzalezeb, assuming my interpretation there is correct, it appears that our equations for Platanus occidentalis and Nyssa sylvatica are off. Can you please check?

maurolepore commented 5 years ago

To clarify, by biomass do you mean total aboveground biomass from allodb? Whereas 'agb' is what's currently in the SCBI data table (calculated based off tropical allometry)?

Yes,

teixeirak commented 5 years ago

Okay, I take it your example plot (# 1 below; from the link Mauro sent) would look more like the example I sent (# 2 below; made by Valentine based on Erika's compilation of equations) when you include larger diameters (but exclude Nyssa sylvatica and Platanus occidentalis)? @gonzalezeb, plotting on the scale of # 1 below highlights huge divergence in predicted biomass for trees of ~50cm, both based on our allometries and the tropical allometries. Does this make sense? It's hard to tell which species are in which group. Oaks are definitely in the higher-biomass group, and maybe pines in the lower group? I'd assume this is driven by wood density?

image

image

teixeirak commented 5 years ago

Regarding review of these equations, I do not see that as the role for an intern. @maurolepore, you've already produced code to generate these plots. @gonzalezeb and I (to a lesser extent) should be the ones reviewing these plots to make sure output looks reasonable. In the longer term, issue #16 calls for code to flag equations that give unreasonable output. Right now, to start (and maybe this is all we'll ever want), it may be worth writing a very simple script where each equation will be evaluated at ~3 dbh values (e.g., 1 cm, 50 cm, 100 cm; counting only those within range of the equation's DBH limits) and equations flagged if they don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds).
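A rough sketch of such a script, under stated assumptions: the equation table, its column names, and the "reasonable" bounds are all hypothetical placeholders (the real thresholds would need expert input), and equations are assumed to take `dbh` in cm and return kg.

```r
# Hypothetical equations with DBH limits (cm); coefficients are illustrative.
eqns <- data.frame(
  equation_id = c("eq_ok", "eq_units", "eq_small"),
  equation    = c("0.1 * dbh^2.4", "100 * dbh^2.4", "0.1 * dbh^2.4"),
  dbh_min     = c(1, 1, 1),
  dbh_max     = c(150, 150, 20),
  stringsAsFactors = FALSE
)

# Hypothetical plausible biomass bounds (kg) at each test diameter (cm).
bounds <- data.frame(
  dbh   = c(1, 50, 100),
  lower = c(0.01, 200, 1500),
  upper = c(5, 5000, 30000)
)

flag_equations <- function(eqns, bounds) {
  # Cross join: every equation paired with every test diameter.
  grid <- merge(eqns, bounds, by = NULL)
  # Keep only test diameters inside the equation's own DBH limits.
  grid <- grid[grid$dbh >= grid$dbh_min & grid$dbh <= grid$dbh_max, ]
  grid$biomass <- mapply(
    function(equation, dbh) eval(parse(text = equation), list(dbh = dbh)),
    grid$equation, grid$dbh
  )
  grid$flagged <- grid$biomass < grid$lower | grid$biomass > grid$upper
  grid[, c("equation_id", "dbh", "biomass", "flagged")]
}

result <- flag_equations(eqns, bounds)
result[result$flagged, ]
```

Here `eq_units` mimics an equation with a unit error and is flagged at every test diameter, while `eq_small` is only checked at 1 cm because 50 cm and 100 cm fall outside its limits.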

maurolepore commented 5 years ago

huge divergence in predicted biomass for trees of ~50cm, both based on our allometries and the tropical allometries. ... It's hard to tell which species are in which group.

Good point. Here I added a reference to tell those species apart.

source

image

image

image

maurolepore commented 5 years ago

Right now, to start (and maybe this is all we'll ever want), it may be worth writing a very simple script where each equation will be evaluated at ~3 dbh values (e.g., 1 cm, 50 cm, 100 cm; counting only those within range of the equation's DBH limits) and equations flagged if they don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds)

Good idea. I'll follow up at https://github.com/forestgeo/fgeo.biomass/issues/22

gonzalezeb commented 5 years ago

I see the problems in the equations for Platanus, Nyssa and two others. I am fixing them. You are just too fast for me!

teixeirak commented 5 years ago

Thanks for the plot, @maurolepore! From this, it's mainly the hickories and Tilia that are low. This doesn't make much sense, as their wood density tends to be on the high end. @gonzalezeb, what do you think?

maurolepore commented 5 years ago

equations flagged if they don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds)

How about creating that reasonable range from the deviation from the curve fit to dbh vs. the preexisting agb value for each species?

Although the preexisting agb was calculated for tropical trees, the plots above show that it is likely close enough to the biomass we should get from allodb -- that is, close enough to pick out obvious errors in the equations.
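One way this idea could be sketched, for a single species: fit a log-log line to dbh vs. the reference agb, then flag rows whose calculated biomass deviates from that line by more than some multiple of the residual spread. The data below are simulated stand-ins (not SCBI values), the multiplier of 4 is an arbitrary assumption, and both columns are assumed to be in the same units.

```r
set.seed(1)

# Simulated stand-in for one species' census rows (not real data).
n <- 200
dbh <- runif(n, 5, 80)
agb_ref <- 0.1 * dbh^2.4 * exp(rnorm(n, sd = 0.1))  # preexisting reference agb
biomass <- 0.1 * dbh^2.4 * exp(rnorm(n, sd = 0.1))  # newly calculated biomass
biomass[1:3] <- biomass[1:3] * 50                    # plant three gross errors

# Curve fit to dbh vs. the reference agb, on the log-log scale.
fit <- lm(log(agb_ref) ~ log(dbh))

# Deviation of the calculated biomass from the fitted reference curve.
expected  <- predict(fit, newdata = data.frame(dbh = dbh))
deviation <- log(biomass) - expected

# Hypothetical tolerance: four times the residual spread of the reference fit.
tolerance <- 4 * sd(residuals(fit))
flagged   <- abs(deviation) > tolerance

which(flagged)
```

Working on the log scale keeps the tolerance proportional to tree size, so a fixed cutoff does not over-flag large trees or miss errors in small ones.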

teixeirak commented 5 years ago

We can try that as a rough first approximation.

maurolepore commented 5 years ago

I see the problems in the equations for Platanus, Nyssa and two others...

@gonzalezeb, can you confirm these are the changes you made? I still don't see changes in Platanus and Nyssa. Are those yet to come, or am I missing something?

library(tidyverse)

edited_eqn <- c("333c34" , "e9d686")

allodb::master() %>% 
  filter(equation_id %in% edited_eqn) %>% 
  select(equation_id, site, species, equation_allometry)
#> Joining `allodb::equations` with `allodb::sitespecies` by 'equation_id'.
#> Then joining with `allodb::sites_info` by 'site'.
#> # A tibble: 10 x 4
#>    equation_id site              species             equation_allometry    
#>    <chr>       <chr>             <chr>               <chr>                 
#>  1 333c34      Lilly Dicky       Robinia pseudoacac~ 4.06014*(dbh^2)^1.052~
#>  2 333c34      SCBI              Robinia pseudoacac~ 4.06014*(dbh^2)^1.052~
#>  3 333c34      UMBC              Gleditsia triacant~ 4.06014*(dbh^2)^1.052~
#>  4 333c34      UMBC              Robinia pseudoacac~ 4.06014*(dbh^2)^1.052~
#>  5 333c34      Michigan Big Woo~ Robinia pseudoacac~ 4.06014*(dbh^2)^1.052~
#>  6 e9d686      Lilly Dicky       Robinia pseudoacac~ 0.77201*(dbh^2)^1.412~
#>  7 e9d686      SCBI              Robinia pseudoacac~ 0.77201*(dbh^2)^1.412~
#>  8 e9d686      UMBC              Gleditsia triacant~ 0.77201*(dbh^2)^1.412~
#>  9 e9d686      UMBC              Robinia pseudoacac~ 0.77201*(dbh^2)^1.412~
#> 10 e9d686      Michigan Big Woo~ Robinia pseudoacac~ 0.77201*(dbh^2)^1.412~

Created on 2019-03-15 by the reprex package (v0.2.1)

gonzalezeb commented 5 years ago

I haven't included them because they required a deeper search. I am working on them now.

gonzalezeb commented 5 years ago

I made changes for Platanus and Nyssa (and I am reviewing others, e.g. Quercus velutina, Carya spp., etc.).

maurolepore commented 5 years ago

Thanks @gonzalezeb,

code

Platanus and Nyssa are looking good.

image

There seem to be six outliers where calculated biomass is much higher than the reference agb. Let me know if anything jumps out at you.

image

image


But after excluding those potential outliers the general picture looks reasonable.

image

gonzalezeb commented 5 years ago

But look at this when I test the largest Quercus velutina at SCBI:

dbh = 153/2.54 # cm to inches
2.1457*(dbh^2.503) # our equation
[1] 61172.03 # result in lbs
61172.03/2.205
[1] 27742.42 # result in kg

gonzalezeb commented 5 years ago

Same for eq c70dea

dbh = 153/2.54
10^(1.00005+2.10621*(log10(dbh)))
[1] 56079.5 # in lbs
56079.5/2.205
[1] 25432.88 # in kg
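The two hand calculations above share the same unit handling, which can be sketched as a small wrapper. The helper names are hypothetical; the conversion factors (2.54 cm per inch, 2.205 lb per kg) and the two equations are the ones used in the calculations above, which take DBH in inches and return pounds.

```r
# Conversion factors as used in the calculations above.
cm_to_in <- function(cm) cm / 2.54
lb_to_kg <- function(lb) lb / 2.205

# Wrap an equation that expects inches and returns pounds, so the result
# takes DBH in cm and returns biomass in kg.
in_lb_to_cm_kg <- function(equation_in_lb) {
  function(dbh_cm) lb_to_kg(equation_in_lb(cm_to_in(dbh_cm)))
}

q_velutina <- in_lb_to_cm_kg(function(dbh) 2.1457 * dbh^2.503)
eq_c70dea  <- in_lb_to_cm_kg(function(dbh) 10^(1.00005 + 2.10621 * log10(dbh)))

# Largest Quercus velutina at SCBI (153 cm), in kg.
q_velutina(153)
eq_c70dea(153)
```

Both calls reproduce the values reported above (about 27,742 kg and 25,433 kg). Keeping the conversions in one wrapper makes it harder to skip one of them by accident, which is a common source of order-of-magnitude errors.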

teixeirak commented 5 years ago

We need to understand what’s going on with that divergence among species in Mauro’s figure. Why are some species so much higher than others?


maurolepore commented 5 years ago

Why are some species so much higher than others?

Just noting that it seems that it is not a problem of allodb equations, as the preexisting agb (shown in grey) follows the same pattern.

teixeirak commented 5 years ago

I wouldn’t put much stake in the calculations using the tropical equation. Also, do the same species separate out that way in the tropical calculations, or is it a different set?

We need to understand what drives that difference (mathematically) and whether it makes sense biologically. From my initial review, it doesn’t seem to (many of the lower biomass species have higher wood density.) Erika, what is the source of these equations?

maurolepore commented 5 years ago

do the same species separate out that way in the tropical calculations, or is it a different set?

I think so. See the grey reference here:

image

gonzalezeb commented 5 years ago

Those equations are from Jenkins 2004.

maurolepore commented 5 years ago

  1. Is this it? https://www.fs.fed.us/ne/durham/4104/papers/ne_gtr319_jenkins_and_others.pdf

  2. Are those the equations allodb will use as generic equations to fall back when species+site-specific equations are not available?

  3. Are they already in allodb?

gonzalezeb commented 5 years ago

  1. Yes, that's the pub.
  2. (and 3) Many of them are already in allodb. Not all equations are generic; many are species-specific allometries (and a few genus-specific), and we consider them the best available (or 'expert'-selected) for a particular site. Note that many sites (especially in North America) don't have site-specific equations; very few sites have that kind, and those are still not in allodb.

maurolepore commented 5 years ago

I see. Then the grey and black dots should overlap even more than they do, right?

teixeirak commented 5 years ago

@maurolepore, could you please verify that you're plotting predictions for each species using only the equations specified for the size range?

maurolepore commented 5 years ago

[Are you] plotting predictions for each species using only the equations specified for the size range?

No, not yet.

https://github.com/forestgeo/allodb/issues/73#issuecomment-472918150 image

teixeirak commented 5 years ago

Okay, that might explain this divergence where some species (Liriodendron, Quercus) have much higher biomass at a given size than others (Juglans, Carya). From examining the Liriodendron equations, it looks like that might be going on there (although I get even more radical divergence). Could you please make it high priority to implement this restriction? It's pointless for us to try to evaluate the equations using these plots before that's done.

teixeirak commented 5 years ago

One thing about these plots remains very confusing. The grey dots indicate calculations from the tropical equations (based on wood density), correct? Tulip poplar has a much lower wood density than others (Quercus, Juglans, Carya), and I've confirmed that this is accurately reflected in the CTFS wood density data set used to calculate these values, so it doesn't make sense that it would have higher biomass at a given DBH.

maurolepore commented 5 years ago

Could you please make it high priority to implement this restriction?

Sure, I understand that getting accurate biomass values is high priority. But first I need to fully understand why some equations can't be computed at all. Erika just closed an issue that should make things much better (https://github.com/forestgeo/allodb/issues/78).

It's pointless for us to try to evaluate the equations using these plots before that's done.

I agree that discussing the precision of biomass values right now is like trying to run before we can walk. I'm sorry if my comment https://github.com/forestgeo/allodb/issues/73#issuecomment-472918150 wasn't clear.

However this thread has been super useful. These are some of the lessons we learned:

I highlight these lessons because I need you to help me help you. The more you engage in the process, the faster we will all move.

maurolepore commented 5 years ago

The grey dots indicate calculations from the tropical equations (based on wood density), correct?

Apparently not.

I thought so, but then Erika suggested that the agb column in the SCBI data (see below) -- comes from this paper on allometries for North America -- not from the paper on equations for tropical trees.

library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3

fgeo.biomass::scbi_tree1 %>% 
  select(agb, everything())
#> # A tibble: 40,283 x 20
#>        agb treeID stemID tag   StemTag sp    quadrat    gx    gy DBHID
#>      <dbl>  <int>  <int> <chr> <chr>   <chr> <chr>   <dbl> <dbl> <int>
#>  1 1.59e-3      1      1 10079 1       libe  0104     3.70  73       1
#>  2 9.74e-4      2      2 10168 1       libe  0103    17.3   58.9     3
#>  3 9.88e-4      3      3 10567 1       libe  0110     9    197.      5
#>  4 1.24e-1      4      4 12165 1       nysy  0122    14.2  428.      7
#>  5 5.12e-2      5      5 12190 1       havi  0122     9.40 436.      9
#>  6 1.45e-3      6      6 12192 1       havi  0122     1.30 434      13
#>  7 7.05e-3      7      7 12212 1       unk   0123    17.8  447.     15
#>  8 9.65e-3      8      8 12261 1       libe  0125    18    484.     17
#>  9 6.25e-3      9      9 12456 1       vipr  0130    18    598.     19
#> 10 5.41e-4     10     10 12551 1       astr  0132     5.60 628.     22
#> # ... with 40,273 more rows, and 10 more variables: CensusID <int>,
#> #   dbh <dbl>, pom <chr>, hom <dbl>, ExactDate <chr>, DFstatus <chr>,
#> #   codes <chr>, nostems <dbl>, date <dbl>, status <chr>

Created on 2019-03-18 by the reprex package (v0.2.1)

teixeirak commented 5 years ago

Given the uncertainty about the sources of the grey dots, let's just ignore those data.

teixeirak commented 5 years ago

Please alert me once we're correctly plotting all equations as intended, and I'll return to the review.

gonzalezeb commented 5 years ago

I thought so, but then Erika suggested that the agb column in the SCBI data (see below) -- comes from this paper on allometries for North America -- not from the paper on equations for tropical trees.

Sorry, my mistake. I thought you were asking for the equations from allodb. The agb in the scbi data (data that comes from the ctfs main database) is calculated based on tropical allometries using old code in the old CTFS R package (as explained here).

In the metadata for the scbi.full or scbi.stem tables (here), agb is described as: "Above-ground biomass of all stems on the tree, in Mg (= metric tons or 10^6 g). Note that agb=0 for dead trees."
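Since the reference agb is stored in Mg while the plots in this thread use kg, comparing the two requires a factor of 1000. A tiny sketch, with a hypothetical helper name and three agb values copied from the `scbi_tree1` printout earlier in the thread:

```r
# 1 Mg = 10^6 g = 1000 kg
mg_to_kg <- function(mg) mg * 1000

# agb values (Mg) for rows 1, 2 and 4 of the scbi_tree1 printout above.
agb_mg <- c(1.59e-3, 9.74e-4, 1.24e-1)
agb_kg <- mg_to_kg(agb_mg)
agb_kg
```

This gives 1.59, 0.974 and 124 kg, which is the scale to use when overlaying agb on biomass calculated in kg.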

maurolepore commented 5 years ago

@gonzalezeb,

FYI, in this update, the biomass of Liriodendron sp. appears too high, which Krista suggests might be solved once I support dbh-specific equations. Also see a few suspicious points over 40,000 [kg].

The most important accomplishment of today is that I can now explain all missing biomass values. I'm ready to move on to more sophisticated features that should result in more precise biomass.

maurolepore commented 5 years ago

Regarding Liriodendron tulipifera, I wonder if the issue may be one of units. That may explain such a difference, where the resulting biomass values are greater not by a little but by orders of magnitude.

teixeirak commented 5 years ago

Incorrect units could definitely cause this kind of problem, but I believe @gonzalezeb already checked that.

gonzalezeb commented 5 years ago

Interestingly, I didn't change that equation in my recent fixes, so I'm not sure why those large biomass values didn't show before. At the same time, I just confirmed that equation 94f593 (the one giving those bad values) is incorrect, so I will review it.

teixeirak commented 5 years ago

It wouldn't have shown up bad before if it was being applied only within the specified DBH range.

maurolepore commented 5 years ago

Thanks @gonzalezeb, it's now looking much better. It will continue to improve as we support more features.

image

maurolepore commented 5 years ago

@gonzalezeb,

We now support dbh-specific equations. There is a new update at http://bit.ly/demo-dbh-vs-biomass but please see also https://github.com/forestgeo/fgeo.biomass/issues/27

image

image

teixeirak commented 5 years ago

Wonderful! At a first pass, this looks much more like what I expected, and I don't see any problems.