Closed teixeirak closed 3 years ago
Thanks!
spanning the range of sizes observed at the site
How would you cut the dbh range for each species? Here are some alternatives:
dbh_species1 <- c(11.1, 11.3, 15, 25.8, 25.9, 27, 30.1, 33)
ggplot2::cut_number(dbh_species1, 3)
#> [1] [11.1,18.6] [11.1,18.6] [11.1,18.6] (18.6,26.6] (18.6,26.6] (26.6,33]
#> [7] (26.6,33] (26.6,33]
#> Levels: [11.1,18.6] (18.6,26.6] (26.6,33]
ggplot2::cut_width(dbh_species1, 3)
#> [1] [10.5,13.5] [10.5,13.5] (13.5,16.5] (25.5,28.5] (25.5,28.5] (25.5,28.5]
#> [7] (28.5,31.5] (31.5,34.5]
#> 8 Levels: [10.5,13.5] (13.5,16.5] (16.5,19.5] (19.5,22.5] ... (31.5,34.5]
ggplot2::cut_interval(dbh_species1, 3)
#> [1] [11.1,18.4] [11.1,18.4] [11.1,18.4] (25.7,33] (25.7,33] (25.7,33]
#> [7] (25.7,33] (25.7,33]
#> Levels: [11.1,18.4] (18.4,25.7] (25.7,33]
Created on 2019-03-12 by the reprex package (v0.2.1)
We don't want to bin sizes-- just display a continuous linear function of biomass as a function of DBH. It could be a scatter plot or a line plot of the equation. I'm including an example of how I'd expect this to look for SCBI, with the modification that I'd like a separate color for each species + legend.
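A minimal sketch of the plot described here, assuming each species has a single equation of the form biomass = a * dbh^b. The species codes, coefficients, and DBH ranges are placeholders for illustration; they are not values from allodb or the SCBI data.

```r
# Hypothetical species-level coefficients and observed DBH ranges (placeholders).
eqn <- data.frame(
  sp      = c("sp1", "sp2", "sp3"),
  a       = c(0.10, 0.08, 0.12),
  b       = c(2.40, 2.50, 2.30),
  dbh_min = c(1, 1, 5),      # cm, smallest stem observed at the site
  dbh_max = c(60, 120, 90),  # cm, largest stem observed at the site
  stringsAsFactors = FALSE
)

# Evaluate one species' equation on a grid spanning its observed DBH range.
curve_one <- function(i, n = 50) {
  dbh <- seq(eqn$dbh_min[i], eqn$dbh_max[i], length.out = n)
  data.frame(sp = eqn$sp[i], dbh = dbh, biomass = eqn$a[i] * dbh^eqn$b[i],
             stringsAsFactors = FALSE)
}
curves <- do.call(rbind, lapply(seq_len(nrow(eqn)), curve_one))

# One colored line per species, with a legend (built only if ggplot2 is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(curves, ggplot2::aes(x = dbh, y = biomass, color = sp)) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "DBH (cm)", y = "Biomass (kg)", color = "Species")
  # print(p) draws the plot
}
```

Swapping `geom_line()` for `geom_point()` on the census rows themselves would give the scatter-plot variant.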
I'm tagging this as high priority because it's a good way to engage reviewers, which is crucial in the development process.
Here are two plots similar to what you describe, except that here I'm not using species names but species codes. Notice that agb is the one that comes with the dataset -- not the one we calculate. That'll come a little later.
ALTERNATIVE 1
ALTERNATIVE 2
I'd actually like to have both. The first is valuable for picking out anomalies, the second for seeing what's going on with each species.
Another note on this— option 1 should be the first one that users would see. Option 2 is if they want to dig in more.
@teixeirak and @gonzalezeb,
Here is an article exploring dbh vs. biomass (compared to dbh vs. agb). Please review and comment if this makes sense.
https://github.com/forestgeo/fgeo.biomass/blob/master/.buildignore/dbh-vs-biomass.md
Notice that we still lack some features that should produce better results. For example, for each row in a census dataset, the code currently sums the biomass for all equations associated with it (based on site and species). This should be correct except when the multiple equations refer not to different parts of a tree but to trees of different diameters. We still don't handle dbh-specific equations.
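A rough sketch of how dbh-specific equations could be handled before summing: keep only the equations whose DBH range contains the stem's dbh, then sum what remains per census row. The column names (`dbh_min_cm`, `dbh_max_cm`), equation IDs, and equation strings are hypothetical and do not reflect the actual allodb or fgeo.biomass structure.

```r
# Hypothetical census rows and equations; none of these values come from allodb.
census <- data.frame(rowid = 1:3, sp = "sp1", dbh = c(5, 20, 60),
                     stringsAsFactors = FALSE)
equations <- data.frame(
  equation_id = c("eq_small", "eq_large", "eq_branch"),
  sp          = "sp1",
  eqn         = c("0.1 * dbh^2.3", "0.05 * dbh^2.6", "0.01 * dbh^2"),
  dbh_min_cm  = c(0, 25, 0),
  dbh_max_cm  = c(25, 200, 200),
  stringsAsFactors = FALSE
)

# Pair every stem with every equation available for its species.
paired <- merge(census, equations, by = "sp")

# Keep only the equations whose DBH range contains the stem's dbh.
in_range <- paired$dbh >= paired$dbh_min_cm & paired$dbh < paired$dbh_max_cm
paired <- paired[in_range, ]

# Evaluate each equation string at the stem's dbh.
paired$biomass <- mapply(
  function(eqn, dbh) eval(parse(text = eqn), list(dbh = dbh)),
  paired$eqn, paired$dbh, USE.NAMES = FALSE
)

# Sum the remaining equations (e.g. different tree parts) per census row.
total <- aggregate(biomass ~ rowid, data = paired, FUN = sum)
```

With this filter, the stem at 60 cm uses `eq_large` plus `eq_branch` and no longer adds `eq_small`, which was fitted for smaller trees.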
Also, some species have no value and I'll be exploring why.
BTW,
I don't think the word "prediction" applies in this analysis. Am I misinterpreting what you want? Is the issue title accurate?
This is the kind of analysis that I think an intern could start doing right now. Later, the code will have nicer functions and more features, but it would be great to have someone closer to the biology of these trees checking for scientific correctness as the code evolves.
To clarify, by biomass do you mean total aboveground biomass from allodb? Whereas 'agb' is what's currently in the SCBI data table (calculated based off tropical allometry)?
@gonzalezeb, assuming my interpretation there is correct, it appears that our equations for Platanus occidentalis and Nyssa sylvatica are off. Can you please check?
To clarify, by biomass do you mean total aboveground biomass from allodb? Whereas 'agb' is what's currently in the SCBI data table (calculated based off tropical allometry)?
Yes,
biomass: Calculated based on equations from allodb.
agb: From the agb column in the SCBI data -- presumably calculated using equations for tropical species. Should be a good reference against which to compare the results using equations from allodb.
Okay, I take it your example plot (# 1 below; from the link Mauro sent) would look more like the example I sent (# 2 below; made by Valentine based on Erika's compilation of equations) when you include larger diameters (but exclude Nyssa sylvatica and Platanus occidentalis)? @gonzalezeb, plotting on the scale of # 1 below highlights huge divergence in predicted biomass for trees of ~50cm, both based on our allometries and the tropical allometries. Does this make sense? It's hard to tell which species are in which group. Oaks are definitely in the higher-biomass group, and maybe pines in the lower group? I'd assume this is driven by wood density?
Regarding review of these equations, I do not see that as the role for an intern. @maurolepore, you've already produced code to generate these plots. @gonzalezeb and I (to a lesser extent) should be the ones reviewing these plots to make sure the output looks reasonable. In the longer term, issue #16 calls for code to flag equations that give unreasonable output. Right now, to start (and maybe this is all we'll ever want), it may be worth writing a very simple script where each equation will be evaluated at ~3 dbh values (e.g., 1 cm, 50 cm, 100 cm; counting only those within range of the equation's DBH limits) and equations flagged if they don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds).
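The simple script proposed here could look roughly like the sketch below. The equations, DBH limits, and the "reasonable" bounds at each test diameter are all placeholders; real thresholds would need to come from someone who knows the species.

```r
# Hypothetical equations and DBH limits (cm); placeholders, not allodb values.
equations <- data.frame(
  equation_id = c("eq_ok", "eq_suspect"),
  eqn         = c("0.1 * dbh^2.4", "10 * dbh^3"),
  dbh_min_cm  = c(1, 10),
  dbh_max_cm  = c(150, 60),
  stringsAsFactors = FALSE
)

# Placeholder plausible biomass bounds (kg) at each test DBH (cm).
bounds <- data.frame(
  dbh   = c(1, 50, 100),
  lower = c(0.001, 100, 1000),
  upper = c(5, 5000, 30000)
)

# Cross join: every equation paired with every test DBH.
checks <- merge(equations, bounds, by = NULL)

# Count only the test values within the equation's DBH limits.
checks <- checks[checks$dbh >= checks$dbh_min_cm &
                   checks$dbh <= checks$dbh_max_cm, ]

# Evaluate each equation string at its test DBH.
checks$biomass <- mapply(
  function(eqn, dbh) eval(parse(text = eqn), list(dbh = dbh)),
  checks$eqn, checks$dbh, USE.NAMES = FALSE
)

# Flag equations whose output falls outside the plausible range.
checks$flag <- checks$biomass < checks$lower | checks$biomass > checks$upper
flagged <- unique(checks$equation_id[checks$flag])
```

Here `eq_suspect` is only tested at 50 cm, because 1 cm and 100 cm fall outside its DBH limits.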
huge divergence in predicted biomass for trees of ~50cm, both based on our allometries and the tropical allometries. ... It's hard to tell which species are in which group.
Good point. Here I added a reference to tell those species apart.
Right now, to start (and maybe this is all we'll ever want), it may be worth writing a very simple script where each equation will be evaluated at ~3 dbh values (e.g., 1 cm, 50 cm, 100 cm; counting only those within range of the equation's DBH limits) and equations flagged if they don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds)
Good idea. I'll follow up at https://github.com/forestgeo/fgeo.biomass/issues/22
I see the problems in the equations for Platanus, Nyssa and two others. I am fixing them. You are just too fast for me!
Thanks for the plot, @maurolepore! From this, it's mainly the hickories and Tilia that are low. This doesn't make much sense, as their wood density tends to be on the high end. @gonzalezeb, what do you think?
equations flagged if they don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds)
How about creating that reasonable range from the deviation from the curve fit to dbh vs the preexisting agb value for each species?
Although the preexisting agb was calculated for tropical trees, the plots above show that it is likely close enough to the biomass we should get from allodb -- that is, close enough to pick out obvious errors in the equations.
We can try that as a rough first approximation.
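One way this rough approximation could be sketched: fit a log-log curve of agb on dbh per species, then flag rows whose biomass deviates from that curve by more than k residual standard deviations. The data below are simulated stand-ins, and both the power-law form and the cutoff k = 3 are assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in data: reference agb and biomass under review for one species.
n <- 100
dbh <- runif(n, 1, 100)
agb <- 0.1 * dbh^2.4 * exp(rnorm(n, sd = 0.1))      # reference values
biomass <- 0.1 * dbh^2.4 * exp(rnorm(n, sd = 0.1))  # values under review
biomass[1:3] <- biomass[1:3] * 100                  # inject three obvious errors
trees <- data.frame(sp = "sp1", dbh, agb, biomass, stringsAsFactors = FALSE)

# Flag rows whose biomass deviates from the dbh-vs-agb curve
# by more than k residual standard deviations (on the log scale).
flag_deviation <- function(d, k = 3) {
  fit <- lm(log(agb) ~ log(dbh), data = d)
  deviation <- log(d$biomass) - predict(fit, newdata = d)
  d$flag <- abs(deviation) > k * sigma(fit)
  d
}

# Fit one curve per species and recombine.
flagged <- do.call(rbind, lapply(split(trees, trees$sp), flag_deviation))
```

Working on the log scale keeps the tolerance proportional to tree size, so a large tree is not flagged simply because its absolute deviation in kg is large.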
I see the problems in the equations for Platanus, Nyssa and two others...
@gonzalezeb, can you confirm these are the changes you made? I still don't see changes in Platanus and Nyssa. Are those yet to come or am I missing something?
library(tidyverse)
edited_eqn <- c("333c34" , "e9d686")
allodb::master() %>%
filter(equation_id %in% edited_eqn) %>%
select(equation_id, site, species, equation_allometry)
#> Joining `allodb::equations` with `allodb::sitespecies` by 'equation_id'.
#> Then joining with `allodb::sites_info` by 'site'.
#> # A tibble: 10 x 4
#> equation_id site species equation_allometry
#> <chr> <chr> <chr> <chr>
#> 1 333c34 Lilly Dicky Robinia pseudoacac~ 4.06014*(dbh^2)^1.052~
#> 2 333c34 SCBI Robinia pseudoacac~ 4.06014*(dbh^2)^1.052~
#> 3 333c34 UMBC Gleditsia triacant~ 4.06014*(dbh^2)^1.052~
#> 4 333c34 UMBC Robinia pseudoacac~ 4.06014*(dbh^2)^1.052~
#> 5 333c34 Michigan Big Woo~ Robinia pseudoacac~ 4.06014*(dbh^2)^1.052~
#> 6 e9d686 Lilly Dicky Robinia pseudoacac~ 0.77201*(dbh^2)^1.412~
#> 7 e9d686 SCBI Robinia pseudoacac~ 0.77201*(dbh^2)^1.412~
#> 8 e9d686 UMBC Gleditsia triacant~ 0.77201*(dbh^2)^1.412~
#> 9 e9d686 UMBC Robinia pseudoacac~ 0.77201*(dbh^2)^1.412~
#> 10 e9d686 Michigan Big Woo~ Robinia pseudoacac~ 0.77201*(dbh^2)^1.412~
Created on 2019-03-15 by the reprex package (v0.2.1)
I haven't included them because it required a deeper search. I am working on them now.
I made changes for Platanus and Nyssa (and I am reviewing others, e.g. Quercus velutina, Carya, etc.).
Thanks @gonzalezeb,
Platanus and Nyssa are looking good.
There seem to be six outliers where the calculated biomass is much higher than the reference agb. Let me know if anything jumps out at you.
But after excluding those potential outliers the general picture looks reasonable.
But look at this when I test the largest Quercus velutina at SCBI:
dbh = 153/2.54 # cm to inches
2.1457*(dbh^2.503) # our equation
[1] 61172.03 # result in lbs
[1] 27742.42 # result in kg
Same for eq c70dea
dbh = 153/2.54
10^(1.00005+2.10621*(log10(dbh)))
[1] 56079.5 # in lbs
56079.5/2.205
[1] 25432.88 # in kg
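The two checks above share the same unit handling: convert dbh from cm to inches, evaluate the equation (which returns lbs), then convert lbs to kg. A small sketch of that pattern follows; `biomass_kg()` is a hypothetical helper name, not a function from allodb or fgeo.biomass, and it uses the same conversion factors as the checks above.

```r
# Evaluate an equation that expects dbh in inches and returns biomass in lbs,
# starting from dbh in cm and returning kg. `biomass_kg()` is hypothetical.
biomass_kg <- function(dbh_cm, eqn) {
  dbh_in <- dbh_cm / 2.54  # cm to inches
  lbs <- eqn(dbh_in)       # equation output in lbs
  lbs / 2.205              # lbs to kg, same factor as in the checks above
}

# The two equations tested above, written as functions of dbh in inches.
quercus_velutina <- function(dbh) 2.1457 * dbh^2.503
eq_c70dea <- function(dbh) 10^(1.00005 + 2.10621 * log10(dbh))

biomass_kg(153, quercus_velutina)
biomass_kg(153, eq_c70dea)
```

Both calls should reproduce the kg values shown above, roughly 27,700 kg and 25,400 kg for a 153 cm stem.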
We need to understand what’s going on with that divergence among species in Mauro’s figure. Why are some species so much higher than others?
Why are some species so much higher than others?
Just noting that it seems that it is not a problem of allodb equations, as the preexisting agb (shown in grey) follows the same pattern.
I wouldn’t put much stake in the calculations using the tropical equation. Also, do the same species separate out that way in the tropical calculations, or is it a different set?
We need to understand what drives that difference (mathematically) and whether it makes sense biologically. From my initial review, it doesn’t seem to (many of the lower biomass species have higher wood density.) Erika, what is the source of these equations?
do the same species separate out that way in the tropical calculations, or is it a different set?
I think so. See the grey reference here:
Those equations are from Jenkins 2004.
Is this it? https://www.fs.fed.us/ne/durham/4104/papers/ne_gtr319_jenkins_and_others.pdf
Are those the equations allodb will use as generic equations to fall back on when species+site-specific equations are not available?
Are they already in allodb?
I see. Then the grey and black dots should overlap even more than they do, right?
@maurolepore, could you please verify that you're plotting predictions for each species using only the equations specified for the size range?
[Are you] plotting predictions for each species using only the equations specified for the size range?
No, not yet.
https://github.com/forestgeo/allodb/issues/73#issuecomment-472918150
Okay, that might explain this divergence where some species (Liriodendron, Quercus) have much higher biomass at a given size than others (Juglans, Carya). From examining the Liriodendron equations, it looks like that might be going on there (although I get even more radical divergence). Could you please make it high priority to implement this restriction? It's pointless for us to try to evaluate the equations using these plots before that's done.
One thing about these plots remains very confusing. The grey dots indicate calculations from the tropical equations (based on wood density), correct? Tulip poplar has a much lower wood density than others (Quercus, Juglans, Carya), and I've confirmed that this is accurately reflected in the CTFS wood density data set used to calculate these values, so it doesn't make sense that it would have higher biomass at a given DBH.
Could you please make it high priority to implement this restriction?
Sure, I understand that getting accurate biomass values is high priority. But first I need to fully understand why some equations can't be computed at all. Erika just closed an issue that should make things much better (https://github.com/forestgeo/allodb/issues/78).
It's pointless for us to try to evaluate the equations using these plots before that's done.
I agree that discussing the precision of biomass values right now is like trying to run before we can walk. I'm sorry if my comment https://github.com/forestgeo/allodb/issues/73#issuecomment-472918150 wasn't clear.
However, this thread has been super useful. These are some of the lessons we learned:
agb values in grey come from;
biomass value at all, which led to removing redundant columns.
I highlight these lessons because I need you to help me help you. The more you engage in the process, the faster we will all move.
The grey dots indicate calculations from the tropical equations (based on wood density), correct?
Apparently not.
I thought so, but then Erika suggested that the agb column in the SCBI data (see below) -- comes from this paper on allometries for North America -- not from the paper on equations for tropical trees.
library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3
fgeo.biomass::scbi_tree1 %>%
select(agb, everything())
#> # A tibble: 40,283 x 20
#> agb treeID stemID tag StemTag sp quadrat gx gy DBHID
#> <dbl> <int> <int> <chr> <chr> <chr> <chr> <dbl> <dbl> <int>
#> 1 1.59e-3 1 1 10079 1 libe 0104 3.70 73 1
#> 2 9.74e-4 2 2 10168 1 libe 0103 17.3 58.9 3
#> 3 9.88e-4 3 3 10567 1 libe 0110 9 197. 5
#> 4 1.24e-1 4 4 12165 1 nysy 0122 14.2 428. 7
#> 5 5.12e-2 5 5 12190 1 havi 0122 9.40 436. 9
#> 6 1.45e-3 6 6 12192 1 havi 0122 1.30 434 13
#> 7 7.05e-3 7 7 12212 1 unk 0123 17.8 447. 15
#> 8 9.65e-3 8 8 12261 1 libe 0125 18 484. 17
#> 9 6.25e-3 9 9 12456 1 vipr 0130 18 598. 19
#> 10 5.41e-4 10 10 12551 1 astr 0132 5.60 628. 22
#> # ... with 40,273 more rows, and 10 more variables: CensusID <int>,
#> # dbh <dbl>, pom <chr>, hom <dbl>, ExactDate <chr>, DFstatus <chr>,
#> # codes <chr>, nostems <dbl>, date <dbl>, status <chr>
Created on 2019-03-18 by the reprex package (v0.2.1)
Given the uncertainty about the sources of the grey dots, let's just ignore those data.
Please alert me once we're correctly plotting all equations as intended, and I'll return to the review.
I thought so, but then Erika suggested that the agb column in the SCBI data (see below) -- comes from this paper on allometries for North America -- not from the paper on equations for tropical trees.
Sorry, my mistake. I thought you were asking for the equations from allodb. The agb in the SCBI data (data that comes from the CTFS main database) is calculated based on tropical allometries using old code in the old CTFS R package (as explained here).
On the metadata for the scbi.full or scbi.stem tables (here), agb is described as:
agb: Above-ground biomass of all stems on the tree, in Mg (= metric tons or 10^6 g). Note that agb=0 for dead trees.
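Since agb is stored in Mg while biomass elsewhere in this thread is discussed in kg, comparing the two needs a unit conversion first. A minimal sketch, using a few agb values from the table above:

```r
# agb is stored in Mg (metric tons); 1 Mg = 1000 kg.
agb_mg <- c(1.59e-3, 9.74e-4, 1.24e-1)  # a few agb values from the table above
agb_kg <- agb_mg * 1000
agb_kg
```

After conversion, the first stem is about 1.59 kg and the Nyssa sylvatica stem about 124 kg.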
@gonzalezeb,
FYI, in this update, the biomass of Liriodendron sp. appears too high, which Krista suggests might be solved once I support dbh-specific equations. Also see a few suspicious points over 40,000 [kg].
The most important accomplishment of today is that I can now explain all missing biomass values. I'm ready to move on to more sophisticated features that should result in more precise biomass.
Regarding Liriodendron tulipifera, I wonder if the issue may be one of units. That may explain such a difference -- where the resulting biomass values are greater not by a little but by orders of magnitude.
Incorrect units could definitely cause this kind of problem, but I believe @gonzalezeb already checked that.
Interestingly, I didn't change that equation in my recent fixes, so I'm not sure why those large biomass values didn't show before. At the same time, I just confirmed that equation 94f593 (the one giving those bad values) is incorrect, so I will review it.
It wouldn't have shown up bad before if it was being applied only within the specified DBH range.
Thanks @gonzalezeb, it's now looking much better. It will continue to improve as we support more features.
@gonzalezeb,
We now support dbh-specific equations. There is a new update at http://bit.ly/demo-dbh-vs-biomass but please see also https://github.com/forestgeo/fgeo.biomass/issues/27
Wonderful! This looks much more like what I expected. At a first pass, I don't see any problems.
@maurolepore,
In order to check current calculations, and for future users to be able to visualize what allodb is giving in terms of predicted biomass, we'll want to make the following plot type:
x-axis: DBH (cm)
y-axis: biomass (kg)
one colored line or series of points for each species at a site, spanning the range of sizes observed at the site