teixeirak closed this issue 4 years ago
👍 I also really like the flow chart in fig. 1 of Réjou-Méchain et al. (2017) <doi:10.1111/2041-210X.12753>.
First attempt. More details to be added.
Awesome!
Just a couple of suggestions/questions:
I think that the output is "AGB", not "Calculate AGB" because "Calculate" is an action.
Where does the calculation of AGB happen? When in BIOMASS? And when in fgeo.biomass?
I would leave fgeo out of this graph. fgeo is a meta-package; all it does is install and attach other packages (e.g. fgeo.biomass).
Here's a rough schematic that I just drew. It's different in style, with a few functionally meaningful differences. I'll let @gonzalezeb make a nice version that merges the two.
I just noticed an error in my diagram: it omits an arrow to the equation table for tropical species in instances where we have equations. This should also include our wsg table.
Note that we can't finalize this diagram until we resolve #67.
Okay, I see some difference with the architecture we have now (the bit including the BIOMASS package doesn't yet exist). This is flexible; fgeo.biomass and allodb could be merged, but I think it'd be better for you to have more control over allodb.
The allodb package performs no action; it only stores tables. All logic and actions happen inside the fgeo.biomass package. fgeo.biomass gets data as input from users and matches it with tables in allodb in search of allometric equations.
If fgeo.biomass finds a suitable equation for a particular data row, it computes biomass and stores it in the biomass column of the corresponding row.
If fgeo.biomass does not find a suitable equation for a particular row, it will try to compute biomass via the BIOMASS package, and store it in the biomass column of the corresponding row.
The output should be a single table with as many rows as the input and a biomass column with biomass values for each row.
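To make the logic above concrete, here is a minimal sketch in Python (not the actual R code). Everything in it is an assumption for illustration: the `EQUATIONS` lookup stands in for the allodb tables, `fallback_biomass()` stands in for a call into the BIOMASS package, the function name `add_biomass_sketch()` is hypothetical, and the coefficients are made up.

```python
# Hypothetical stand-in for the allodb equation tables:
# (site, species) -> allometric equation taking dbh.
# The coefficients are invented for illustration only.
EQUATIONS = {
    ("site_a", "quercus rubra"): lambda dbh: 0.1 * dbh ** 2.5,
}


def fallback_biomass(dbh):
    """Placeholder for delegating to the BIOMASS package (made-up formula)."""
    return 0.05 * dbh ** 2.6


def add_biomass_sketch(rows):
    """Return one output row per input row, each with a 'biomass' value.

    'source' records which path produced the value; it is an extra column
    added here for illustration and is not part of the described output.
    """
    out = []
    for row in rows:
        equation = EQUATIONS.get((row["site"], row["species"]))
        if equation is not None:
            # A suitable equation was found in the stored tables.
            biomass, source = equation(row["dbh"]), "allodb"
        else:
            # No equation found: fall back to the BIOMASS path.
            biomass, source = fallback_biomass(row["dbh"]), "BIOMASS"
        # Copy the row so the user's input is left untouched.
        out.append({**row, "biomass": biomass, "source": source})
    return out
```

The point of the sketch is the shape of the result: the output has exactly as many rows as the input, and every row gets a biomass value from one of the two paths.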
@maurolepore, most of the logic outlined in words makes sense to me. My biggest comment on both the text and figure is that fgeo.biomass doesn't include BIOMASS and allodb (at least not how I'd currently envision it). We can discuss the relationship between allodb and fgeo.biomass. It makes sense to have the code in fgeo.biomass. At the same time, while allodb is designed primarily for ForestGEO, we don't want to limit its use. But similarly, fgeo.biomass doesn't necessarily have to be restricted to ForestGEO.
This comment seems most relevant here: https://github.com/forestgeo/allodb/issues/67#issue-414683916
@teixeirak,
My biggest comment on both the text and figure is that fgeo.biomass doesn't include BIOMASS and allodb (at least not how I'd currently envision it).
Sorry to confuse you. fgeo.biomass is the biggest circle including the smaller ones. Here is an improved version of my graph.
                data (input)
                      |
fgeo.biomass          |
----------------------|-----------------------|
|                     |                       |
|  allodb             |                       |
|  --------------     |                       |
|  | species    |     |   Join user data      |
|  | sites      |--------> with allodb        |
|  | equations  |           tables and        |
|  | etc.       |           search for        |
|  |            |           equations         |
|  |            |     |                       |
|  --------------     |                       |
|              Found equation?                |
|                     |                       |
|                No__/ \__Yes                 |
|                |           \                |
|             BIOMASS         \               |
|          -------------       \              |
|          | Calculate |----------------------|----> Biomass (output)
|          | biomass   |                      |
|          -------------                      |
|                                             |
|----------------------------------------------
while allodb is designed primarily for ForestGEO, we don't want to limit its use. But similarly, fgeo.biomass doesn't necessarily have to be restricted to ForestGEO
My proposition (and current architecture) is that allodb is totally independent and not restricted to ForestGEO. In contrast, fgeo.biomass depends on allodb (and will later depend on BIOMASS) and is restricted to dealing with ForestGEO data as it comes from the ForestGEO database. That is what defines its membership in the fgeo ecosystem of packages. This is, in general, my priority, and I see anything else as a nice-to-have enhancement. At least temporarily, this approach helps me set priorities.
Your diagram was clear. I thought it would be misleading to present BIOMASS inside of fgeo.biomass, as that might seem to imply that it’s a component, as opposed to a separate package.
I'm listing this as high priority because it's essential that we map out conceptually how everything will work ASAP. We don't need an aesthetically perfect diagram, but it would be good if we could agree on a decent working draft that can be tweaked/perfected later if needed. This will be useful for our own understanding and to communicate with collaborators.
This clarifies issue #59.
FYI, this is exactly the same architecture, except with names that should be more appealing to you. Users should know only of allodb; the other two packages are called internally. Modularity is a common way of managing complexity, particularly in software.
                data (input)
                      |
allodb                |
----------------------|-----------------------|
|                     |                       |
|  allodb.tables      |                       |
|  --------------     |                       |
|  | species    |     |   Join user data      |
|  | sites      |--------> with allodb        |
|  | equations  |           tables and        |
|  | etc.       |           search for        |
|  |            |           equations         |
|  |            |     |                       |
|  --------------     |                       |
|              Found equation?                |
|                     |                       |
|                No__/ \__Yes                 |
|                |           \                |
|             BIOMASS         \               |
|          -------------       \              |
|          | Calculate |----------------------|----> Biomass (output)
|          | biomass   |                      |
|          -------------                      |
|                                             |
|----------------------------------------------
@maurolepore's diagram basically makes sense. However, we need to define where (1) data cleaning/formatting and (2) taper correction come in. Regarding (1), I'd propose that this occurs within "fgeo.biomass", whose role is defined as preparing data for biomass calculation and doing something with biomass values when returned. The role of allodb is to calculate biomass given inputs of trusted data. Regarding (2), it would be nice if taper correction could be included in allodb, but technically I think it could fit within either fgeo.biomass or allodb.
@maurolepore, @gonzalezeb , please comment on this ASAP.
@teixeirak, your suggestions sound good. I'm curious why you want to define low level details (low as in closer to the computer than the user). In practice, the internal structure of software evolves continuously. Users only need to know about the interface (i.e. function names, arguments, etc.).
It's important to define what is part of allodb and hence Erika's publication. These need to be prioritized. For example, if taper correction is part of "allodb", we need to push that forward ASAP.
The role of allodb is to calculate biomass given inputs of trusted data.
I may not fully understand what you mean, because all of the work I'm doing now seems to fit this description. In other words, I'm not working on anything after biomass is calculated. And I'm now working 100% of my time on this project.
I agree we need to set priorities, and we may need to be more creative as it is not obvious to me what to drop from what I am doing now.
My only suggestion is to publish equations alone, without code. That might maximize the cost-benefit of Erika's work. Delivering code might give you a relatively small payoff relative to the effort.
And yet, not even that separation is so clear, as with the code we continuously discover little things that need to be fixed in the equations.
Here is a flowchart summarizing what the code does now: add_biomass(). The code will continue to evolve, and I don't commit to updating these flowcharts.
These are functions that are called internally, but can also be called directly: add_species() and add_equations().
Closing issue because allodb scope changed.
For purposes of documenting and presenting this package, we need a diagram illustrating the entire process of biomass calculation within fgeo/allodb. This should include data cleaning, taper correction, determining which equation will be used, applying the equation, and integrating the BIOMASS package (#67).
Draft version is on Krista's white board.