ropensci / allodb

An R package for biomass estimation at extratropical forest plots.
https://docs.ropensci.org/allodb/
GNU General Public License v3.0

Create diagram of fgeo biomass calculation process #68

Closed teixeirak closed 4 years ago

teixeirak commented 5 years ago

For purposes of documenting and presenting this package, we need a diagram illustrating the entire process of biomass calculation within fgeo/allodb. This should include data cleaning, taper correction, determining which equation will be used, applying the equation, and integrating the BIOMASS package (#67).

Draft version is on Krista's white board.

maurolepore commented 5 years ago

👍 I also really like the flow chart in fig 1 of Réjou-Méchain et al. (2017) <doi:10.1111/2041-210X.12753>.

image

gonzalezeb commented 5 years ago

First attempt. More details to be added. (image: slide 1)

maurolepore commented 5 years ago

Awesome!

Just a couple of suggestions/questions:

teixeirak commented 5 years ago

Here's a rough schematic that I just drew. It's different in style, with a few functionally meaningful differences. I'll let @gonzalezeb make a nice version that merges the two.

image

teixeirak commented 5 years ago

I just noticed an error on my diagram: it omits an arrow to the eq. table for tropical species in instances where we have equations. This should also include our wsg table.

teixeirak commented 5 years ago

Note that we can't finalize this diagram until we resolve #67.

maurolepore commented 5 years ago

Okay, I see some difference with the architecture we have now (the bit including the BIOMASS package doesn't yet exist). This is flexible; fgeo.biomass and allodb could be merged, but I think it'd be better for you to have more control over allodb.


The allodb package performs no action; it only stores tables. All logic and actions happen inside the fgeo.biomass package. fgeo.biomass takes data from users as input and matches it against the tables in allodb in search of allometric equations.

If fgeo.biomass finds a suitable equation for a particular data-row, it computes biomass and stores it in the biomass column of the corresponding row.

If fgeo.biomass does not find a suitable equation for a particular row, it will try to compute biomass via the BIOMASS package, and store it in the biomass column of the corresponding row.

The output should be a single table with as many rows as the input and a column biomass with biomass values for each row.
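The row-wise logic described above can be sketched roughly as follows. This is a language-neutral illustration written in Python, not code from fgeo.biomass or allodb: the `EQUATIONS` lookup, the `fallback_biomass()` helper, the `add_biomass_sketch()` name, the column names, and all coefficients are hypothetical placeholders.

```python
# Hypothetical stand-in for the allodb tables: (site, species) -> equation in dbh.
# The coefficients are invented for illustration only.
EQUATIONS = {
    ("scbi", "quercus alba"): lambda dbh: 0.1 * dbh ** 2.5,
}


def fallback_biomass(dbh):
    """Placeholder for the BIOMASS-package route; the formula is invented."""
    return 0.05 * dbh ** 2.4


def add_biomass_sketch(rows):
    """Return one output row per input row, each with a biomass value."""
    out = []
    for row in rows:
        equation = EQUATIONS.get((row["site"], row["species"]))
        if equation is not None:
            # A suitable equation was found in the tables.
            biomass, source = equation(row["dbh"]), "equation table"
        else:
            # No equation found: use the fallback route instead.
            biomass, source = fallback_biomass(row["dbh"]), "fallback"
        # Copy the row so the user's input is left unmodified.
        out.append({**row, "biomass": biomass, "source": source})
    return out


if __name__ == "__main__":
    census = [
        {"site": "scbi", "species": "quercus alba", "dbh": 10.0},
        {"site": "scbi", "species": "unknown sp", "dbh": 10.0},
    ]
    for result in add_biomass_sketch(census):
        print(result["species"], result["source"], result["biomass"])
```

The extra `source` column is an assumption added here to make it visible which route produced each value; the description above only requires the `biomass` column.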

image

teixeirak commented 5 years ago

@maurolepore, most of the logic outlined in words makes sense to me. My biggest comment on both the text and figure is that fgeo.biomass doesn't include BIOMASS and allodb (at least not how I'd currently envision it). We can discuss the relationship between allodb and fgeo.biomass. It makes sense to have the code in fgeo.biomass. At the same time, while allodb is designed primarily for ForestGEO, we don't want to limit its use. But similarly, fgeo.biomass doesn't necessarily have to be restricted to ForestGEO.

maurolepore commented 5 years ago

This comment seems most relevant here: https://github.com/forestgeo/allodb/issues/67#issue-414683916

image

maurolepore commented 5 years ago

@teixeirak,

> My biggest comment on both the text and figure is that fgeo.biomass doesn't include BIOMASS and allodb (at least not how I'd currently envision it).

Sorry to confuse you. fgeo.biomass is the biggest circle including the smaller ones. Here is an improved version of my graph.


                    data (input)
                      |                                          
fgeo.biomass          |                                          
----------------------|-----------------------|                     
|                     |                       |                   
|  allodb             |                       |                   
|  --------------     |                       |                   
|  | species    |     |     Join user data    |                   
|  | sites      |-------->  with allodb       |                   
|  | equations  |           tables and        |                   
|  | etc.       |           search for        |                   
|  |            |           equations         |                   
|  |            |               |             |                   
|  --------------               |             |                   
|                             Found equation? |                   
|                               |             |                    
|                          No__/ \__Yes       |                      
|                          |          \       |                      
|                     BIOMASS          \      |                      
|                  -------------        \     |                      
|                  | Calculate |--------------|----> Biomass (output)
|                  |  biomass  |              |                      
|                  -------------              |                      
|                                             |                      
|----------------------------------------------                      

> while allodb is designed primarily for ForestGEO, we don't want to limit its use. But similarly, fgeo.biomass doesn't necessarily have to be restricted to ForestGEO

My proposition (and the current architecture) is that allodb is totally independent and not restricted to ForestGEO. In contrast, fgeo.biomass depends on allodb (and will later depend on BIOMASS) and is restricted to dealing with ForestGEO data as it comes from the ForestGEO database. That is what defines its membership in the fgeo ecosystem of packages. This is, in general, my priority, and I see anything else as a nice-to-have enhancement. At least temporarily, this approach helps me set priorities.

teixeirak commented 5 years ago

Your diagram was clear. I thought it would be misleading to present BIOMASS inside of fgeo.biomass, as that might seem to imply that it’s a component, as opposed to a separate package.

teixeirak commented 5 years ago

I'm listing this as high priority because it's essential that we map out conceptually how everything will work ASAP. We don't need an aesthetically perfect diagram, but it would be good if we could agree on a decent working draft that can be tweaked/perfected later if needed. This will be useful for our own understanding and for communicating with collaborators.

teixeirak commented 5 years ago

This clarifies issue #59.

maurolepore commented 5 years ago

FYI, this is exactly the same architecture, except with names that should be more appealing to you. Users should know only of allodb; the other two packages are called internally. Modularity is a common way of managing complexity, particularly in software.

                    data (input)
                        |                                          
  allodb                |                                          
  ----------------------|-----------------------|                     
  |                     |                       |                   
  |  allodb.tables      |                       |                   
  |  --------------     |                       |                   
  |  | species    |     |     Join user data    |                   
  |  | sites      |-------->  with allodb       |                   
  |  | equations  |           tables and        |                   
  |  | etc.       |           search for        |                   
  |  |            |           equations         |                   
  |  |            |               |             |                   
  |  --------------               |             |                   
  |                             Found equation? |                   
  |                               |             |                    
  |                          No__/ \__Yes       |                      
  |                          |          \       |                      
  |                     BIOMASS          \      |                      
  |                  -------------        \     |                      
  |                  | Calculate |--------------|----> Biomass (output)
  |                  |  biomass  |              |                      
  |                  -------------              |                      
  |                                             |                      
  |----------------------------------------------

teixeirak commented 5 years ago

@maurolepore's diagram basically makes sense. However, we need to define where (1) data cleaning/formatting and (2) taper correction come in. Regarding (1), I'd propose that this occurs within "fgeo.biomass", whose role is defined as preparing data for biomass calculation and doing something with biomass values when returned. The role of allodb is to calculate biomass given inputs of trusted data. Regarding (2), it would be nice if taper correction could be included in allodb, but technically I think it could fit within either fgeo.biomass or allodb.
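One way to picture the proposed division of roles is as three stages applied in sequence. The sketch below is a Python illustration under loud assumptions: `clean()`, `taper_correct()`, `compute_biomass()`, the `hom` (height of measurement) column, the exponential taper form with its `rate`, and the biomass equation are all hypothetical placeholders, not the methods discussed for allodb or fgeo.biomass.

```python
import math


def clean(rows):
    """Data-preparation role: normalise species names and coerce dbh to float.
    These cleaning rules are hypothetical."""
    return [
        {**row, "species": row["species"].strip().lower(), "dbh": float(row["dbh"])}
        for row in rows
    ]


def taper_correct(rows, rate=0.03, reference_height=1.3):
    """Placeholder taper step: scale up dbh measured above the reference height.
    The exponential form and the rate are illustrative, not the project's method."""
    corrected = []
    for row in rows:
        hom = row.get("hom", reference_height)
        dbh = row["dbh"] * math.exp(rate * (hom - reference_height))
        corrected.append({**row, "dbh": dbh})
    return corrected


def compute_biomass(rows):
    """Calculation role: assumes trusted input. The equation is made up."""
    return [{**row, "biomass": 0.1 * row["dbh"] ** 2.5} for row in rows]


def pipeline(rows):
    """Preparation first, then taper correction, then calculation."""
    return compute_biomass(taper_correct(clean(rows)))


if __name__ == "__main__":
    census = [
        {"species": " Quercus alba ", "dbh": "10", "hom": 1.3},
        {"species": "Quercus alba", "dbh": "10", "hom": 2.3},
    ]
    for result in pipeline(census):
        print(result["species"], result["dbh"], result["biomass"])
```

Because `taper_correct()` sits between the other two stages, assigning it to the preparation package or to the calculation package changes which package owns it, not the order in which the steps run.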

teixeirak commented 5 years ago

@maurolepore, @gonzalezeb, please comment on this ASAP.

maurolepore commented 5 years ago

@teixeirak, your suggestions sound good. I'm curious why you want to define low-level details (low as in closer to the computer than the user). In practice, the internal structure of software evolves continuously. Users only need to know about the interface (i.e. function names, arguments, etc.).

teixeirak commented 5 years ago

It's important to define what is part of allodb and hence Erika's publication. These need to be prioritized. For example, if taper correction is part of "allodb", we need to push that forward ASAP.

maurolepore commented 5 years ago

> The role of allodb is to calculate biomass given inputs of trusted data.

I may not fully understand what you mean, because all of the work I'm doing now seems to fit this description. In other words, I'm not working on anything after biomass is calculated. And I'm now working 100% of my time on this project.

I agree we need to set priorities, and we may need to be more creative as it is not obvious to me what to drop from what I am doing now.

My only suggestion is to publish the equations alone, without code. That might maximize the return on Erika's work. Delivering code might give you a relatively small payoff relative to the effort.

And yet, not even that separation is so clear, as with the code we continuously discover little things that need to be fixed in the equations.

maurolepore commented 5 years ago

Here is a flowchart summarizing what the code does now: add_biomass(). The code will continue to evolve, and I don't commit to updating these flowcharts.

These are functions that are called internally, but can also be called directly: add_species() and add_equations()

gonzalezeb commented 4 years ago

Closing issue because allodb scope changed.