trotsiuk / r3PG

An R package for forest growth simulation using the 3-PG process-based model
https://trotsiuk.github.io/r3PG/
GNU General Public License v3.0
27 stars 16 forks source link

How to manage external contributions that may be provided from a group at KIT? #62

Closed florianhartig closed 1 year ago

florianhartig commented 2 years ago

Hi @trotsiuk / @DavidForrester,

I though it's easiest to discuss this via an issue - I had mentioned at least to Vova that there is a group at KIT who is planning to use r3PG for an applied project that they have on doing some forestry predictions. The technical person in charge is @Rasilgon (Ramiro).

For the project, Ramiro is planning to do a bunch of changes / addition to the r3PG package to adjust it for their purposes. This is a brainstorming list:

image

I had suggested previously to Ramiro that it would be nice to profit from those chances for the development of r3PG, and he was asking now how this could be done in practice, i.e. how could we receive their changes e.g. via a fork, PR etc.

I will give my recommendation below, feel free to add your opinion!

florianhartig commented 2 years ago

OK, first of all, r3PG is under GPL3, so you can of course modify the code in whatever way you wish, as long as downstream use is compatible with GPL3 licence. So, the only question is really how we could feed back possible modifications of the package into r3PG.

Regarding the latter, I would say that there are two types of modifications:

a) generic things that can easily be integrated in the package

b) more specialised things that are maybe only relevant for your specific application

Regarding b), I would say that even if you fork, I don't see the need to put this into the package. For example, to run the model with several parameters, the question is if you really need a package function for that. A package function needs to be maintained, it requires help and all that, while in this case, it seems a simple for loop would also suffice. So, what I would warn about is now to create a highly specialised package function for each purpose that you have, for this I would simply use normal R code.

Regarding more generic things, e.g. a master table with priors for parameters etc: it would be nice if you could feed this back into our development, and I think there are two options:

a) if you fork and you have a separate branch for the respective feature, you could then create a PR on our repo

b) alternatively, you could simply create an issue where you describe the feature, and attach the code to the issue, so that we can discuss whether we would like to integrate it in the package. However, this could get messy if the feature requires code changes that are in several regions of the package.

Either way, I think it would be very much preferable if those changes would be submitted feature per feature and not as a whole, because I think it's quite likely that we would like to discuss whether and how we would integrate it to the main package per feature!

Cheers, Florian

@trotsiuk - in case similar questions arise - some R packages have this contributing.md file, e.g. https://github.com/r-lib/pkgdown/blob/master/.github/CONTRIBUTING.md where the preferred way to contribute is explained. I have contemplating adding one to my R packages, but haven't found the time so far. Maybe something for r3PG as well?

Rasilgon commented 2 years ago

Hi @trotsiuk and @DavidForrester ! And thanks Florian for opening the issue and the quick introduction! I agree that the process should go feature per feature, and there must be approval from your side about whether the feature should be part of the package or not. One item of the list overlaps with an existing issue in this repo #57 : Including the parameters in the package. Maybe I could start working on this, while you agree on how to go about the contribution process. Forking and submitting a PR is OK for me. I will have a look that the code of the package, and follow the same style. I can also join the discussion in #57 . Cheers, Ramiro

DavidForrester commented 2 years ago

Adding additional input and output features sounds great to me!

I agree with Florian, that it will be important to distinguish between general additions (that are potentially useful for many users) and more specific additions. We gave a lot of thought to this while working on 3-PG in the past. However, it was difficult to know what was general enough to put in the scripts. For example, one feature that was added into the main 3-PG script was the calculation of different thinning outputs (the thinned volume, parameters to calculate size distributions, and an ability to calculate outputs for a specific number of crop trees (e.g. largest 200) as opposed to the whole stand). This was added because it could not be calculated from the existing outputs. Other things like carbon stocks were not added because they are relatively easy to calculate from the existing outputs, and there were many potential outputs (e.g. different ecosystem services) that we listed as being of potential interest but for the sake of simplicity and due to the variability in how different studies calculate them, we left them all out. This is not to say that they should not now be added, it is just to mention why they are currently not included.

Some of the additions listed in the screen shot of Florians comment are relatively specific, or at least they are specific to the region of the species where the parameters, biomass equations, and soil data are applicable (i.e. Europe?). But perhaps this is considered general enough (?), especially if it is done in a way where it is easy to later add information for other parts of the world.

A difficulty we have had with biomass equations is the variability in the explanatory variables they include and the variability in the situations where they are applicable e.g. tree size ranges, site conditions, stand density etc where the equations were developed in comparison to the conditions where the equations are going to be used (by 3-PG). We recently published an analysis and database of all the biomass equations we could find for Europe, and in the database we specifically included the conditions where each equation is applicable. When we apply the biomass equation parameters from that database in our own work, most of the script is checking the applicability of the equations as opposed to extracting the biomass parameters from the database. This biomass equation analysis/database described in https://www.sciencedirect.com/science/article/pii/S0378112717301238. I have found several papers including biomass equations that were published since our paper was published – if you want these, just let me know.

I added some information about 3-PG parameters to #57.

I often receive questions about how to calculate 3-PG parameters or variables equivalent to 3-PG outputs that are used for model calibration/validation. To deal with this, I have prepared some simple R scripts and a 3-PG manual (in particular see section "10.2 Data sources and calculations" of the manual). I only mention this because it might be helpful to indicate the type of information that users have asked for in the past, and to indicate how these have often been calculated. This is all available from the same webpage where the Excel version of 3-PGmix is provided: https://sites.google.com/site/davidforresterssite/home/projects/3PGmix.

trotsiuk commented 2 years ago

Hi All, Thank you for moving it forward, I'm glad to hear that there is an interest and motivation. Below some comments, suggestions:

  1. Parameters #57. I fully agree that it would be nice to incorporate the already published parameters and have a function (as suggested get_parameter) to get them or incorporate those options to already existing function prepare_parameters. It would be also useful to have an option of choosing the default parameter (e.g. which study/species) you want to use as default.
  2. Driving data #58 . As already commented there, I'm a bit more sceptical about this one. I afraid it can be too much work particularly on long-term maintaining and development of new climate data. But this is more personal view. Potentially some ideas can be taken from injestr
  3. Outputs. Not fully sure what it is meant.
  4. Biomass equations. Similar to 2, I think there are several packages already available for the biomass calculations, some are under development (under openR science). There is also an internal package for @DavidForrester functions.
  5. Soil #51 . Yes, getting those data are a bit tricky. Would be nice to have some function, but again not sure how generic it will be. Potentially some ideas can be taken from injestr.

General comment. Based on your descriptions, I have a feeling that you are largely focusing on European scale. To my experience, 3-PG is also to larger extend is used outside of Europe (@DavidForrester please correct me). Therefore, if we are considering to include any of the above mentioned functions, I would prefer if they can be applied worldwide (e.g. driving data, soil, biomass equations, etc.). This brings additional complexity for the development, particularly which data source to use.

Rasilgon commented 2 years ago

Hi! Thanks for the feedback! The list reflects what we need from r·PG in our project. We are building a decision support tool/system and in r3pg plays an important role in it. Our focus is in Germany, and by extension, Europe. It is right, some items are very specific and therefore, might not well suited to be part of the package, as long as, they are not though to more general.

From my perspective (Decision Support) the possibility to easily prepare the inputs for r3PG any random location of Europe will a great enhancement, as you discussed in #58. For instance, from very simple information about one forest stand/area obtain the tables site, species, parameter/size_distribution, and even some pre-defined thinning plans. The methods should be general, so the issue would be obtain the data. Maybe as a first step, the pkg can include European Data (or get_Data), for biomass and soil.

However, as @trotsiuk pointed out, this might make development and maintaince more troublesome and demanding. It is also good to keep pkg simple and focused on its main purpose.

Maybe we can just keep for the moment the parameters bit, which you all agree on. @DavidForrester already provided a very good starting point.

In any case, we (KIT) will develop methods/functions for addressing the items of the list. If you agree, we can come back to you once we have a very clear workaround for any of them, and re-discuss them. Meanwhile, I will work on the parameters.

trotsiuk commented 2 years ago

Potentially, instead of creating and expanding the functionality, it can be also useful to provide a vignette to show how to prepare the driving/soil data from EU sources. In this way, people working within EU can simply use them, and people working with different climatic data can take the approach.

twest820 commented 2 years ago

r3PG is under GPL3, so you can of course modify the code in whatever way you wish, as long as downstream use is compatible with GPL3 licence.

A minor comment: Licensing of git repos is often communicated by dropping a LICENSE.md (or equivalent) alongside README.md. Potential contributors and forkers are more likely to see a top level license file than to find this sentence or to think to look in pkg/DESCRIPTION. Not that it's hard to search the repo for "license" but it's maybe something to consider.

trotsiuk commented 2 years ago

Thanks for the suggestion @twest820. Have now added the file.