hollorol / RBBGCMuso

RBBGCMuso is a software package that supports the application of the Biome-BGCMuSo biogeochemical model.
GNU General Public License v2.0
20 stars 11 forks source link
biogeochemical-model ecosystem-model interface model optimization-tools r r-package

+BEGIN_HTML

Fork me on GitHub

+END_HTML

Please cite this work as follows: Hollós, R., Kristóf, E., Fodor, N., Hidy, D., Horváth, F., Barcza, Z. (2024). RBBGCMuso: an R package to support the application of the [[http://nimbus.elte.hu/bbgc/][Biome-BGCMuSo]] biogeochemical model. URL https://github.com/hollorol/RBBGCMuso.

Current version: 0.7.0

RBBGCMuso is an R package which supports the easy but powerful application of the [[http://nimbus.elte.hu/bbgc/][Biome-BGCMuSo]] biogeochemical model in R environment. It also provides some additional tools for the model such as Biome-BGCMuSo optimized Monte-Carlo simulation and global sensitivity analysis. If you would like to use the framework, please read the following description.

** Installation You can install the RBBGCMuso package in several ways depending on the operating system you use. Up to now RBBGCMuso was tested only in Linux and MS Windows environment, so Mac OS X compatibility cannot be guaranteed yet. In MS Windows you can install the package from binary or from source installer. In Linux you can only install the software from source.

*** Installation in Linux and MS Windows from Source (proposed method) If you would like to install the RBBGCMuso package from Source, you have two options. a) Clone this repository, then build and run the package (further information is available here: [[http://kbroman.org/pkg_primer/pages/build.html][package build and install]]) or b) Install the remotes package first (recommended):

+BEGIN_SRC R :eval no

install.packages("remotes")

+END_SRC

Then copy the following line into the R session and execute it:

+BEGIN_SRC R :eval no

remotes::install_github("hollorol/RBBGCMuso/RBBGCMuso",upgrade="never")

+END_SRC

We provide support to Biome-BGCMuSo v6 via a separate branch:

+BEGIN_SRC R :eval no

remotes::install_github("hollorol/RBBGCMuso/RBBGCMuso",ref="version6",upgrade="never")

+END_SRC

If you use Linux, with Debian (version 8+) you can automate the whole installation process with curl via copying the following line into the Linux terminal:

+BEGIN_SRC bash :eval no

bash <(curl -s https://raw.githubusercontent.com/hollorol/RBBGCMuso/Documentation/debianInstaller.sh)

+END_SRC

*** Installation in MS Windows (only for experts) Alternatively, you can also install the latest RBBGCMuso by copying the following line into the R console (using R or RStudio):

+BEGIN_SRC R :eval no

source("https://raw.githubusercontent.com/hollorol/RBBGCMuso/master/installWin.R")

+END_SRC

** Quick usage *** Preparation

To start using RBBGCMuso you have to load the package in R with the following command:

+BEGIN_SRC R :eval no

library(RBBGCMuso)

+END_SRC

In order to use the RBBGCMuso framework, you have to set up the environment, as you would normally do when you use the model without the RBBGCMuso framework. It means that according to the Biome-BGCMuSo terminology you have to have the proper INI file set, the meteorology input file, the soil input file, and the ecophysiological constants file (EPC) as minimum input. Additional files might be included by the user including nitrogen deposition, management handlers, etc. Please read the corresponding documentation in the [[http://nimbus.elte.hu/bbgc/files/Manual_BBGC_MuSo_v6.1.pdf][actual Biome-BGCMuSo User's Guide]].

If you do not yet have a complete, operational model input dataset, you may want to use the so-called copyMusoExampleTo function (part of RBBGCMuso) which downloads a complete sample simulation set to your hard drive:

+BEGIN_SRC R :eval no

copyMusoExampleTo()

+END_SRC

Once this command is executed in R, it will invoke a small Graphical User Interface (GUI) where you can select the target site for the sample simulation. At present only the "hhs" site is available, which is the abbreviation of the Hegyhátsál eddy covariance station in Hungary. After selecting the site (hhs in this example) the GUI will ask the user to specify a directory (=folder) where the dataset will be stored. In this example we suppose that the user works under MS Windows, and he/she created a directory called C:\model as target directory. It means that after selection of the site the user will select the C:\model directory. Once the copyMusoExampleTo command is finished, the model input dataset and the model executable (called muso.exe and cygwin1.dll) are available in the C:\model folder. The user might check the content of the files using his/her favourite text editor (we propose Editpad Lite as it can handle both Windows and Linux text files). Note that file extension might be hidden by Windows which could cause problems, so we propose to adjust Windows so that file extensions are visible. Visit [[https://www.thewindowsclub.com/show-file-extensions-in-windows][this website]] to learn how to show file extensions in Windows.

In this example the C:\model directory will contain the following files:

In the followings we will demonstrate the usability of RBBGCMuso with the hhs example dataset. If you have your own model input data set, you might need to change the commands accordingly.


Important note on file naming convention

We propose to use the following filename convention for the INI files. For practical considerations, name your spinup INI file as something_s.ini, and the normal INI file as something_n.ini, where something is arbitrary (note the _s and _n convention). It is not obligatory, but if you do not follow this convention then you have to generate the settings variable manually with the setupMuso command. However, if you do follow this convention, then RBBGCMuSo will automatically recognize your spinup and normal INI file name and content, so the work will be much easier. (See help of setupMuso command in R.) In our example s.ini and n.ini follows this convention, so by default RBBGCMuso will use these files for spinup and normal run, repsectively.

*** Running the model

Now as we have a complete set of input data, we are ready to run the model. You can run the model in spinup mode, in normal mode, or in both phases (including the so-called transient run; see the [[http://nimbus.elte.hu/bbgc/files/Manual_BBGC_MuSo_v6.1.pdf][Biome-BGCMuSo User's Guide]]). Using the runMuso function (that is part of RBBGCMuso) you will be able to execute the the model in both spinup or normal phase, and you can also simplify the execution of both phases consecutively. (Note that runMuso is the same as the obsolete calibMuso function.)

In order to execute the simulation, first you have to set the working directory in R so that RBBGCMuso will find the model and the input files. In our example this is as follows:

+BEGIN_SRC R :eval no

setwd("c:/model")

+END_SRC

(Note the "/" symbol which is different from the "\" that is typically used in Windows!)

In order to run the model as it is provided, simply use the following command in R or RStudio:

+BEGIN_SRC R :eval no

runMuso(skipSpinup = FALSE)

+END_SRC

Note that by default runMuso skips the spinup simulation (in order to speed up the model execution), but in our case we do not yet have the result of the spinup run (the so-called endpoint file which is the initial condition for the normal simulation), so spinup simulation is obligatory. This is performed with the skipSpinup=FALSE parameter. Note that according to the naming convention described above, the model will use s.ini and n.ini for spinup and normal phase, repsectively (this can be changed with the parameters of runMuso if needed). As n.ini represents a grass simulation, the results will provide simulation data on C3 grass ecosystem with management defined by the hhs.mgm file.

If the simulation is successful, the results can be found in the C:\model directory. In our example two files were created with .log extension that contain some information about the spinup and the normal phase. The hhs.endpoint file is the result of the spinup (and optional transient) run, and can be considered as initial conditions for the normal run. (Here we have to note that now runMuso can be called without the skipSpinup parameter which means that the simulation will be restricted to the normal phase only.) The results of the simulation (carbon fluxes, state variables, whatever was set by the user in the DAILY_OUTPUT block of the normal INI file) are available in the file hegyhatsal.dayout. Note that annual output was not requested in this case. Also note that in the hhs example file set binary daily output is created and further processed by RBBGCMuso. One of the most attractive features of RBBGCMuso is that the model output is handled by the package which means that it will be directly available for the user as a variable for further processing in R environment.

*** Visualization of the model output

Once the simulation is completed (hopefully without errors), we can visualize the results. Biome-BGCMuSo provides large flexibility on model output selection, which means that the results will depend on the settings of the user in the normal INI file (DAILY_OUTPUT block; see below). In our hhs example 12 variables are calculated in daily resolution. As the model is run for 9 years by the normal INI file, each output variable will be available for 9x365 days (note the handling of leap years in the [[http://nimbus.elte.hu/bbgc/files/Manual_BBGC_MuSo_v6.1.pdf][Biome-BGCMuSo User's Guide]]).

Assume that we would like to visualize Gross Primary Production (GPP) for one simulation year (this is the 2nd variable in the n.ini file; see below). This can be achieved by the following commands. First we re-run the normal phase and redirect the output to the R variable called 'results':

+BEGIN_SRC R :eval no

results<-runMuso()

+END_SRC

Now we extract the 2nd variable from the complete output set and call this R variable as gpp:

+BEGIN_SRC R :eval no

gpp<-results[,2]

+END_SRC

Now we are ready to visualize the results, first for all 9 years:

+BEGIN_SRC R :eval no

plot(gpp*1000)

+END_SRC

Note that the 1000 multiplier is needed to get GPP in gC/m^{2}/day units. The result should look like this image:

+BEGIN_HTML

<img width="600px"
src="https://github.com/hollorol/RBBGCMuso/blob/eae5f7f83aacc68880236c7480d6ba4d17990375/images/gpp01.png" alt="GPP plot">

+END_HTML

Now get the 4th year from the dataset and plot it:

+BEGIN_SRC R :eval no

gpp4<-gpp[(3365+1):(4365)] plot(gpp4*1000,type="l")

+END_SRC

Advanced visualization of the results is possible with plotMuso.

*** Selection of output variables

The visualization example above used the Hegyhátsál sample simulation with the predefined output variables that we included in the initialization file of the normal phase. The available output variables can be checked by the user by opening the n.ini file (normal phase initialization file) with a text editor (e.g. Notepad, or our favourite EditPad Lite). Check the DAILY_OUTPUT block within the n.ini. This should look like this (with more spaces between the numbers and the descriptors):

+BEGIN_SRC text

DAILY_OUTPUT 12 number of daily output variables 2520 proj_lai 3009 daily_GPP 3014 daily_Reco 171 evapotransp 2502 n_actphen 2603 vwc00-03cm 2604 vwc03-10cm 2605 vwc10-30cm 75 GDD 2636 rooting_depth 2716 m_soilstress 671 m_vegc_to_SNSC

+END_SRC

Note the number right below the DAILY_OUTPUT line that indicates the number of selected output variables. If you decide to change the number of output variables, the number (currently 12) should be adjusted accordingly. At present the R package handles only daily output data, but the user should acknowledge the optional annual output set in the ini file as well. Biome-BGCMuSo offers a large number of posible output variables. The full list of variables are available at the website of the model as an Excel file: https://nimbus.elte.hu/~bzoli/public/OUTGOING/muso70beta07/Biome-BGCMuSo7.0-b7_outputs.xlsx

Selection of output variables is primarily driven by the need of the user: it depends on the process that the user would like to study. We made an effort to provide all possible variables that are comparable with the observations. One might be interested in carbon fluxes like Net Ecosystem Exchange (NEE), Gross Primary Production (GPP), total ecosystem respiation (Reco, all comparable with eddy covariance measurements), evapotransporation (ET), Net Primary Production (NPP), soil organic carbon (SOC) content, leaf area index (LAI), aboveground woody biomass and coarse woody debris in forests, crop yield, rooting depth, aoveground or total biomass for herbaceous vegetation, litter, soil respiration, soil water content for 10 soil layers, soil N2O efflux, etc.

Below we list the most common output variables that can be calculated by the model.

+BEGIN_SRC text

50 tsoil[0] - soil temperature of the topmost soil layer (0-3 cm) [Celsius] 171 evapotranspiration [kgH2O/m2/day, equivalent with mm/day] 518 soil1c_total - total soil organic carbon content in the 1st soil pool [kgC/m2] 519 soil2c_total - total soil organic carbon content in the 2nd soil pool [kgC/m2] 520 soil3c_total - total soil organic carbon content in the 3rd soil pool [kgC/m2] 521 soil4c_total - total carbon content in the recalcitrant SOC pool [kgC/m2] 3061 total soilc - total SOC pool [kgC/m2] 313 fruitc - carbon content of the fruit/yield pool [kgC/m2] (this is used for frop yield estimation) 2527 plant height [m] 2528 NDVI [dimless] 2520 proj_lai - this is what we typically refer as Leaf Area Index (LAI) [m2/m2] 2603 vwc[0] - volumetric soil water content of the 1st layer (0-3 cm) [m3/m3] 2604 vwc[1] - volumetric soil water content of the 2nd layer (3-10 cm) [m3/m3] 2605 vwc[2] - volumetric soil water content of the 3rd layer (10-30 cm) [m3/m3] 2606 vwc[3] - volumetric soil water content of the 4th layer (30-60 cm) [m3/m3] 2607 vwc[4] - volumetric soil water content of the 5th layer (60-90 cm) [m3/m3] 2608 vwc[5] - volumetric soil water content of the 6th layer (90-120 cm) [m3/m3] 2609 vwc[6] - volumetric soil water content of the 7th layer (120-150 cm) [m3/m3] 3006 daily_npp - daily Net Primary Production [kgC/m2/day] 3005 daily_nep - daily Net Ecosystem Production (estimated by -NEE) [kgC/m2/day]
3009 daily_gpp - daily Gross Primary Production [kgC/m2/day] 3037 cum_yieldC_HRV - harvested fruit that is crop yield in case of croplands [kgC/m2] 75 GDD - growing degree day, used for the phenophase length calculations 1531 SUM of the soil mineral NH4+ in the total soil column [kgN/m2] 1532 SUM of the soil mineral NO3- in the total soil column [kgN/m2] 3013 daily soil respiration [kgC/m2/day] 307 leafC - total leaf carbon content [kgC/m2] 310 fine root C - total fine root carbon content [kgC/m2] 316 soft stem C - total soft stem carbon content [kgC/m2] (only for herbaceous vegetation) 407 standing dead biomass [kgC/m2] - that is the inactive standing plant pool not yet part of the litter pool 319 livestemC - aboveground live woody biomass [kgC/m2] 322 deadstemC - aboveground dead woody biomass [kgC/m2] 3160 total abovegound woody biomass C [kgC/m2] 401 CWD - coarse woody debris [kgC/m2]

+END_SRC

A note from the Biome-BGC User's Guide: "Livewood is defined as the actively respiring woody tissue, that is, the lateral sheathing meristem of phloem tissue, plus any ray parenchyma extending radially into the xylem tissue. Deadwood consists of all the other woody material, including the heartwood, the xylem, and the bark." In this sense aboveground woody biomass can be calculated as the sum of output variables 319 and 322 (plus the corresponding storage/transfer pools). For convenience, variable 3160 can be used as it represents the sum of 319 and 322 plus the related storage/transfer pools.

*** Perform Quick experiments

Assume that we would like to dig a bit deeper with the model and understand the effect of changing ecophysiological variables on the model results. This can easily be performed with RBBGCMuso. Execute the following command in R/RStudio:

+BEGIN_SRC R :eval no

musoQuickEffect(calibrationPar = 13, startVal = 0, endVal = 9, nSteps = 5, outVar = 3009, yearNum=3)

+END_SRC

This command selects the 13th line in the ecophysiological constants (EPC) file (this is base temperature), then it starts to replace the original value from 0 to 9 in 5 consecutive steps. In this example GPP is selected (variable number 3009, which is the 2nd variable), so the effect of varying base temperature on GPP is calculated using 5 simulations. The result is a spectacular plot where color coding is used distinguish the parameter values. yearNum=3 means that the experiment is done for the 3rd year of the simulation. Remember that in crop rotation simulations the effect might be invisible if there is a conflict between year number and crop type.

At present musoQuickEffect is not usable for the allocation parameters due to restrictions of the allocation fractions.

*** Study the effect of ecophysiological parameters using paramSweep

The paramSweep function is the extension of the musoQuickEffect. It can test the effect of the multiple selected parameters on the model results in once. The result of the paramSweep function is a single HTML file with embedded images. paramSweep needs a csv file called parameters.csv which defines the parameters of interest and the corresponding parameter intervals. In case of the hhs sample dataset there is an example parameters.csv file (please open it and check). The structure of the parameters.csv file is simple. First, parameter name is needed (it can be anything but should refer to the parameter), then the line number of the EPC file is provided, then the possible minimum and maximum value of the parameter is given. Note that there is a tricky part in the parameters.csv as the parameter selection is not straightforward in case of multiple columns (see the end of the EPC file!). The logic is that fractional part of a number is used to select the appropriate parameter from multiple columns. For example, "emergence,132.61,0,1000" means that in the 132nd line of the EPC file there are 7 columns (numbering starts from 0, so it is 6), and we would like to adjust the 2nd column (marked by 1), which ends up with 132.61. 0,1000 means that sweep starts at 0 and ends with 1000. Invoke the paramSweep with simply issuing this command:

+BEGIN_SRC R :eval no

paramSweep()

+END_SRC

The routine uses the provided parameters.csv by default. This can be changed of course and the user can provide an alternative csv file.

*** Sensitivity analysis

Advanced sensitivity analysis is possible with the musoSensi function of RBBGCMuso. [[http://nimbus.elte.hu/agromo/files/musoSensi_usage.html][Visit this link to read the manual of the sensitivity analysis.]] Note that parameters.csv is provided in the hhs example dataset, so you don't have to create it manually.

In the simplest case the user might issue the following command that can be immediately tested with the provided example:

+BEGIN_SRC R :eval no

musoSensi(iterations = 1000, varIndex = 2)

+END_SRC

This example runs the analysis with 1000 iterations using the second output variable (that is daily GPP). The results will be provided in a graphical form and also by numeric values.

IMPORTANT NOTE: If the result file contains only NAs it means that none of the parameters affected the output variable of interest. In this case you need to adjust the output parameter selection or the EPC parameter list. A simple example for this is soil temperature which is not affected by some of the plant parameters. [[https://github.com/hollorol/RBBGCMuso/issues/3][See this link for further details.]]

*** Parameter estimation (calibration)

RBBGCMuso supports parameter estimation (also called as model optimization or calibration) based on the so-called GLUE method. GLUE uses observations and the optimization is driven by the parameter intervals file that is described above (parameters.csv). Below we provide a sample R script that executes the GLUE-based parameter estimation using the sample dataset that is provided by the copyMusoExampleTo() command (see above). Note that the content of the EPC file might have been changed as the result of the above-described procedures, which means that the user might want to remove the test folder and recreate it using the copyMusoExampleTo() command. The runMuso(skipSpinup = FALSE) command must be executed prior to testing the provided code if the model folder is newly created:

+BEGIN_SRC R :eval no

md <- data.table::fread("HU-He2_2012_MEASURED.txt") md[md ==-9999] <- NA md[,GPP:=GPP/1000] plotMusoWithData(md, modelVar = 3009, dataVar = "GPP") plotMuso()

likelihoodGPP = list( GPP = (function(x, y){exp(-sqrt(mean((x-y)^2))) })) calibrateMuso(measuredData = md, dataVar = c(GPP=3009), iterations = 100, likelihood = likelihoodGPP, method="GLUE")

+END_SRC

In the script the observed daily GPP is used to construct the likelihood function. Unit conversion takes place since the model provides GPP in kgC/m2/day units while the observations are provided in gC/m2/day units. The result of the calibration is provided by a PDF file that is created in the model folder. The plotMusoWithData command is useful to compare visually the observation and the simulation.

NOTE: we plan to disseminate a sample script in the future to demonstrate the applicability of the CIRM method in the GLUE context.

*** Contact

E-mail: Roland HOLLÓS: hollorol@gmail.com; Zoltán BARCZA: zoltan.barcza@ttk.elte.hu

** Acknowledgements

The research was funded by the Széchenyi 2020 programme, the European Regional Development Fund and the Hungarian Government (GINOP-2.3.2-15-2016-00028), and by the National Multidisciplinary Laboratory for Climate Change (RRF-2.3.1-21-2022-00014) project.