trotsiuk / r3PG

An R package for forest growth simulation using the 3-PG process-based model
https://trotsiuk.github.io/r3PG/
GNU General Public License v3.0
27 stars 16 forks source link

assorted typos and other minor issues #69

Closed twest820 closed 1 year ago

twest820 commented 2 years ago

Filing this to capture a few low priority things I've noticed. These can be gotten to as someone has time. I'll add to this issue if I encounter other things

md_3PG.f95

At initialization, the lines

        volume_mai(:) = volume_cum(:) / age(ii,:)
        basal_area_prop(:) = basal_area(:) / sum( basal_area(:) )

assume age(, :) and the total basal_area are greater than zero. This is not necessarily the case, may contribute to #52, and causes the initial volume_mai to be infinite for Liquidambar formosana in mixtures_other.

where( fNn(:) == 0.d0 ) fN0(:) = 1.d0

silently overwrites parameter settings from within s_3PG_f(). This is unlikely to be expected and should be done in parameter input validation rather than at simulation runtime.

The 3-PGpjs manual says CoeffCond has units of mbar, the 3-PGmix manual says mBar⁻¹, and the code omits documenting the units. Since f_vpd = exp(-CoeffCond * VPD_sp) I believe the 3-PGmix manual is correct.

lambda_h is calculated as

        lambda_h(:) = 0.8285d0 + ((1.09498d0 - 0.781928d0 * kLSweightedave(:)) * 0.1d0 ** (canopy_vol_frac(:))) - &
            0.6714096d0 * 0.1d0 ** (canopy_vol_frac(:))
        if ( solarAngle > 30.d0 ) then
            lambda_h(:) = lambda_h(:) + 0.00097d0 * 1.08259d0 ** solarAngle
        end if

but the corresponding coefficients in Equations 5a and 5b of Forrester et al. 2014 are 0.8260 + (1.1698 − 0.9221 kLSweightedave) 0.1 canopy_vol_frac - 0.6703 * 0.1 * canopy_vol_frac + 0.0011 1.0807 solarAngle. Equations A21 and A22 current 3-PGmix manual have the same values as Forrester et al. It therefore looks very much like the regression has been slightly reparameterized since 2014 but corresponding updates weren't made to the 3-PGmix manual and the comments in the code.

f_gamma_dist() has

        out = x ** (x - 0.5d0) * 2.718282d0 ** (-x) * (2.d0 * Pi) ** (0.5d0) * &
            (1.d0 + 1.d0 / (12.d0 * x) + 1.d0 / (288.d0 * x ** 2.d0) - 139.d0 / (51840.d0 * x ** 3.d0) - &
            571.d0 / (2488320.d0 * x ** 4.d0))

It's a negligible difference but it's unclear why this is 2.718282d0 ** (-x) instead of just Exp(-x).

rhoMin and rhoMax are misnamed and are more correctly called rho0 and rho1, respectively. This is an error in the 3-PGpjs manual, which naively assumes young wood (rho0) is less dense than old wood (rho1) even though this depends growth rates and is not the case for juvenile wood of quite a few species. Since, correctly, no check is made that rhoMin < rhoMax the code should probably follow 3-the PGmix manual in using rho0 and rho1.

water_runoff_polled is

water_runoff_polled = poolFractn * excessSW

which looks like a typo for water_runoff_pooled as polling and pooling are definitely different things. It's also unclear where this calculation comes from as it's not commented in the code and doesn't yield search hits in either the 3-PGpjs or 3-PGmix manuals.

It's unclear what, other than numerical error, might cause the check

if( maxval( mort_manag(i) * managementInputs( t_n(i),3:5,i)) > 1.d0 ) then

to be true since mort_manag and managementInputs(,3:5,) are both fractions.

md_decl_const.f95

The array daysInMonth is currently declared as

integer, dimension(12), parameter :: daysInMonth = (/31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31/)

and then used in light, transpiration, and δ13C GPP calculations. This is a good zero order approximation but, as February is hard coded to 28 days and there are no leap year checks, an off by one day error is introduced every leap February. From a look through the code the result is overall estimates of incoming PAR and evapotranspiration are biased low by approximately 0.07% across any given four year period (both 3-PGpjs and 3-PGmix ends assuming there's no solar radiation or ET on February 29th).

This seems likely to be a minor enough structural error to be unimportant to overall model accuracy and I presume fixing this would be problematic to backwards compatibility. However, it still seems worth capturing as an issue.

i_write_out.h

! Production ---------------
output(ii,:,6,2) = npp_f(:)

gets named npp in R rather than npp_f. npp and npp_f differ by biom_debt_foliage so the difference between the two is often substantial in deciduous leaf on months. (Otherwise, they appear to be identical.)

! Mortality ---------------
output(ii,:,8,1) = biom_tree_max(:)

tends to return a time invariant value. Since biom_tree_max is calculated from the (usually) time varying stems_n_ha this is somewhat suspicious.

Third, this probably isn't really an i_write_out.h issue but, for want of a better header to put it under, the data frame returned by run_3PG() contains numerous unnamed variables whose names mostly appear to align with four dimensional array columns not assigned in i_write_out.h. These include

I haven't checked if any of these contain actual data but it seems probably worth excluding unused variables from the data frame and giving informative names to any which do include data.

prepare_climate.R

In prepare_climate.R there is the check

if( any( is.na(climate) ) ){
    stop( "Climate table should not contain NA's" ) # typo: should be NAs
  }

There are a couple issues with this.

  1. It breaks with, for example, the climate table in the validation spreadsheet pkg/tests/r_vba_compare/r3PG_input.xls for the mixtures_other site since the shitai_clim climate does not have values for tmp_ave. The workaround is just to drop tmp_ave column so that prepare_climate() recreates and fills the column a few lines later but, strictly speaking, it should be acceptable for tmp_ave to contain imputable NAs. The fix here is modify the check and then use mutate(tmp_ave = replace_na(tmp_ave, ...)).
  2. It rejects climate tables where d13catm is not set even if d13c calculations are disabled (settings$calculate_d13c = FALSE is passed to run_3PG()). The workaround is again to drop the d13catm column if it's not needed.

s_sizeDist_correct() declares but does not initialize CVdbhDistribution, CVwsDistribution, wsWeibullScale, wsWeibullShape, and wsWeibullLocation. It also does not zero them in these two branches:

        else
            DrelBiaspFS(:) = 0.d0
            DrelBiasBasArea(:) = 0.d0
            DrelBiasheight(:) = 0.d0
            DrelBiasLCL(:) = 0.d0
            DrelBiasCrowndiameter(:) = 0.d0
            wsrelBias(:) = 0.d0
        end if

        ! Correct for trees that have age 0 or are thinned (e.g. n_trees = 0)
        where( age(:) .eq. 0.d0 .or. stems_n(:) .eq. 0.d0 )
            DrelBiaspFS(:) = 0.d0
            DrelBiasBasArea(:) = 0.d0
            DrelBiasheight(:) = 0.d0
            DrelBiasLCL(:) = 0.d0
            DrelBiasCrowndiameter(:) = 0.d0
            wsrelBias(:) = 0.d0
        end where

From a very brief look, it appears this may result in these five outputs containing whatever was in uninitialized memory when they're returned to R. This is most obvious when bias correction is disabled.

prepare_input.R

Description of vpd_day is that it's frost days per month.

prepare_sizeDist.R

In prepare_sizeDist.R this should be stop( 'First column name of the size_dist table must correspond to: parameter' ).

    if( !identical( c("parameter"), colnames(size_dist)[1]) ){
      stop( 'First column name of the parameters table must correspond to: parameter' )
    }

r3PG_VBA_compare.R

I haven't done a comprehensive analysis of this but I've noticed the values output from the current Fortran code diverge somewhat from the reference values in r3PG_input.xls. Due to the lack of comments in the test code it's unclear if this is expected behavior but, given the importance assigned to backwards compatibility, this may merit closer attention even though the changes are, presumably, intentional. The current test code generates a .png for visual comparison but it may be worth considering implementing automated checks. It looks like r3PG_input.xls may be excised from longer runs as it shows starting biom_foliage_debt which doesn't quite match the initial biom_foliage values specified.

63 repros in mixtures_eu with European beech volume_cum dropping from 138.53876 to 138.19902 in October 2003. Testing therefore could have detected #63 prior to the 0.1.3 release but, since the code necessary to do so didn't exist and wasn't added when #63 was fixed, there remains a test hole here.

In general, values aren't sanity checked for range. So overly negative, overly positive, infinite, and NA outputs will not be detected, both internally (due to the lack of assertions) and in the output returned to R (due to the necessary R test code not being present).

examples

The Weibull fitting examples for 3-PGmix bias correction are missing from r3PG. A partial workaround is to start from the ones distributed with 3-PGmix but it seems likely most people will find the code there difficult to adapt to their datasets.

trotsiuk commented 2 years ago

Thanks @twest820 for the outstanding code review and pointing potentially some issues. Some of them shall be addressed back to original code of Landsberg and Waring some of them indeed need to be cross-checked with 3-PG mix.

Would be great if there will be a volunteer who can implement those suggestions and corrections.

A quick response from my side:

md_3PG.f95

Based on the excel version downloaded here https://sites.google.com/site/davidforresterssite/home/projects/3PGmix, r3PG have the same implementation and coefficients:

If CanopyVolumefraction(layer) < 0.01 Then
                    lambdaH(speciescounter) = 0.8285 + ((1.09498 - 0.781928 * kLSweightedave(layer)) * 0.1 ^ (0.01)) - 0.6714096 * 0.1 ^ (0.01)
                    Else
                    lambdaH(speciescounter) = 0.8285 + ((1.09498 - 0.781928 * kLSweightedave(layer)) * 0.1 ^ (CanopyVolumefraction(layer))) - 0.6714096 * 0.1 ^ (CanopyVolumefraction(layer))
                    End If

md_decl_const.f95

Yes, correct. There is a room for improvement, as you also mentioned in #68

i_write_out.h

Fortran output is a four dimensional array. Fortran is not communicating the names back to R and therefore the names are assigned in R transf.out . Therefore, there is no real need to declare the unused variables in Fortran.

prepare_climate.R

good suggestion.

prepare_sizeDist.R

good suggestion

r3PG_VBA_compare.R

There are automatic tests run under the testthat, but agree would be nice to have better automatic testing with excel.

Much room for improvement.

trotsiuk commented 2 years ago

Tod can you please write the new comment instead of extending the initial one, this would be easier to track for me. Thanks

i_write_output.h

! Production --------------- output(ii,:,6,2) = npp_f(:)

This is because it is assigned NPP values for further calculation https://github.com/trotsiuk/r3PG/blob/8d378441174a86c868071b33ac7d6d408d545c8d/pkg/src/md_3PG.f95#L431

! Mortality --------------- output(ii,:,8,1) = biom_tree_max(:)

For me seems to be correct and is a time variant variable

library(r3PG)
out_3PG <- run_3PG(
  site        = d_site, 
  species     = d_species, 
  climate     = d_climate, 
  thinning    = d_thinning,
  parameters  = d_parameters, 
  size_dist   = d_sizeDist,
  settings    = list(light_model = 2, transp_model = 2, phys_model = 2, 
                     height_model = 1, correct_bias = 0, calculate_d13c = 0),
  check_input = TRUE, df_out = TRUE)

library(dplyr)
library(ggplot2)

sel_var <- c('biom_tree_max')

out_3PG %>%
  filter( variable %in% sel_var ) %>%
  ggplot( aes(date, value, color = species) ) +
  geom_line() +
  facet_wrap(~variable, scales = 'free') +
  theme_classic()

image

Third, this probably isn't really an i_write_out.h issue

answered in previous comment https://github.com/trotsiuk/r3PG/issues/69#issuecomment-1048202251

prepare_climate.R

There are a couple issues with this.

Yes, thanks for this. The issue was that I was checkin for NA in all columns instead of the subset (other assumptions which you suggested were already included before). https://github.com/trotsiuk/r3PG/blob/7df6479c690515937089a11816ab1a61e7d3dae2/pkg/R/prepare_climate.R#L49

prepare_input.R

Thanks, DONE

prepare_sizeDist.R

Thanks, DONE

trotsiuk commented 2 years ago

s_sizeDist_correct

s_sizeDist_correct() declares but does not initialize CVdbhDistribution, CVwsDistribution, wsWeibullScale, wsWeibullShape, and wsWeibullLocation. It also does not zero them in these two branches:

For me everything works well, can you please provide reproducible example:

library(r3PG)
out_3PG <- run_3PG(
  site        = d_site, 
  species     = d_species, 
  climate     = d_climate, 
  thinning    = d_thinning,
  parameters  = d_parameters, 
  size_dist   = d_sizeDist,
  settings    = list(light_model = 2, transp_model = 2, phys_model = 2, 
                     height_model = 1, correct_bias = 0, calculate_d13c = 0),
  check_input = TRUE, df_out = TRUE)

library(dplyr)
library(ggplot2)

sel_var <- c('DrelBiaspFS', 'DrelBiasBasArea')

out_3PG %>%
  filter( variable %in% sel_var ) %>%
  ggplot( aes(date, value, color = species) ) +
  geom_line() +
  facet_wrap(~variable, scales = 'free') +
  theme_classic()

image

trotsiuk commented 2 years ago

md_3PG.f95

assume age(, :) and the total basal_area are greater than zero. This is not necessarily the case, may contribute to https://github.com/trotsiuk/r3PG/issues/52,

Can you please provide a reproducible example as discussed in #52

twest820 commented 2 years ago

As the necessary state management code simply isn't present I think md_3PG.f95 itself is probably adequate as a repex (I added those lines in my branch the same day as noting the issue here, which reasonably also qualifies). In my view, at least, the issue really is that timestep state management isn't object oriented. While I've moved state into its own class (and also fixed another occurrence of this particular issue with t_n) I haven't refactored into a state.ClearSizeDist() kind of member function. I probably should.

If the package tests did repeated runs on the r_vba_compare examples in principle they could pick up this particular issue—that's where my unit tests found it—but I suspect the call pattern through R introduces additional zeroing which masks the Fortran-level omissions. Since it's most likely a latent defect that's why it's noted in this list of minor miscellany rather than broken out as its own issue.

trotsiuk commented 2 years ago

Hi Tom, Yes, r3PG eventually diverged from eh excel 3PG mix due to implementations and further development. Therefore, I'm not sure if the further comparison with the excel version will be valuable. We also, during the previous tests, noted that some calculations (e.g calculation average temperature from minimum and maximum) will give a different results between R/Fortran and Excel (on the >6th digit) this will eventually result also in diverge calculations.