Closed gonzalezeb closed 2 years ago
@ropensci-review-bot assign @adamhsparks as editor
Assigned! @adamhsparks is now the editor
G'day, Erika, I'll be your guest editor for allodb.
The package passes all standard local checks and tests with flying colours. But checking the code syntax with `lintr::lint_package()` (for some reason I can't use `goodpractice::gp()` locally, so I'm falling back on this), I get several suggestions for improving the code. I suggest going through them, addressing them where appropriate and possible, and perhaps noting in your submission any flags that are not appropriate to address for allodb. This will make the review process easier for the reviewers.
Also, I note that your package's website is incomplete, but once the package is accepted to rOpenSci a website will be generated for you, and your current README contains the necessary information on how to install and use the package, so we're good there.
Once you've addressed the issues flagged by `goodpractice::gp()` or `lintr::lint_package()`, I'll move on to finding reviewers for you.
@ropensci-review-bot run goodpractice
Hi, @gonzalezeb, just checking in to see if there’s anything I can do to help you with your submission process?
Hi Adam, thanks for checking in.
I am getting some errors after addressing some of the issues flagged by `lintr::lint_package()` and `goodpractice::gp()`, but my collaborator is not, so I am not sure everything is good (at least not from my laptop). When I run `devtools::test()` to test functionality, I see that some tests are not running well. When I ran `goodpractice::goodpractice()` on 3/26 I got "Supreme package...", but now I get a few errors. Maybe if you run some checks again you can let me know whether you are still seeing issues.
Yes, it passes both the CRAN checks and `devtools::test()` when I run them, but the code style checks suggested by the rOpenSci guidelines still have some suggestions. This is what I see when I run `lint_package()`. Some of these can easily be addressed; others, like the first hit with names, probably have good reasons in your field to have the object named like that. That's fine, but it should be noted for the reviewers.
> lintr::lint_package()
................
R/est_params.R:46:24: style: Variable and function name style should be snake_case.
Nres = 1e4
^~~~
R/est_params.R:64:51: style: Commas should always have a space after.
coords = dfobs[i, c("long","lat")],
^
R/est_params.R:72:66: style: Put spaces around all infix operators.
if (length(unique(df$equation_id)) == 1) df[1, 3] <- df[1, 3]*1.01
~^~
R/get_biomass.R:56:25: style: Variable and function name style should be snake_case.
Nres = 1e4) {
^~~~
R/illustrate_allodb.R:48:31: style: Variable and function name style should be snake_case.
Nres = 1e4) {
^~~~
R/imports.R:6:1: style: Variable and function name style should be snake_case.
data.frame <- function(...) {
^~~~~~~~~~
R/new_equations.R:62:1: style: functions should have cyclomatic complexity of less than 15, this has 24.
new_equations <- function(subset_taxa = "all",
^
R/new_equations.R:71:27: style: Variable and function name style should be snake_case.
new_minDBH = NULL,
^~~~~~~~~~
R/new_equations.R:72:27: style: Variable and function name style should be snake_case.
new_maxDBH = NULL,
^~~~~~~~~~
R/new_equations.R:73:27: style: Variable and function name style should be snake_case.
new_sampleSize = NULL,
^~~~~~~~~~~~~~
R/new_equations.R:74:27: style: Variable and function name style should be snake_case.
new_unitDBH = "cm",
^~~~~~~~~~~
R/new_equations.R:75:27: style: Variable and function name style should be snake_case.
new_unitOutput = "kg",
^~~~~~~~~~~~~~
R/new_equations.R:76:27: style: Variable and function name style should be snake_case.
new_inputVar = "DBH",
^~~~~~~~~~~~
R/new_equations.R:77:27: style: Variable and function name style should be snake_case.
new_outputVar = "Total aboveground biomass",
^~~~~~~~~~~~~
R/new_equations.R:88:34: style: Variable and function name style should be snake_case.
suppressWarnings(new_equations$dbh_unit_CF <-
^~~~~~~~~~~
R/new_equations.R:90:34: style: Variable and function name style should be snake_case.
suppressWarnings(new_equations$output_units_CF <-
^~~~~~~~~~~~~~~
R/new_equations.R:112:5: style: Variable and function name style should be snake_case.
toMerge <- eq_jansen[, c("hsub", "equation_allometry")]
^~~~~~~
R/new_equations.R:113:64: style: Variable and function name style should be snake_case.
eq_jansen$equation_allometry <- apply(toMerge, 1, function(X) {
^
R/new_equations.R:181:30: style: There should be a space between right parenthesis and an opening curly brace.
if (is.matrix(new_coords)){
^~
R/new_equations.R:265:5: style: Variable and function name style should be snake_case.
equationID <- paste0("new", seq_len(length(new_taxa)))
^~~~~~~~~~
R/new_equations.R:266:5: style: Variable and function name style should be snake_case.
coordsEq <- cbind(long = new_coords[, 1],
^~~~~~~~
R/new_equations.R:268:5: style: Variable and function name style should be snake_case.
rcoordsEq <- round(coordsEq * 2 - 0.5) / 2 + 0.25
^~~~~~~~~
R/new_equations.R:270:5: style: Variable and function name style should be snake_case.
koppenZones <- apply(rcoordsEq, 1, function(X) {
^~~~~~~~~~~
R/new_equations.R:270:49: style: Variable and function name style should be snake_case.
koppenZones <- apply(rcoordsEq, 1, function(X) {
^
R/new_equations.R:273:5: style: Variable and function name style should be snake_case.
koppenZones <- as.character(unlist(koppenZones))
^~~~~~~~~~~
R/new_equations.R:301:5: style: Variable and function name style should be snake_case.
dbhCF <-
^~~~~
R/new_equations.R:303:5: style: Variable and function name style should be snake_case.
outputCF <-
^~~~~~~~
R/new_equations.R:305:28: style: Variable and function name style should be snake_case.
suppressWarnings(dbhCF$dbh_unit_CF <-
^~~~~~~~~~~
R/new_equations.R:307:31: style: Variable and function name style should be snake_case.
suppressWarnings(outputCF$output_units_CF <-
^~~~~~~~~~~~~~~
R/resample_agb.R:41:26: style: Variable and function name style should be snake_case.
Nres = 1e4) {
^~~~
R/resample_agb.R:78:47: style: Variable and function name style should be snake_case.
list_dbh <- apply(dfsub[, 1:3], 1, function(X) {
^
R/resample_agb.R:89:5: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh
^~~~~~~~~~~
R/resample_agb.R:97:7: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh[[j]]
^~~~~~~~~~~
R/weight_allom.R:45:31: style: Variable and function name style should be snake_case.
suppressWarnings(dfequation$wN <-
^~
R/weight_allom.R:53:3: style: Variable and function name style should be snake_case.
coordsSite <- t(as.numeric(coords))
^~~~~~~~~~
R/weight_allom.R:54:3: style: Variable and function name style should be snake_case.
rcoordsSite <- round(coordsSite * 2 - 0.5) / 2 + 0.25
^~~~~~~~~~~
R/weight_allom.R:56:3: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^~~~~~~~~
R/weight_allom.R:56:47: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^
R/weight_allom.R:64:14: style: Variable and function name style should be snake_case.
dfequation$wE <- vapply(dfequation$koppen, compare_koppen, FUN.VALUE = 0.9)
^~
R/weight_allom.R:86:14: style: Variable and function name style should be snake_case.
dfequation$wT <- 1e-6
^~
R/weight_allom.R:91:3: style: Variable and function name style should be snake_case.
eqtaxaG <- data.table::tstrsplit(dfequation$taxa1, " ")[[1]]
^~~~~~~
R/weight_allom.R:92:3: style: Variable and function name style should be snake_case.
eqtaxaS <- data.table::tstrsplit(dfequation$taxa1, " ")[[2]]
^~~~~~~
R/weight_allom.R:152:3: style: Variable and function name style should be snake_case.
vecW <- dfequation$w
^~~~
tests/testthat/test-est_params.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-est_params.R:10:39: style: Commas should always have a space after.
long = c(-78,-85),
^
tests/testthat/test-get_biomass.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:25:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:92:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:106:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:122:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-illustrate_allodb.R:10:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-illustrate_allodb.R:31:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-new_equations.R:4:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-new_equations.R:127:3: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-resample_agb.R:19:3: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-weight_allom.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
There is also a note about UTF-8 strings that I'm seeing, which you'll need to address:

```
* checking data for non-ASCII characters ... NOTE
  Note: found 156 marked UTF-8 strings
```
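If it helps to track that NOTE down, one possible approach (just a sketch; it assumes the marked strings live in character columns of the package's data frames) is to scan each shipped data object for non-ASCII content:

```r
# Sketch: report which character columns of allodb's data objects
# contain non-ASCII (e.g. marked UTF-8) strings.
library(allodb)
for (obj in data(package = "allodb")$results[, "Item"]) {
  data(list = obj, package = "allodb")
  d <- get(obj)
  if (is.data.frame(d)) {
    for (col in names(d)[vapply(d, is.character, logical(1))]) {
      n_bad <- length(grep("[^\x01-\x7F]", d[[col]], useBytes = TRUE))
      if (n_bad > 0) {
        message(obj, "$", col, ": ", n_bad, " non-ASCII string(s)")
      }
    }
  }
}
```

Often such strings (e.g. accented author names in equation references) are legitimate, in which case the NOTE can simply be explained rather than "fixed".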
Ah, sorry, I wasn't paying attention and didn't realise you'd already applied the changes to the repo. I'll need to refresh my local branch and rerun. I thought you were still having issues identifying and applying fixes to these initial problems.
On 5 Apr 2021, at 09:47, Erika Gonzalez-Akre @.***> wrote:
Thanks @adamhsparks. Most of the suggestions after running `lint_package()` have already been applied, see for example here and here. I am not sure why they don't show up for you. I still need to work on some, though. Let me know if you still see those after installing the package again.
OK, this is what I get now.
> lintr::lint_package()
................
R/new_equations.R:62:1: style: functions should have cyclomatic complexity of less than 15, this has 24.
new_equations <- function(subset_taxa = "all",
^
R/new_equations.R:115:65: style: Variable and function name style should be snake_case.
eq_jansen$equation_allometry <- apply(to_merge, 1, function(X) {
^
R/new_equations.R:273:51: style: Variable and function name style should be snake_case.
koppen_zones <- apply(rcoords_eq, 1, function(X) {
^
R/resample_agb.R:78:47: style: Variable and function name style should be snake_case.
list_dbh <- apply(dfsub[, 1:3], 1, function(X) {
^
R/resample_agb.R:89:5: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh
^~~~~~~~~~~
R/resample_agb.R:97:7: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh[[j]]
^~~~~~~~~~~
R/weight_allom.R:45:31: style: Variable and function name style should be snake_case.
suppressWarnings(dfequation$wN <-
^~
R/weight_allom.R:53:3: style: Variable and function name style should be snake_case.
coordsSite <- t(as.numeric(coords))
^~~~~~~~~~
R/weight_allom.R:54:3: style: Variable and function name style should be snake_case.
rcoordsSite <- round(coordsSite * 2 - 0.5) / 2 + 0.25
^~~~~~~~~~~
R/weight_allom.R:56:3: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^~~~~~~~~
R/weight_allom.R:56:47: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^
R/weight_allom.R:64:14: style: Variable and function name style should be snake_case.
dfequation$wE <- vapply(dfequation$koppen, compare_koppen, FUN.VALUE = 0.9)
^~
R/weight_allom.R:86:14: style: Variable and function name style should be snake_case.
dfequation$wT <- 1e-6
^~
R/weight_allom.R:91:3: style: Variable and function name style should be snake_case.
eqtaxaG <- data.table::tstrsplit(dfequation$taxa1, " ")[[1]]
^~~~~~~
R/weight_allom.R:92:3: style: Variable and function name style should be snake_case.
eqtaxaS <- data.table::tstrsplit(dfequation$taxa1, " ")[[2]]
^~~~~~~
R/weight_allom.R:152:3: style: Variable and function name style should be snake_case.
vecW <- dfequation$w
^~~~
tests/testthat/test-est_params.R:9:39: style: Commas should always have a space after.
long = c(-78,-85),
Are there any that I can provide assistance with?
Hi @adamhsparks. Most issues flagged by `lint` have been addressed; only two remain, but I am a little stuck on them. Maybe you can give me some guidance.
In `R/new_equations`, the function contains a lot of 'if's because it does a lot of sanity checks on the user-provided data. We could remove these, at the risk of having undetected problems in the user-provided allometries, and it doesn't seem that moving the 'if's to another function would make much sense.
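For what it's worth, lintr measures cyclomatic complexity per function, so one common pattern is to group the sanity checks into a small validator that `new_equations()` calls once; no check is lost, but the main function's flagged value drops. A hypothetical sketch (the check bodies below are invented placeholders, not allodb's actual checks; only the parameter names come from the package):

```r
# Hypothetical sketch: collect input sanity checks in one helper so
# new_equations() itself has fewer branches.
validate_new_equation <- function(new_taxa, new_minDBH = NULL, new_maxDBH = NULL) {
  if (!is.character(new_taxa)) {
    stop("`new_taxa` must be a character vector.", call. = FALSE)
  }
  if (!is.null(new_minDBH) && !is.numeric(new_minDBH)) {
    stop("`new_minDBH` must be numeric.", call. = FALSE)
  }
  if (!is.null(new_minDBH) && !is.null(new_maxDBH) &&
      any(new_maxDBH <= new_minDBH)) {
    stop("`new_maxDBH` must be larger than `new_minDBH`.", call. = FALSE)
  }
  invisible(TRUE)
}

new_equations_sketch <- function(new_taxa, new_minDBH = NULL, new_maxDBH = NULL) {
  validate_new_equation(new_taxa, new_minDBH, new_maxDBH)
  # ... rest of the function, now with far fewer branches ...
}
```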
In `R/weight_allom`, the problem is with "wE", which is the original column name of the matrix (`koppenMatrix`) being used.
Thanks, @gonzalezeb, for addressing these changes. I think that both of these are acceptable given the reasons stated. I'll begin looking for reviewers now.
@gonzalezeb, can you please update your README badges for allodb using `rodev::use_review_badge(436)` and add a NEWS.md file with `usethis::use_news_md()`?
@gonzalezeb, can you also please address the errors that are encountered when running checks?
```
==> devtools::check()

ℹ Updating allodb documentation
ℹ Loading allodb
Writing NAMESPACE
Writing NAMESPACE

── Building ────────────────────────────────────────────────────────── allodb ──
Setting env vars:
● CFLAGS    : -Wall -pedantic -fdiagnostics-color=always
● CXXFLAGS  : -Wall -pedantic -fdiagnostics-color=always
● CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
────────────────────────────────────────────────────────────────────────────────
✓ checking for file ‘/Users/adamsparks/Development/GitHub/Reviews/allodb/DESCRIPTION’ ...
─ preparing ‘allodb’:
✓ checking DESCRIPTION meta-information ...
─ installing the package to build vignettes
E creating vignettes (5.3s)
--- re-building ‘allodb-vignette.Rmd’ using rmarkdown
Quitting from lines 81-87 (allodb-vignette.Rmd)
Error: processing vignette 'allodb-vignette.Rmd' failed with diagnostics:
arguments imply differing number of rows: 9991, 19
--- failed re-building ‘allodb-vignette.Rmd’

SUMMARY: processing the following file failed:
‘allodb-vignette.Rmd’

Error: Vignette re-building failed.
Execution halted
Error: 'col_ed' is not an exported object from 'namespace:cli'
Execution halted

Exited with status 1.
```
Once that's dealt with, I'll look for reviewers. 🙏
Hi @adamhsparks, allodb is now passing all checks; please let me know if you see the same (I hope!).
devtools::check(remote = TRUE, manual = TRUE)
i Updating allodb documentation
i Loading allodb
Writing NAMESPACE
Writing NAMESPACE
-- Building --------------------------------------------------- allodb --
Setting env vars:
* CFLAGS : -Wall -pedantic -fdiagnostics-color=always
* CXXFLAGS : -Wall -pedantic -fdiagnostics-color=always
* CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
-------------------------------------------------------------------------
√ checking for file 'C:\Users\erikab\Dropbox (Smithsonian)\GitHub\allodb/DESCRIPTION' (349ms)
- preparing 'allodb': (11.3s)
√ checking DESCRIPTION meta-information ...
- installing the package to build vignettes
√ creating vignettes (39.3s)
- checking for LF line-endings in source and make files and shell scripts (11.1s)
- checking for empty or unneeded directories
- looking to see if a 'data/datalist' file should be added
- building 'allodb_0.0.0.9000.tar.gz'
-- Checking --------------------------------------------------- allodb --
Setting env vars:
* _R_CHECK_CRAN_INCOMING_USE_ASPELL_: TRUE
* _R_CHECK_CRAN_INCOMING_REMOTE_ : TRUE
* _R_CHECK_CRAN_INCOMING_ : TRUE
* _R_CHECK_FORCE_SUGGESTS_ : FALSE
* NOT_CRAN : true
-- R CMD check ----------------------------------------------------------
- using log directory 'C:/Users/erikab/AppData/Local/Temp/Rtmp6dUAvx/allodb.Rcheck' (1s)
- using R version 3.6.3 (2020-02-29)
- using platform: x86_64-w64-mingw32 (64-bit)
- using session charset: ISO8859-1
- using option '--as-cran' (402ms)
√ checking for file 'allodb/DESCRIPTION'
- this is package 'allodb' version '0.0.0.9000'
- package encoding: UTF-8
N checking CRAN incoming feasibility (35.1s)
Maintainer: 'Erika Gonzalez-Akre <GonzalezEB@si.edu>'
New submission
Version contains large components (0.0.0.9000)
√ checking package namespace information ...
√ checking package dependencies (2.3s)
√ checking if this is a source package ...
√ checking if there is a namespace
√ checking for executable files (1.2s)
√ checking for hidden files and directories ...
√ checking for portable file names ...
√ checking whether package 'allodb' can be installed (11.2s)
√ checking installed package size ...
√ checking package directory (894ms)
√ checking for future file timestamps (624ms)
√ checking 'build' directory ...
√ checking DESCRIPTION meta-information (718ms)
√ checking top-level files ...
√ checking for left-over files ...
√ checking index information (353ms)
√ checking package subdirectories (392ms)
√ checking R files for non-ASCII characters ...
√ checking R files for syntax errors ...
√ checking whether the package can be loaded (355ms)
√ checking whether the package can be loaded with stated dependencies ...
√ checking whether the package can be unloaded cleanly (336ms)
√ checking whether the namespace can be loaded with stated dependencies (360ms)
√ checking whether the namespace can be unloaded cleanly (459ms)
√ checking loading without being on the library search path (780ms)
√ checking use of S3 registration (2.5s)
√ checking dependencies in R code (2s)
√ checking S3 generic/method consistency (783ms)
√ checking replacement functions (335ms)
√ checking foreign function calls (802ms)
√ checking R code for possible problems (5.4s)
√ checking Rd files (681ms)
√ checking Rd metadata ...
√ checking Rd line widths ...
√ checking Rd cross-references (343ms)
√ checking for missing documentation entries (473ms)
√ checking for code/documentation mismatches (1.5s)
√ checking Rd \usage sections (1.1s)
√ checking Rd contents ...
√ checking for unstated dependencies in examples ...
√ checking contents of 'data' directory (696ms)
√ checking data for non-ASCII characters (1s)
√ checking data for ASCII and uncompressed saves (588ms)
√ checking R/sysdata.rda (478ms)
√ checking installed files from 'inst/doc' ...
√ checking files in 'vignettes' ...
√ checking examples (4.7s)
√ checking for unstated dependencies in 'tests' ...
- checking tests (429ms)
√ Running 'testthat.R' (8.7s)
√ checking for unstated dependencies in vignettes (9.1s)
√ checking package vignettes in 'inst/doc' ...
√ checking re-building of vignette outputs (23.6s)
√ checking PDF version of manual (3.6s)
√ checking for detritus in the temp directory
See
'C:/Users/erikab/AppData/Local/Temp/Rtmp6dUAvx/allodb.Rcheck/00check.log'
for details.
-- R CMD check results --------------------------- allodb 0.0.0.9000 ----
Duration: 2m 0.2s
> checking CRAN incoming feasibility ... NOTE
Maintainer: 'Erika Gonzalez-Akre <GonzalezEB@si.edu>'
New submission
Version contains large components (0.0.0.9000)
0 errors √ | 0 warnings √ | 1 note x
Hi @gonzalezeb, I'm getting a NOTE when I run `R CMD check`:

```
Note: found 156 marked UTF-8 strings
```

Can you check into this? As it's just a note, not an error in the checks, I'll start looking for reviewers now.
@ropensci-review-bot assign @jeffreyhanson
Hi @gonzalezeb, I just had a quick look through this package and it looks really awesome! Great work!

Before I do a thorough review, I have some broad suggestions for the code/documentation that are relevant for multiple functions. I also noticed a few specific things that might be good to fix before I do the review too. I've listed them below - what do you think? I think most of them could be addressed by polishing the code/documentation a bit more? I'm also happy to conduct a thorough review with the current version of the package if you prefer - it will just flag a lot of small issues (e.g. this term in this file should have code formatting) that might be annoying for you to have to respond to?
The package website isn't working (https://forestgeo.github.io/allodb/). I think there might be a bug in the GitHub Actions code for generating the package website (e.g. see errors here).
I feel like the function documentation could be improved by using code formatting (e.g. use "`character` vector" instead of "character vector"). This would help users understand that you are referring to a specific data type. This suggestion applies to multiple functions - what do you think?

I also feel like the function documentation could be improved by using more precise terminology. For example, in the documentation for function parameters, I think it would be helpful to specify the exact class for arguments (e.g. using "`numeric` vector" instead of "numerical vector", because "numerical" isn't a class in R). Additionally, I think it would be helpful to include classes in the documentation for function outputs (i.e. in `@return` sections). For example, this sentence currently says "A data frame of resampled DBHs and associated ..." and could be updated to say "A data frame (`tibble::tibble()` object) of resampled DBHs and associated ...". This suggestion applies to multiple functions - how does that sound?
Also, I feel like the documentation for function parameters could be more consistent. I really like how the documentation for the `dbh` parameter in the `get_biomass` function has the format "A [class-name] containing [...]". I wonder if you could update the documentation for the other parameters -- e.g. the `wna` parameter -- so that they all follow this format (e.g. `wna` currently reads like "This parameter is used [...]" and doesn't contain class information).
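To make the suggested style concrete, a roxygen entry along these lines might look like the following (the wording, and in particular the `wna` description, is invented for illustration and is not allodb's actual documentation):

```r
#' @param dbh A `numeric` vector containing the diameter at breast
#'   height (DBH) of each tree, in cm.
#' @param wna A `numeric` value containing the weight given to
#'   equations with missing sample-size information.
#' @return A data frame (`tibble::tibble()` object) of resampled DBHs
#'   and associated biomass estimates.
```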
It's not really best practice to include a non-R script file in the /R folder. Perhaps the `sysdata.rda` file could be moved to a more appropriate location?
I feel like it might be helpful to include some code to validate function arguments. This could help users understand why their code isn't working if they supply incorrect arguments (e.g. if they supply a `numeric` vector instead of a `character` vector). For example, I think this argument validation code is fantastic and will be really useful for users, and I wonder if you could add similar code to verify the arguments to all parameters in each function? If you're interested in R packages to help with this, I personally use the assertthat R package (e.g. see this function) and I have heard good things about the checkmate R package too. What do you think?
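As a sketch of what such validation could look like with assertthat (the wrapper function, its messages, and the `coords` check below are illustrative, not allodb's code; only the parameter names come from `get_biomass()`):

```r
library(assertthat)

# Illustrative wrapper: fail fast with informative messages before
# doing any real work.
get_biomass_checked <- function(dbh, genus, coords) {
  assert_that(is.numeric(dbh), msg = "`dbh` must be a numeric vector.")
  assert_that(is.character(genus), msg = "`genus` must be a character vector.")
  assert_that(length(dbh) == length(genus),
              msg = "`dbh` and `genus` must have the same length.")
  assert_that(is.matrix(coords) || is.numeric(coords),
              msg = "`coords` must be a numeric vector or matrix.")
  # ... call the real computation here ...
}
```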
Could you remove the unused Travis badge in the README (because it seems that GitHub Actions is used for CI)?
The .buildignore folder seems to contain random R scripts and meeting notes. I don't see the benefit of keeping these in the code repository? Perhaps you could move them to a wiki, issues, or a separate repo? Sorry if I'm missing something?
This function (and maybe others?) overrides the global state. This is not ideal because it could cause issues for users when they do not expect such changes. For example, imagine a user is trying to use this package within an analysis, and they manually use `set.seed()` to handle random number generation for simulations in various parts of their analysis. If this function uses `set.seed()` and does not revert it afterwards, then it could mean that the user's subsequent simulations are all based on the same seed even though they used `set.seed()` to change the seed in their code. If you're interested in suggestions on how to address this, I personally like to use the withr R package for stuff like this. For example, instead of using the `set.seed()` function, you could use the `withr::with_seed()` function. Does that help?
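A small sketch of the `withr` approach (the resampling body is a placeholder, not allodb's actual code):

```r
# Any seeded work inside withr::with_seed() leaves the caller's RNG
# state exactly as it was before the call.
resample_sketch <- function(n, seed = 1L) {
  withr::with_seed(seed, sample(seq_len(n), n, replace = TRUE))
}

set.seed(42)
state_before <- .Random.seed
invisible(resample_sketch(10))
identical(state_before, .Random.seed)  # TRUE: the user's seed is untouched
```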
`eval(parse(...))` is rarely a good idea unless you have no alternatives. This is because it can introduce bugs that are hard to track down, and it can also cause security problems. In this function, I wonder if you could use some combination of `expression` and `substitute` to achieve the same result?
I feel like it would be easier for users to use this function (and maybe others too?) if the parameters were re-ordered such that all parameters without defaults came before those with defaults. For example, to run the `get_biomass()` function with default parameters and without using named arguments, I would have to do something like `get_biomass(x, y, , z)` instead of `get_biomass(x, y, z)` (to specify arguments for `dbh`, `genus`, and `coords`). But maybe there's a reason for this particular order of parameters? Sorry if I'm missing something obvious.
Please let me know if anything I've written isn't clear or doesn't make sense, and I can follow up with more details? I could also be entirely wrong about some things too, so please do correct me if I'm missing something obvious :)
I'll try again with the bot. I'm new here, please bear with me.
@ropensci-review-bot add @jeffreyhanson to reviewers.
@ropensci-review-bot add @jeffreyhanson to reviewers
@jeffreyhanson added to the reviewers list. Review due date is 2021-06-14. Thanks @jeffreyhanson for accepting to review! Please refer to our reviewer guide.
Hello @jeffreyhanson, your suggestions are manageable, thanks for catching all that. We will work on those and will get back to you if we have questions. We'll try to have things solved by 5/31 so you can proceed with a thorough review.
Hi @gonzalezeb, ok, that sounds great - thank you!
Just an update @gonzalezeb, I’m still looking for a second reviewer, sorry about any delays with this.
@ropensci-review-bot add @jstillh to reviewers
@jstillh added to the reviewers list. Review due date is 2021-06-24. Thanks @jstillh for accepting to review!
Thank you for accepting the review for allodb, Jonas. Please let me know if I can assist you with anything along the way.
Hi @jeffreyhanson, we solved many of the issues you pointed out and may justify a few later. Hope you can go ahead with reviewing the functionality of the package. Thanks!
Hi @gonzalezeb,

Here is my review. I really appreciate the package and the work you & the co-authors have put into developing it. I think the package will be useful for most (if not all) forest ecologists... I reviewed the package looking at it as a regular user - sorry if I misunderstood some of the components of your package.
The package includes all the following forms of documentation: `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).

Estimated hours spent reviewing: 9
The authors provide a package for calculating biomass of individual trees of extratropical tree and shrub species. A package including biomass equations for extratropical species is a useful and timely complement to the `BIOMASS` package by Chave et al. for tropical species. The authors apparently developed the package focusing on the plots of the ForestGEO network. Given that the authors aim at providing a package complementing the `BIOMASS` package, I reviewed the functionality of the package not only for use on single/multiple sites but also for use along latitudinal gradients, and focused on European tree species that I am familiar with.

Although I appreciate the general idea of the package and the large effort that the authors put into its development, I have some concerns regarding the concept applied when calculating biomass, and some general comments on code and documentation that I feel should be addressed:
The core function `get_biomass` provides the user with one estimate of aboveground biomass based on the diameter at breast height (dbh) of a tree. This function applies one or several of the biomass equations based on climatic similarity and similarity between species. The estimate is based on a large set of biomass equations compiled from a wide range of sources - I very much appreciate the work the authors have put into the compilation of this dataset.
I like the general framework of this package and the option to add additional equations via the function `new_equations`, but I have concerns regarding the workflow implemented in the package. These mainly focus on the following points:

**Documentation of selection and weighting**

The function `get_biomass` returns a `vector` with an estimate of the biomass for each tree. The calculation of this value is somewhat opaque as long as the user does not actively change some of the arguments (which by default do not need to be provided) of the `get_biomass` function. Even if species- or genus-specific equations are available, this function includes additional, less specific equations. These are given a certain weight based on similarities using the `weight_allom` function and might contribute little to the final estimate. However, this is not obvious to the user, as the package only provides one value for each tree. I would appreciate it if `get_biomass` did not only return a vector but also more metadata on the steps that resulted in the calculation of the final result. This could be achieved by returning a list containing the final values, all values with the associated weights, the output of `new_eqtable` documenting the equations used, and the output of `weight_allom`.
I understand that the function `illustrate_allodb` should give the user an impression of the results of the resampling - but it does not provide the user with all the information (e.g., no information on weights), and it requires the user to run an additional function.
Inclusion of equations based on climatic simlilarity
To check for climatic similarity, the package makes use of the Koeppen classification. This is a feasible approach - but it has
some drawbacks in the implementation: Based on the Koeppen classification of the site (provided through the coords
argument), a certain set of equations is used. This can result in the exclusion/ downweighting even of species specific biomass equations. The illustrative example below provides biomass estimates for Fagus sylvatica along a latitudinal gradient from Northern Italy to Northern Germany and the output of illustrate_allodb
for two sites used in this latitudinal gradient. Obviously a completely different set of equations is used - in the second site, no species specific
equations are used. (Btw: there seems to be a typo, i guess it should be Populus instead of Pupulus.)
The differences between the sites observed here are mostly driven by a change in the Koeppen zones and a different set of equations applied due to that change. I am not convinced that this makes ecological sense. Additionally, the Koeppen classification for specific equations seems to be based on the longitude and latitude values provided in the original publication of each biomass equation. However, for equations covering a wider range (e.g., the equations published in Forrester et al. 2017), the authors seem to have extracted the Koeppen value for the mean of both the lat and long range provided. This may result in a very specific classification not representative for the species and a subsequent down-weighting of this equation when applied in the midst of the species' distribution range: for the ForestGEO plot in Zofin, which is mainly covered with beech and lies in the center of the distribution range of this species, this results in a down-weighting of the equation provided in Forrester et al. 2017, as the value wE provided in the dataset koppenMatrix for this specific combination is 0.7. I did not check if other equations have a higher weight, but I think this issue should be addressed by the authors.
If the Koeppen classification is not included in the dataset koppenMatrix, or the specific combination of Koeppen zones is included with a weight of 0, this results in the exclusion of a specific equation and a subsequent error message (if only one equation is provided) or the inclusion of other, not species-specific equations; see my comments further down. In general, the weighting concept for the different biomass equations should be more transparent for the package user and probably requires conceptual changes.
data <- allodb::get_biomass(rep(50, 41), rep("Fagus", 41), rep("sylvatica", 41), coords = as.matrix(cbind(rep(10, 41), seq(44, 54, 0.25))))
rcoords_site <- round(as.matrix(cbind(rep(10, 41), seq(44, 54, 0.25))) * 2 - 0.5)/2 + 0.25
koeppen <- apply(rcoords_site, 1, function(xk) {
subset(kgc::climatezones, Lon == xk[1] & Lat == xk[2])$Cls
})
plot(seq(44, 54, 0.25), data, xlab = "lat", ylab = "biomass [kg]", ylim = c(0, 3000), pch = 21, cex = 1.5, bg = koeppen)
![unnamed-chunk-1-1](https://user-images.githubusercontent.com/31648011/121906533-7da22c00-cd2b-11eb-97fe-fe8d6a432e5f.png)
allodb::illustrate_allodb("Fagus" , coords = c(10, 44), "sylvatica")
allodb::illustrate_allodb("Fagus" , coords = c(10, 46), "sylvatica")
data("koppenMatrix", package = "allodb")
data("sites_info", package = "allodb")
koppenMatrix$wE[koppenMatrix$zone1 == sites_info$koppen[sites_info$site == "zofin"] & koppenMatrix$zone2 == "Csb"]
## [1] 0.7
Use of eval(parse())
As @jeffreyhanson already pointed out here, the use of eval(parse())
is not good practice for the reasons he laid out nicely in his comment. You seem to have replaced the eval(parse())
in this function but then changed it back according to this issue because the str2lang
that you used to replace eval(parse())
is not available for users with R < 3.6.0.
I was not aware of the str2lang function and I am not convinced that using it will solve the issue raised by @jeffreyhanson: the issue with tracing bugs will still exist even when changing to str2lang, and the general security concerns associated with the use of eval(parse()) can probably not be solved by replacing parse with str2lang, as str2lang is only a special case of parse. The concerns raised are associated with the combination of eval() and parse(), not with the use of parse() alone. (Btw: when you removed str2lang in this function, you only removed it on line 93 but not within the else statement on line 101.)
I think working along the lines laid out by @jeffreyhanson (i.e. using expression and substitute) would be a good option. Another option would be to create internal functions for all forms of the equations in dfequation and then apply these.
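A minimal sketch of what such an internal-functions approach could look like (hypothetical registry; allodb stores its equations as text, so this is only an illustration):

```r
# Each supported equation form becomes a named internal function; the
# equation table would then reference a form plus its coefficients.
equation_funs <- list(
  power   = function(dbh, a, b) a * dbh^b,
  exp_log = function(dbh, a, b) exp(a + b * log(dbh))
)
# Applying a "power" form equation to a 20 cm dbh tree:
agb <- equation_funs[["power"]](dbh = 20, a = 1.2927, b = 2.73446)
round(agb, 1)
```

This avoids parsing text at run time, at the cost of enumerating every equation form up front.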
Documentation
I think the vignette should be improved. It would be great if you could provide some more information to the user, especially on the estimation of the parameters in est_params and on what effect the weighting in weight_allom has on the result.
As @jeffreyhanson already pointed out here, consistently using code formatting throughout the documentation would be great (e.g., "numeric" in the description of the resample_agb function could be changed to `numeric`).
Naming conventions
The use of the package could be improved if you consistently order the arguments and stick to one naming convention throughout the package. sitespecies does not follow the naming convention you used for the other datasets in the package - all other datasets are named following the convention name1_name2. This applies to the dataset koppenMatrix as well.
Performance
The calculation of the biomass is relatively slow if a unique coordinate is provided for every tree - I guess this is caused by the subset operations performed in the weight_allom function. Given that a user might provide individual coordinates for each tree on a site, and given the coarse resolution of the Koeppen classification, it might be an option to first check whether the coordinates vary in such a way that they need to be subset individually or whether a single coordinate can be used.
Such performance considerations are necessary, as the calculations are rather slow at the moment. For example, the biomass calculation for 100 trees with tree-level coordinates requires around 9 s.
system.time(allodb::get_biomass(dbh = c(1:100), genus = rep("Fagus", 100)
, species = rep("sylvatica", 100)
, coords = as.matrix(cbind(rep(10, 100), sample(seq(48.001, 48.1, 0.0001), 100)))))
## user system elapsed
## 8.81 0.03 8.86
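One possible sketch of the suggested check (hypothetical pre-processing step, not part of allodb): snap the tree-level coordinates to the 0.5-degree grid used by kgc::climatezones and perform the expensive lookup only once per unique cell.

```r
# Tree-level coordinates for 100 trees, all within ~0.1 degree of latitude
coords <- cbind(long = rep(10, 100), lat = seq(48.001, 48.1, length.out = 100))
# Snap to the cell centres of the 0.5-degree kgc grid
cells <- round(coords * 2 - 0.5) / 2 + 0.25
key <- paste(cells[, 1], cells[, 2])
# One expensive Koeppen lookup per unique cell, recycled back to each tree
unique_cells <- cells[!duplicated(key), , drop = FALSE]
idx <- match(key, unique(key))
nrow(unique_cells)  # here all 100 trees fall into a single 0.5-degree cell
#> [1] 1
```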
get_biomass
The get_biomass
function creates a biomass estimate based on dbh
, genus
, coords
and species
. It allows the user to provide a selection of the biomass equations via the argument new_eqtable
. This seems to be an important feature of this function. However, if the values provided in the column koppen do not appear in kgc::climatezones, the function throws an error, and the error message does not point the user to the reason for the error - see my specific comment on the weight_allom function below. Given the coarse resolution of the kgc::climatezones dataset of 0.5 degrees, this might be problematic especially in mountainous regions with a large altitudinal gradient. Additionally, this issue might prevent users from applying equations that could be applied for the respective region - see my general comment above.
equations_to_use <- allodb::new_equations(subset_taxa = "Fagus")
allodb::get_biomass(dbh = 20
, genus = "Fagus"
, species = "sylvatica"
, coords = c(10, 46)
, new_eqtable = equations_to_use)
## Warning: Unknown or uninitialised column: `equation_id`.
## Error in if (any(nEQ <- vNms != make.names(vNms))) vNms[nEQ] <- paste0("`", : missing value where TRUE/FALSE needed
allodb::get_biomass(dbh = 20
, genus = "Fagus"
, species = "sylvatica"
, coords = c(10, 46))
## [1] 217.0154
weight_allom
The weight_allom
function creates weights for all biomass functions which will later be used in the calculation of the biomass.
Beyond the concerns I already mentioned above, I only have a minor comment on this function: the kgc::climatezones dataset does not cover large bodies of water. The values in dfequation$wE will therefore be -Inf for all coords that are not covered by kgc::climatezones. The weight_allom function therefore returns -Inf, causing the function to stop without an easily understandable error message. I would appreciate it if you either added an error message like if (any(dfequation$wE == -Inf)) stop('Error message') or flagged the respective values so that they are omitted in later calculations.
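A minimal sketch of the "flag instead of fail" variant (hypothetical code; wE stands in for the climate weights computed by weight_allom):

```r
wE <- c(0.7, -Inf, 1)  # -Inf marks coords not covered by kgc::climatezones
if (any(!is.finite(wE))) {
  warning("Dropping ", sum(!is.finite(wE)),
          " equation(s) whose coordinates are not covered by kgc::climatezones")
  wE[!is.finite(wE)] <- 0  # zero weight removes them from later calculations
}
wE
#> [1] 0.7 0.0 1.0
```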
new_equations
The function new_equations allows the user to a) subset a specific set of equations for one or several taxa and b) add additional equations. I appreciate the latter option, as it adds a lot of flexibility to the package.
Thanks, @jstillh, for your quick and comprehensive review!
@jeffreyhanson, are you able to provide your review soon? It was due on Monday. Please let me know if you need more time.
Oh - I'm so sorry! I mistakenly thought my review would be due later since @gonzalezeb was working on updating the package - I'll submit my review by the end of the week. I hope that doesn't cause too many problems for you.
@jeffreyhanson, all good, I understand the confusion as I obviously interpreted it to be due on the original date. Take all the time that you need.
The package includes all the required forms of documentation, including URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
Estimated hours spent reviewing: 4
The allodb R package provides a framework for estimating tree biomass based on allometric equations. Overall, my impression is that the package is very well designed. I think the authors have done a fantastic job of providing a highly flexible interface -- allowing users to customize calculations as needed -- that is easy to use. Since I have zero expertise in forestry science, I cannot comment on the novelty, scope, or methodological correctness of the package. As such, I have limited my review to focus on code and documentation.
I previously posted several suggestions for improving the package. Although I am glad that some of these suggestions were used to help improve the package, I would appreciate hearing why some of these suggestions were not incorporated? It's entirely possible that these suggestions were irrelevant, infeasible, or undesirable. For example, is there some particular benefit for not describing classes precisely, or not validating user inputs, of which I am unaware? Below I've listed some specific comments. Please let me know if anything I have written is unclear, irrelevant, or incorrect?
I mentioned earlier that I thought it would be helpful to include some code to validate function arguments. From what I can tell, the code hasn't been updated to include such validation - is that correct? I'm sorry, was my previous comment unclear, or did I misunderstand something? To help show what I mean, here's some example code that could be used to validate arguments for the est_params()
function. Please note that I'm not saying you have to use this specific code, I'm just trying to provide an example of what the code could look like (e.g. you might prefer to use the checkmate R package instead):
est_params <- function(genus,
                       coords,
                       species = NULL,
                       new_eqtable = NULL,
                       wna = 0.1,
                       w95 = 500,
                       nres = 1e4) {
  # assert arguments are valid
  assertthat::assert_that(
    ## genus
    is.character(genus),
    assertthat::noNA(genus),
    ## coords
    is.numeric(coords),
    assertthat::noNA(coords),
    length(coords) == 2,
    ### ensure valid longitude
    isTRUE(coords[1] >= -180),
    isTRUE(coords[1] <= 180),
    ### ensure valid latitude
    isTRUE(coords[2] >= -90),
    isTRUE(coords[2] <= 90),
    ## species
    inherits(species, c("character", "NULL")),
    ## new_eqtable
    inherits(new_eqtable, c("data.frame", "NULL")),
    ## wna
    assertthat::is.number(wna),
    assertthat::noNA(wna),
    ## w95
    assertthat::is.number(w95),
    assertthat::noNA(w95),
    ## nres
    assertthat::is.count(nres),
    assertthat::noNA(nres)
  )
  # ... rest of the function ...
}
I previously suggested that it would be good to avoid using eval(parse(...))
if possible. Looking through the commit history, it seems like an alternative function was attempted (i.e. str2lang(...)
) but this was later removed to avoid compatibility issues with older R versions. Thanks for looking into this! I appreciate the effort to get it working. I was trying to think of alternative strategies to avoid eval(parse(...))
, but I think the over-arching issue is that some equivalent is needed because the equations are (ultimately) stored as character
objects (e.g. something like "log(dbh) * 5"
). I suppose some alternatives would be to update the equations
data frame to replace the character
objects with function
objects, or replacing the equations
data frame with a list
of purpose-built equation objects that can be used to perform the calculations. However, these seem needlessly complex compared to the current implementation? Maybe this could be one instance where eval(parse(...))
is not a bad idea?
To help avoid potential issues with eval(parse(...))
, maybe it would be worth (1) writing a function that can validate if a given character
is a valid equation or not (it could give a useful error message if the function is not valid), (2) adding this validation function to the other functions that accept equation tables as inputs, and (3) adding unit tests to validate all built-in equation (i.e. specified in the equation
table) using this new function? I'm sorry that I can't be of more help.
The package contains functions that output different data frame classes. For example, the resample_agb()
function outputs a tibble
(well, tbl_df
object), and the est_params()
function outputs a data.table
object. Since these functions output different classes, the user will need to remember exactly which data frame class is output from each function -- increasing the cognitive load required to use the package. In other words, this behavior breaks the "law of least surprise". Is there a particular reason for this? I'm sorry if I'm missing something obvious?
To help make the package easier to use, may I suggest picking one data frame class (e.g. data.frame
, tibble
or data.table
) and using that as the basis for all user-facing functions? This way the user can simply remember something like "The allodb package works with data.table
objects, so I need to make sure my input data are in that format, and convert any outputs from that format as needed". Indeed, if there are certain reasons for using certain data frame classes in certain functions (such as for performance), perhaps different classes could be used internally (e.g. a function gets a data.frame
as input, converts it to a data.table
for performance, does some processing, and then outputs a data.frame
)? What do you think? I'm just trying to help suggest ways to make the package easier to use.
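A minimal sketch of this "convert internally, return one class" pattern (hypothetical function, not part of allodb):

```r
# The user always supplies and receives a plain data.frame; the internal
# representation (here a matrix, for speed) never leaks out.
scale_biomass <- function(df) {
  stopifnot(is.data.frame(df))
  m <- as.matrix(df)   # internal representation chosen for performance
  out <- m * 2         # ...some processing...
  as.data.frame(out)   # convert back before returning
}
class(scale_biomass(data.frame(agb = c(1.5, 2.5))))
#> [1] "data.frame"
```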
The nls()
function used in est_params()
is exceptionally useful for fitting nonlinear models. Since it can experience convergence issues (e.g. see here, here, here, and here), it is sometimes necessary to customize the default settings to obtain reliable outputs (e.g. change optimizer, increase number of iterations). Given this, would it be possible to update the est_params()
function so that users can customize the optimization parameters (e.g. specify the algorithm
and control
parameters for nls()
, see here for details)? It might also be useful if the function could somehow output information on the performance of the models too (e.g. model goodness of fit statistics)? But maybe I'm missing something and these concerns are not relevant here? What do you think?
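A sketch of what such a pass-through could look like (hypothetical fit_agb helper; est_params() does not currently expose these arguments):

```r
# The caller can override the nls() optimizer and iteration limits,
# which helps when the default Gauss-Newton algorithm fails to converge.
fit_agb <- function(dbh, agb, algorithm = "port",
                    control = stats::nls.control(maxiter = 200)) {
  stats::nls(agb ~ a * dbh^b, start = list(a = 1, b = 1),
             algorithm = algorithm, control = control)
}
dbh <- seq(5, 50, by = 5)
agb <- 2 * dbh^1.5 * (1 + rep(c(0.01, -0.01), 5))  # near-exact power law
round(coef(fit_agb(dbh, agb)), 2)
```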
I mentioned earlier that I thought the documentation could be improved by using more precise terminology to describe function parameters and outputs. For example, using "`numeric` vector" instead of "numerical vector" (because "numerical" isn't a class in R), and specifying exactly which data frame class (e.g. data.frame or data.table) when referring to R objects (e.g. here). From what I can tell, the documentation hasn't been updated based on this suggestion - is that right? I'm sorry if my previous comment wasn't clear? Or maybe I'm misunderstanding something? If so, could you please explain why the current terminology is preferable?
It would be difficult for me to understand exactly how the get_biomass
, est_params
, and resample_agb
functions work based only on reading the documentation. I wonder if it is possible to provide more details on the underlying calculations in a Details section? This suggestion may not be relevant though, because I might lack the required experience in forestry science to understand the documentation in its current form?
I mentioned earlier that I thought the documentation could be improved by using code tags to specify that certain words correspond to object classes. From what I can tell, it seems these tags are still missing from a few places (e.g. like "NULL" here). Could you please double check the documentation and add in any missing tags? Also, when documenting character objects, I would suggest including quotation marks (e.g. using "all"
instead of all
here) so it's clear the documentation is referring to a character object.
Have you thought about including more links to help the user navigate the documentation (e.g. here where the documentation refers the reader to another manual entry, or here where the documentation refers to another function)? Perhaps the @seealso
section could also be used to help users navigate between related functions?
Have you considered using equation commands (e.g. \eqn{}, \deqn{}) in the documentation to format equations (e.g. such as this equation; see here for an example)?
It might be helpful to use the \pkg{}
command when referring to the name of packages in the documentation (e.g. such as \pkg{allodb}
in here)?
Is it possible to explicitly state the type of object in the documentation for each dataset? I appreciate that the documentation says that the objects are a "data frame", but it would be useful to specify which implementation (e.g. data.frame, data.table, table, and tibble objects are all kinds of "data frame" objects).
It might be helpful to add example sections to the documentation for the built-in datasets so that the reader can see a preview of each dataset. What do you think? For example, in data.R, you could insert some code like this:
#' @examples
#' # preview the dataset
#' print(head(equations))
#'
I also noticed some potential typos in the documentation: here, here.
I also noticed that some of the built-in datasets are tibble objects (e.g. equations) and others are data.frame objects (e.g. genus_family). I was wondering if it would be useful to have all built-in datasets represented using the same class (e.g. all as tibble objects)? This is because tibble and data.frame objects have markedly different behaviors -- even though a tibble inherits from a data.frame -- and these differences can cause errors when the user/code expects a different behavior. This is related to the "law of least surprise" that I mentioned earlier.
@jstillh and @jeffreyhanson, thank you so much for your thoughtful and kind reviews. Your suggestions will improve the package 100%. I will contact you if we have questions; we truly appreciate your time on this.
Hi @gonzalezeb, just following up, it's been four weeks since the reviews came in. If you are experiencing any difficulties or issues, feel free to flag them with the reviewers or me and we'll do our best to assist you through the process.
Hi @adamhsparks. Sorry, I am a little behind with this and have solved only minor issues. There are some comments on the conceptual aspects of the package (i.e. here) where I get the point, but changing our approach will require some (lots!) of work, so I am giving it careful thought. How should I tackle issues like those?
Hi @gonzalezeb, which specific aspect were you linking to and asking for guidance on? Your link only points to the whole issue, not to one of the reviewers' comments or an issue that they may have opened in your repository.
Hi @gonzalezeb, do you need more time to work on this? We can put a holding tag on it and revisit it after three months to see where things are for up to a year. Would that be useful to you?
Hi @adamhsparks, thanks for checking. We have made some substantial progress but I still need to work on some minor things. The plan is to finish this before Sep 15 as we want to submit our revised manuscript to MEE also by then. Will that be ok with you all?
Sounds good, Erika! Thanks for updating me.
Hi @adamhsparks, I wasn’t sure how to respond to the multiple reviews, so I hope the following is sufficient. We are very grateful for the thoughtful suggestions by the reviewers, and have implemented most of them.
Initial suggestions by @jeffreyhanson (5/19/21), have been addressed:

- Object classes are now described more precisely in the documentation (e.g. as a tibble::tibble() object), etc.
- We added validate_equations() to test that all equations in equations$equation_allometry are a valid R expression of the dbh or h variables (since this could be one of the main sources of mistakes in calculating biomass). This function can be called from inside any function within allodb and allows users to pass a character vector of equations. We also added consistency checks in the get_biomass function on the dbh and coordinates provided by the user.
- We replaced set.seed with withr::with_seed in the resample_agb function.
- We kept eval(parse(…)), see next comment.

Comprehensive review by @jeffreyhanson (6/17/21), has also been addressed.

About code:

- User-facing functions now consistently return a tibble subclass.

About documentation:

- We improved the documentation (in particular for the get_biomass and weight_allom functions), added @seealso to help users navigate between documentation pages, use quotation marks for character vectors, added example sections to the documentation for the built-in datasets, etc.

Suggestions by @jstillh (6/14/21), have been addressed:

- The get_biomass and weight_allom core functions have been improved so the weighting concept is more transparent; specifically, we give more details in the Details section. Regarding adding additional outputs to the get_biomass function, we think that would make the function too slow.
- We revised the weight_allom function (i.e. it now deals with coordinates that are not associated with any Koppen climate zone) to address the specific comment that the get_biomass function throws an error when the equation table does not contain equations corresponding to the Koppen climate at the provided coordinates (argument coords). The proposed solution is to change koppenMatrix, replacing climate combinations with weight we = 0 (two completely different climates) with we = 1e-6, similar to what is done with the taxonomic weight. This way, equations with we = 1e-6 will only be significantly used in the recalibrated equations when no better climate combination is found, thereby avoiding an error. In regards to the Forrester et al. 2017 equations, we revised them and now many coordinates correspond to the max or min values given in the original publication (not the mean, except for one site).

The reviewers advise avoiding the call eval(parse(text = ...)) (item 9 here, and “Code and documentation” here), mainly because (a) it can be unsafe, and (b) it is error prone.
Instead they suggest storing the equations as objects such as expressions or functions, and manipulating them with substitute().
We appreciate the advice and, after considering it carefully, we would like to share our updated opinion. We first discuss the problems of eval(parse()) and then the alternative solutions.
To make our discussion concrete we offer some code examples.
library(dplyr, warn.conflicts = FALSE)
library(allodb)
In short, we prefer to keep eval(parse()), as its problems seem to be balanced by the benefits we see in this particular use case, and the alternative solutions seem too complex. This is in line with @jeffreyhanson’s comment on Jun 17:

> I previously suggested that it would be good to avoid using eval(parse(...)) if possible. (…) However, [the alternatives] seem needlessly complex compared to the current implementation? Maybe this could be one instance where eval(parse(...)) is not a bad idea?
Online we found that parsing text would be a risk if we hosted an application on a server we own and allow users to input arbitrary text. A malicious user could damage our server.
But this is not our case. The allodb package is to be used as an analysis tool, directly on the users' computers. And the input text comes mainly not from users but from a dataset we built.
We acknowledge that bugs can hide in the unevaluated text, as invalid code is still a valid text object.
valid <- "dbh + 1"
eval(parse(text = valid), envir = list(dbh = 1))
#> [1] 2
invalid <- "1 <- dbh"
try(eval(parse(text = invalid), envir = list(dbh = 1)))
#> Error in 1 <- dbh : invalid (do_set) left-hand side to assignment
But bugs can also hide in unevaluated expressions. Using substitute()
to manipulate expressions does not ensure an expression is valid.
invalid <- quote(1 <- dbh)
substitute(x, list(x = invalid))
#> 1 <- dbh
try(eval(substitute(x, list(x = invalid))))
#> Error in 1 <- dbh : invalid (do_set) left-hand side to assignment
We are comfortable storing and manipulating text. We can inspect equations stored as text directly from the dataframe, and we can manipulate them with familiar tools.
# Storing text
equation <- "1.2927 * (dbh^2)^1.36723"
dbh <- "(sampled_dbh * 0.393701)"
# + We can inspect the equations directly from the dataframe
tibble(equation, dbh)
#> # A tibble: 1 x 2
#> equation dbh
#> <chr> <chr>
#> 1 1.2927 * (dbh^2)^1.36723 (sampled_dbh * 0.393701)
# - To manipulate the equations we use familiar tools
modified <- gsub("dbh", dbh, equation)
modified
#> [1] "1.2927 * ((sampled_dbh * 0.393701)^2)^1.36723"
eval(parse(text = modified), list(sampled_dbh = 1))
#> [1] 0.1010408
In contrast, we are not comfortable storing or manipulating expressions with substitute()
. We found great help in the first edition of Advanced R but we still think the approach is difficult to understand, which makes us more likely to introduce bugs than with eval(parse())
.
# We need to create an escape hatch because substitute() doesn't work if we
# already have an expression saved in a variable
# http://adv-r.had.co.nz/Computing-on-the-language.html#substitute
substitute_q <- function(x, env) {
call <- substitute(substitute(y, env), list(y = x))
eval(call)
}
equation <- quote(1.2927 * (dbh^2)^1.36723)
dbh <- quote((sampled_dbh * 0.393701))
# - We can't inspect the equations directly from the dataframe :-(
tibble(equation = list(equation), dbh = list(dbh))
#> # A tibble: 1 x 2
#> equation dbh
#> <list> <list>
#> 1 <language> <language>
# - To manipulate the equations we need unfamiliar tools
modified <- substitute_q(equation, list(dbh = dbh))
modified
#> 1.2927 * ((sampled_dbh * 0.393701)^2)^1.36723
eval(modified, list(sampled_dbh = 1))
#> [1] 0.1010408
As @jeffreyhanson advised, instead of using expressions and substitute()
we prefer to ensure all text-equations are valid or throw an informative error message.
validate_equation <- function(text, envir = parent.frame()) {
tryCatch(
eval(parse(text = text), envir = envir),
error = function(e) {
stop("`text` must be valid R code.\nx Invalid: ", text, call. = FALSE)
}
)
invisible(text)
}
dbh <- 99
validate_equation(valid)
try(validate_equation(invalid))
#> Error : `text` must be valid R code.
#> x Invalid: <-1dbh
f <- function(text) {
validate_equation(text, envir = list(dbh = 1))
# Some useful code
text
}
f(valid)
#> [1] "dbh + 1"
try(f(invalid))
#> Error : `text` must be valid R code.
#> x Invalid: <-1dbh
@gonzalezeb, thank you for your comprehensive overview of the changes that you have made to the allodb package.
@jeffreyhanson and @jstillh could you please let me know if you are satisfied with the changes to the package or have any further suggestions that may improve it?
Awesome - thank you very much @gonzalezeb. @adamhsparks, I'll try to get back to you next week.
@gonzalezeb thanks a lot for the overview of the changes - I'll try to go through them in the next week as well and get back to you, @adamhsparks.
Thank you @gonzalezeb for providing such detailed responses to the points I raised earlier. I especially found your discussion on the costs and benefits of eval(parse(...))
really useful. I think the R package is looking absolutely brilliant! Most of the points I raised earlier have been addressed. I've made a couple of suggestions below that I think could help improve the package.1 All things considered, these are pretty minor - just some documentation tweaks. If those could be addressed, I'd say that all my suggestions have been satisfied. I've also listed several additional minor comments too. Based on my understanding of the rOpenSci guidelines, they won't/shouldn't influence whether the package should be accepted or not (but please correct me if I am wrong). I've listed them just to be thorough, and in case the package authors think they are worth addressing.
Package checks throw the following warning on my computer (i.e. running devtools::check()
). Could you please fix it? Let me know if you're unable to reproduce this error and I can try to provide more information?
Undocumented code objects: ‘equations’ Undocumented data sets: ‘equations’ All user-level objects in a package should have documentation entries. See chapter ‘Writing R documentation files’ in the ‘Writing R Extensions’ manual.
Thank you very much for looking into potential issues with using eval(parse(...))
and potential alternatives. I agree that the main issue would be if the code was run inside a web application that allowed individuals to enter arbitrary text. Although additional issues could arise on standard desktop/laptop computers too (e.g. a malicious actor could potentially update the library files), I guess at this point malicious actors would have easier ways to accomplish their goals than modifying R library files. So, given the costs and benefits outlined by @gonzalezeb, I think it's perfectly reasonable for the package to use eval(parse(...)). Maybe it would be worth adding some warning text to the documentation for the function(s) that use eval(parse(...)) so that users are aware of potential issues? This way it is clearly the responsibility of the user to ensure that they take appropriate steps to avoid security issues.2
Also, I noticed one of the links provided by @gonzalezeb mentioned that RAppArmor::eval.secure() can prevent system calls with administrator privileges. If it's not too much hassle, would it be worth using that where possible? It seems that this R package is only available on Linux though - so perhaps it's not worth the hassle (because then some additional code would be needed to conditionally use RAppArmor::eval.secure() if it's available, and otherwise use eval(parse(...))). I thought I'd mention it just in case.
The README.md file might need re-knitting because the output displayed for est_params()
is a data.table
object (instead of a tibble()
object per the current version).
The documentation doesn't always format code correctly (i.e. using backticks). For example, NULL is sometimes referred to as "NULL" instead of `NULL` (e.g., see here, or here). Personally, I find syntax formatting helpful when reading documentation (e.g., https://readr.tidyverse.org/reference/read_delim.html). I don't think the rOpenSci guidelines have style mandates for code in package documentation, so I don't think this is critically important.
The main functions (e.g. est_params(), get_biomass(), resample_agb()) don't contain code to validate all arguments. Although @gonzalezeb mentioned that "the number of possible mistakes may be infinite", I would suggest that the number of expectations (e.g. properties or characteristics) for a valid input is much lower. For example, in order to verify that an argument is a single integer (e.g. 5), one does not need to verify that the parameter is (i) not a character, (ii) not NULL, (iii) not a double, (iv) not Inf, etc. Instead, one could verify that it conforms to a set of expectations (e.g. inherits(x, "integer") && identical(length(x), 1L) && all(is.finite(x))). This is made much easier using existing R packages developed for this purpose. However, I don't think rOpenSci or CRAN require argument validation for published packages, so I don't think this issue is critically important. I'm just making this comment to be thorough :)
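To illustrate the expectation-based approach, a runnable sketch (hypothetical helper, similar in spirit to assertthat::is.count()):

```r
# A small, finite set of expectations identifies "a single positive count"
# without enumerating everything the input is *not*.
is_single_count <- function(x) {
  is.numeric(x) && length(x) == 1L && is.finite(x) && x == trunc(x) && x > 0
}
is_single_count(5)     # TRUE
is_single_count(1:2)   # FALSE: not a single value
is_single_count(Inf)   # FALSE: not finite
```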
This is a super minor suggestion. The equation in the documentation for the est_params()
function has the text " with " formatted as a maths term (rather than text), which makes it slightly difficult to read on the package website. To address this, I think you could change this line to define the equation as:
\deqn{AGB = a * dbh ^ b + e, \space \mathit{with} \space e ~ N(0, sigma^2)}{AGB = a * dbh ^ b + e, with e ~ N(0, sigma^2)}
1 Please note that I still (unsurprisingly) lack the forestry experience needed to evaluate the novelty and methodology of this package, and so continue to limit my reviews to code quality and documentation.
2 Just to be clear, I do think that -- where possible, feasible, and reasonable -- R package developers should ideally -- if only in principle -- minimize security risks, even if software licenses absolve developers of responsibility. It's just that I think eval(parse(...)) is not so terrible here, given the noted trade-offs that are specific to this package. Perhaps others will disagree, and that's fine. Please note that I do not have any experience with cybersecurity. If you want an informed viewpoint on such matters, I suggest you talk to someone else :)
Thanks @jeffreyhanson, I have adopted most of your comments, they are very helpful!
Thank you @gonzalezeb for the very detailed response to the reviews. Like @jeffreyhanson, I appreciate your detailed response regarding the use of eval(parse(...)). In general, I am very pleased with the revisions and the changes you implemented in the functions. I do not have any more suggestions beyond the ones by @jeffreyhanson.
I particularly like the changes to weight_allom
and the get_biomass
function, and that you implemented an informative error message if coordinates are not covered by the koppen data. Thanks for this contribution.
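A minimal sketch of such an informative coordinate check (illustrative only; the function name and message are assumptions, not the actual allodb implementation):

```r
# Illustrative check: fail early with a clear message when coordinates
# fall outside valid longitude/latitude bounds, instead of returning a
# cryptic downstream error.
check_coords <- function(long, lat) {
  bad <- long < -180 | long > 180 | lat < -90 | lat > 90
  if (any(bad)) {
    stop(
      "Coordinates not covered by the climate data: ",
      paste0("(", long[bad], ", ", lat[bad], ")", collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

check_coords(long = -78.15, lat = 38.89)  # valid coordinates pass silently
```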
Date accepted: 2021-09-27
Submitting Author: Erika Gonzalez-Akre (@gonzalezeb)
Other Authors: Camille Piponiot (@cpiponiot), Mauro Lepore (@maurolepore), Kristina Teixeira (@teixeirak)
Repository: https://github.com/forestgeo/allodb
Version submitted: 0.0.0.9000
Editor: @adamhsparks
Reviewers: @jeffreyhanson, @jstillh
Due date for @jeffreyhanson: 2021-06-14
Due date for @jstillh: 2021-06-24
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
allodb
integrates compiled data and built-in functions to calculate tree above-ground biomass (AGB), a critical value used by forest ecologists to estimate the amount of carbon sequestered and released by forests. allodb
falls in the "field and laboratory reproducibility tools" category, as our package intends to improve and standardize the calculation of a widely used forest metric.

Who is the target audience and what are scientific applications of this package?
allodb
is designed to be used by forest ecologists with basic or advanced knowledge of R. This scientific application is intended for individuals interested in calculating forest metrics such as AGB in extratropical forests (subtropical, temperate, and boreal).

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

The BIOMASS package is designed to estimate biomass in tropical forests. The main difference between allodb
and BIOMASS is the set of equations used to estimate AGB. There is an existing calibrated equation used globally for tropical environments, but not for areas outside the tropics. Our goal is to provide a set of curated equations and the ability to add new equations in order to obtain a more reliable value of AGB outside the tropics. We do not duplicate any functionality of the BIOMASS package and overlap only in ecological scope.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research? N/A
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted. @melvidoni or issue 427
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[x] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [x] The package is novel and will be of interest to the broad readership of the journal.
- [x] The manuscript describing the package is no longer than 3000 words.
- [x] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct