ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

allodb: An R package for biomass estimation at extratropical forest plots #436

Closed gonzalezeb closed 2 years ago

gonzalezeb commented 3 years ago

Date accepted: 2021-09-27 Submitting Author: Erika Gonzalez-Akre (@gonzalezeb) Other Authors: Camille Piponiot (@cpiponiot), Mauro Lepore (@maurolepore), Kristina Teixeira (@teixeirak) Repository: https://github.com/forestgeo/allodb Version submitted: 0.0.0.9000 Editor: !--editor-->@adamhsparks<!--end-editor-- Reviewers: @jeffreyhanson, @jstillh

Due date for @jeffreyhanson: 2021-06-14 Due date for @jstillh: 2021-06-24

Archive: TBD Version accepted: TBD


Package: allodb
Title: Tree Biomass Estimation at Extratropical Forest Plots
Version: 0.0.0.9000
Authors@R: 
    c(person(given = "Erika",
             family = "Gonzalez-Akre",
             role = c("aut", "cre", "cph"),
             email = "GonzalezEB@si.edu"),
      person(given = "Camille",
             family = "Piponiot",
             role = "aut",
             email = "camille.piponiot@gmail.com"),
      person(given = "Mauro",
             family = "Lepore",
             role = "aut",
             email = "maurolepore@gmail.com",
             comment = c(ORCID = "0000-0002-1986-7988")),
      person(given = "Kristina",
             family = "Anderson-Teixeira",
             role = "aut",
             email = "TeixeiraK@si.edu"))
Description: Tool to standardize and simplify the tree biomass
    estimation process across globally distributed extratropical forests.
License: GPL-3
URL: https://github.com/forestgeo/allodb
BugReports: https://github.com/forestgeo/allodb/issues
Depends: 
    R (>= 3.5.0)
Imports: 
    data.table (>= 1.12.8),
    ggplot2 (>= 3.3.2),
    kgc (>= 1.0.0.2),
    utils
Suggests: 
    covr (>= 3.2.1),
    knitr (>= 1.21),
    purrr,
    rmarkdown,
    spelling,
    testthat (>= 3.0.0)
VignetteBuilder: 
    knitr
Config/testthat/edition: 3
Encoding: UTF-8
Language: en-US
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1

Scope

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options - [x] The package is novel and will be of interest to the broad readership of the journal. - [x] The manuscript describing the package is no longer than 3000 words. - [x] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

noamross commented 3 years ago

@ropensci-review-bot assign @adamhsparks as editor

ropensci-review-bot commented 3 years ago

Assigned! @adamhsparks is now the editor

adamhsparks commented 3 years ago

Editor checks:


Editor comments

G'day, Erika, I'll be your guest editor for allodb.

The package passes all standard local checks and tests with flying colours. But checking the code syntax using lintr::lint_package() (I for some reason can't use goodpractice::gp() locally so I'm falling back to this), I get several suggestions for improving the code. I suggest going through them and addressing where appropriate and possible and perhaps noting in your submission those flags that are not appropriate to address for allodb. This will make the review process easier for the reviewers.

Also, I note that your package's website is incomplete, but once accepted to rOpenSci a website will be generated for you and I note that your current README contains the necessary information on how to install and use the package, so we're good there.

Once you've addressed the issues flagged with goodpractice::gp() or lintr::lint_package(), I'll move on to finding reviewers for you.


adamhsparks commented 3 years ago

@ropensci-review-bot run goodpractice

adamhsparks commented 3 years ago

Hi, @gonzalezeb, just checking in to see if there’s anything I can do to help you with your submission process?

gonzalezeb commented 3 years ago

Hi Adam, thanks for checking in. I am getting some errors after addressing some issues brought out by lintr::lint_package() and goodpractice::gp(), but my collaborator is not, so I am not sure if everything is good (at least not from my laptop).

When I run devtools::test() to test functionalities, I get that some tests are not running well. When I run goodpractice::goodpractice() on 3/26 I got "Supreme package..." but now I get a few errors. Maybe if you run some checks again you can let me know if you are still seeing issues..

adamhsparks commented 3 years ago

Yes, it passes both the CRAN checks when I run them with, and devtools::test(), but the code style, etc. as suggested by the rOpenSci guidelines have some suggestions.

This is what I see when I run lint_package().

Some of these can easily be addressed, others like the first hit with names, probably have good reasons in your field to have the object name like that. That's fine, but it should be noted for the reviewers.

> lintr::lint_package()
................
R/est_params.R:46:24: style: Variable and function name style should be snake_case.
                       Nres = 1e4
                       ^~~~
R/est_params.R:64:51: style: Commas should always have a space after.
                       coords = dfobs[i, c("long","lat")],
                                                  ^
R/est_params.R:72:66: style: Put spaces around all infix operators.
    if (length(unique(df$equation_id)) == 1) df[1, 3] <- df[1, 3]*1.01
                                                                ~^~
R/get_biomass.R:56:25: style: Variable and function name style should be snake_case.
                        Nres = 1e4) {
                        ^~~~
R/illustrate_allodb.R:48:31: style: Variable and function name style should be snake_case.
                              Nres = 1e4) {
                              ^~~~
R/imports.R:6:1: style: Variable and function name style should be snake_case.
data.frame <- function(...) {
^~~~~~~~~~
R/new_equations.R:62:1: style: functions should have cyclomatic complexity of less than 15, this has 24.
new_equations <- function(subset_taxa = "all",
^
R/new_equations.R:71:27: style: Variable and function name style should be snake_case.
                          new_minDBH = NULL,
                          ^~~~~~~~~~
R/new_equations.R:72:27: style: Variable and function name style should be snake_case.
                          new_maxDBH = NULL,
                          ^~~~~~~~~~
R/new_equations.R:73:27: style: Variable and function name style should be snake_case.
                          new_sampleSize = NULL,
                          ^~~~~~~~~~~~~~
R/new_equations.R:74:27: style: Variable and function name style should be snake_case.
                          new_unitDBH = "cm",
                          ^~~~~~~~~~~
R/new_equations.R:75:27: style: Variable and function name style should be snake_case.
                          new_unitOutput = "kg",
                          ^~~~~~~~~~~~~~
R/new_equations.R:76:27: style: Variable and function name style should be snake_case.
                          new_inputVar = "DBH",
                          ^~~~~~~~~~~~
R/new_equations.R:77:27: style: Variable and function name style should be snake_case.
                          new_outputVar = "Total aboveground biomass",
                          ^~~~~~~~~~~~~
R/new_equations.R:88:34: style: Variable and function name style should be snake_case.
  suppressWarnings(new_equations$dbh_unit_CF <-
                                 ^~~~~~~~~~~
R/new_equations.R:90:34: style: Variable and function name style should be snake_case.
  suppressWarnings(new_equations$output_units_CF <-
                                 ^~~~~~~~~~~~~~~
R/new_equations.R:112:5: style: Variable and function name style should be snake_case.
    toMerge <- eq_jansen[, c("hsub", "equation_allometry")]
    ^~~~~~~
R/new_equations.R:113:64: style: Variable and function name style should be snake_case.
    eq_jansen$equation_allometry <- apply(toMerge, 1, function(X) {
                                                               ^
R/new_equations.R:181:30: style: There should be a space between right parenthesis and an opening curly brace.
    if (is.matrix(new_coords)){
                             ^~
R/new_equations.R:265:5: style: Variable and function name style should be snake_case.
    equationID <- paste0("new", seq_len(length(new_taxa)))
    ^~~~~~~~~~
R/new_equations.R:266:5: style: Variable and function name style should be snake_case.
    coordsEq <- cbind(long = new_coords[, 1],
    ^~~~~~~~
R/new_equations.R:268:5: style: Variable and function name style should be snake_case.
    rcoordsEq <- round(coordsEq * 2 - 0.5) / 2 + 0.25
    ^~~~~~~~~
R/new_equations.R:270:5: style: Variable and function name style should be snake_case.
    koppenZones <- apply(rcoordsEq, 1, function(X) {
    ^~~~~~~~~~~
R/new_equations.R:270:49: style: Variable and function name style should be snake_case.
    koppenZones <- apply(rcoordsEq, 1, function(X) {
                                                ^
R/new_equations.R:273:5: style: Variable and function name style should be snake_case.
    koppenZones <- as.character(unlist(koppenZones))
    ^~~~~~~~~~~
R/new_equations.R:301:5: style: Variable and function name style should be snake_case.
    dbhCF <-
    ^~~~~
R/new_equations.R:303:5: style: Variable and function name style should be snake_case.
    outputCF <-
    ^~~~~~~~
R/new_equations.R:305:28: style: Variable and function name style should be snake_case.
    suppressWarnings(dbhCF$dbh_unit_CF <-
                           ^~~~~~~~~~~
R/new_equations.R:307:31: style: Variable and function name style should be snake_case.
    suppressWarnings(outputCF$output_units_CF <-
                              ^~~~~~~~~~~~~~~
R/resample_agb.R:41:26: style: Variable and function name style should be snake_case.
                         Nres = 1e4) {
                         ^~~~
R/resample_agb.R:78:47: style: Variable and function name style should be snake_case.
  list_dbh <- apply(dfsub[, 1:3], 1, function(X) {
                                              ^
R/resample_agb.R:89:5: warning: local variable ‘sampled_dbh’ assigned but may not be used
    sampled_dbh <- list_dbh
    ^~~~~~~~~~~
R/resample_agb.R:97:7: warning: local variable ‘sampled_dbh’ assigned but may not be used
      sampled_dbh <- list_dbh[[j]]
      ^~~~~~~~~~~
R/weight_allom.R:45:31: style: Variable and function name style should be snake_case.
  suppressWarnings(dfequation$wN <-
                              ^~
R/weight_allom.R:53:3: style: Variable and function name style should be snake_case.
  coordsSite <- t(as.numeric(coords))
  ^~~~~~~~~~
R/weight_allom.R:54:3: style: Variable and function name style should be snake_case.
  rcoordsSite <- round(coordsSite * 2 - 0.5) / 2 + 0.25
  ^~~~~~~~~~~
R/weight_allom.R:56:3: style: Variable and function name style should be snake_case.
  koppenObs <- apply(rcoordsSite, 1, function(X) {
  ^~~~~~~~~
R/weight_allom.R:56:47: style: Variable and function name style should be snake_case.
  koppenObs <- apply(rcoordsSite, 1, function(X) {
                                              ^
R/weight_allom.R:64:14: style: Variable and function name style should be snake_case.
  dfequation$wE <- vapply(dfequation$koppen, compare_koppen, FUN.VALUE = 0.9)
             ^~
R/weight_allom.R:86:14: style: Variable and function name style should be snake_case.
  dfequation$wT <- 1e-6
             ^~
R/weight_allom.R:91:3: style: Variable and function name style should be snake_case.
  eqtaxaG <- data.table::tstrsplit(dfequation$taxa1, " ")[[1]]
  ^~~~~~~
R/weight_allom.R:92:3: style: Variable and function name style should be snake_case.
  eqtaxaS <- data.table::tstrsplit(dfequation$taxa1, " ")[[2]]
  ^~~~~~~
R/weight_allom.R:152:3: style: Variable and function name style should be snake_case.
  vecW <- dfequation$w
  ^~~~
tests/testthat/test-est_params.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-est_params.R:10:39: style: Commas should always have a space after.
                         long = c(-78,-85),
                                      ^
tests/testthat/test-get_biomass.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-get_biomass.R:25:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-get_biomass.R:92:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-get_biomass.R:106:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-get_biomass.R:122:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-illustrate_allodb.R:10:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-illustrate_allodb.R:31:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-new_equations.R:4:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-new_equations.R:127:3: style: Opening curly braces should never go on their own line and should always be followed by a new line.
  {
  ^
tests/testthat/test-resample_agb.R:19:3: style: Opening curly braces should never go on their own line and should always be followed by a new line.
  {
  ^
tests/testthat/test-weight_allom.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
adamhsparks commented 3 years ago

There is a note about UTF-8 strings as well that I'm seeing that you'll need to address as well.

* checking data for non-ASCII characters ... NOTE
  Note: found 156 marked UTF-8 strings
gonzalezeb commented 3 years ago

Thanks @adamhsparks. Most of the suggestion after running lint_package() have already been applied, see for example here and here. I am not sure why they don't show to you. I still need to work on some thought. Let me know if after installing the package again you still see those.

adamhsparks commented 3 years ago

Ah, sorry, I didn’t pay attention and realise you’d applied the changes to the repo yet. I’d need to refresh my local branch and rerun. I thought you were having issues identifying and applying fixes to these initial problems.

On 5 Apr 2021, at 09:47, Erika Gonzalez-Akre @.***> wrote:

 Thanks @adamhsparks. Most of the suggestion after running lint_package() have already been applied, see for example here and here. I am not sure why they don't show to you. I still need to work on some thought. Let me know if after installing the package again you still see those.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

adamhsparks commented 3 years ago

OK, this is what I get now.

> lintr::lint_package()
................
R/new_equations.R:62:1: style: functions should have cyclomatic complexity of less than 15, this has 24.
new_equations <- function(subset_taxa = "all",
^
R/new_equations.R:115:65: style: Variable and function name style should be snake_case.
    eq_jansen$equation_allometry <- apply(to_merge, 1, function(X) {
                                                                ^
R/new_equations.R:273:51: style: Variable and function name style should be snake_case.
    koppen_zones <- apply(rcoords_eq, 1, function(X) {
                                                  ^
R/resample_agb.R:78:47: style: Variable and function name style should be snake_case.
  list_dbh <- apply(dfsub[, 1:3], 1, function(X) {
                                              ^
R/resample_agb.R:89:5: warning: local variable ‘sampled_dbh’ assigned but may not be used
    sampled_dbh <- list_dbh
    ^~~~~~~~~~~
R/resample_agb.R:97:7: warning: local variable ‘sampled_dbh’ assigned but may not be used
      sampled_dbh <- list_dbh[[j]]
      ^~~~~~~~~~~
R/weight_allom.R:45:31: style: Variable and function name style should be snake_case.
  suppressWarnings(dfequation$wN <-
                              ^~
R/weight_allom.R:53:3: style: Variable and function name style should be snake_case.
  coordsSite <- t(as.numeric(coords))
  ^~~~~~~~~~
R/weight_allom.R:54:3: style: Variable and function name style should be snake_case.
  rcoordsSite <- round(coordsSite * 2 - 0.5) / 2 + 0.25
  ^~~~~~~~~~~
R/weight_allom.R:56:3: style: Variable and function name style should be snake_case.
  koppenObs <- apply(rcoordsSite, 1, function(X) {
  ^~~~~~~~~
R/weight_allom.R:56:47: style: Variable and function name style should be snake_case.
  koppenObs <- apply(rcoordsSite, 1, function(X) {
                                              ^
R/weight_allom.R:64:14: style: Variable and function name style should be snake_case.
  dfequation$wE <- vapply(dfequation$koppen, compare_koppen, FUN.VALUE = 0.9)
             ^~
R/weight_allom.R:86:14: style: Variable and function name style should be snake_case.
  dfequation$wT <- 1e-6
             ^~
R/weight_allom.R:91:3: style: Variable and function name style should be snake_case.
  eqtaxaG <- data.table::tstrsplit(dfequation$taxa1, " ")[[1]]
  ^~~~~~~
R/weight_allom.R:92:3: style: Variable and function name style should be snake_case.
  eqtaxaS <- data.table::tstrsplit(dfequation$taxa1, " ")[[2]]
  ^~~~~~~
R/weight_allom.R:152:3: style: Variable and function name style should be snake_case.
  vecW <- dfequation$w
  ^~~~
tests/testthat/test-est_params.R:9:39: style: Commas should always have a space after.
                         long = c(-78,-85),

Are there any that I can provide assistance with?

gonzalezeb commented 3 years ago

Hi @adamhsparks. Most issues brought out by lint have been addressed, only two remain but I am little stuck on them. Maybe you can give me some guidance. image

In R/new_equations, the function contains a lot of 'if' because it does a lot of sanity checks on the user-provided data. We could remove these, at the risk of having undetected problems in the user-provided allometries; and it seems that moving the 'ifs' to another function would make much sense.

In R/weight_allom, the problem is with "wE" which is the original col name of a matrix (koppenMatrix) been used.

adamhsparks commented 3 years ago

Thanks, @gonzalezeb, for addressing these changes. I think that both of these are acceptable given the reasons stated. I'll begin looking for reviewers now.

adamhsparks commented 3 years ago

@gonzalezeb, can you please update your README badges for allodb using rodev::use_review_badge(436) and add a NEWS.md file, usethis::use_news_md()?

adamhsparks commented 3 years ago

@gonzalezeb, can you also please address the errors that are encountered when running checks?

==> devtools::check()

ℹ Updating allodb documentation
ℹ Loading allodb
Writing NAMESPACE
Writing NAMESPACE
── Building ────────────────────────────────────────────────────────── allodb ──
Setting env vars:
● CFLAGS    : -Wall -pedantic -fdiagnostics-color=always
● CXXFLAGS  : -Wall -pedantic -fdiagnostics-color=always
● CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
────────────────────────────────────────────────────────────────────────────────
✓  checking for file ‘/Users/adamsparks/Development/GitHub/Reviews/allodb/DESCRIPTION’ ...
─  preparing ‘allodb’:
✓  checking DESCRIPTION meta-information ...
─  installing the package to build vignettes
E  creating vignettes (5.3s)
   --- re-building ‘allodb-vignette.Rmd’ using rmarkdown
   Quitting from lines 81-87 (allodb-vignette.Rmd) 
   Error: processing vignette 'allodb-vignette.Rmd' failed with diagnostics:
   arguments imply differing number of rows: 9991, 19
   --- failed re-building ‘allodb-vignette.Rmd’

   SUMMARY: processing the following file failed:
     ‘allodb-vignette.Rmd’

   Error: Vignette re-building failed.
   Execution halted

Error: 'col_ed' is not an exported object from 'namespace:cli'
Execution halted

Exited with status 1.

Once that's dealt with, I'll look for reviewers. 🙏

gonzalezeb commented 3 years ago

Hi @adamhsparks, allodb is passing all checks, please let me know if you see the same (I hope!)

devtools::check(remote = TRUE, manual = TRUE)
i Updating allodb documentation
i Loading allodb
Writing NAMESPACE
Writing NAMESPACE
-- Building --------------------------------------------------- allodb --
Setting env vars:
* CFLAGS    : -Wall -pedantic -fdiagnostics-color=always
* CXXFLAGS  : -Wall -pedantic -fdiagnostics-color=always
* CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
-------------------------------------------------------------------------
√  checking for file 'C:\Users\erikab\Dropbox (Smithsonian)\GitHub\allodb/DESCRIPTION' (349ms)
-  preparing 'allodb': (11.3s)
√  checking DESCRIPTION meta-information ... 
-  installing the package to build vignettes
√  creating vignettes (39.3s)
-  checking for LF line-endings in source and make files and shell scripts (11.1s)
-  checking for empty or unneeded directories
-  looking to see if a 'data/datalist' file should be added
-  building 'allodb_0.0.0.9000.tar.gz'

-- Checking --------------------------------------------------- allodb --
Setting env vars:
* _R_CHECK_CRAN_INCOMING_USE_ASPELL_: TRUE
* _R_CHECK_CRAN_INCOMING_REMOTE_    : TRUE
* _R_CHECK_CRAN_INCOMING_           : TRUE
* _R_CHECK_FORCE_SUGGESTS_          : FALSE
* NOT_CRAN                          : true
-- R CMD check ----------------------------------------------------------
-  using log directory 'C:/Users/erikab/AppData/Local/Temp/Rtmp6dUAvx/allodb.Rcheck' (1s)
-  using R version 3.6.3 (2020-02-29)
-  using platform: x86_64-w64-mingw32 (64-bit)
-  using session charset: ISO8859-1
-  using option '--as-cran' (402ms)
√  checking for file 'allodb/DESCRIPTION'
-  this is package 'allodb' version '0.0.0.9000'
-  package encoding: UTF-8
N  checking CRAN incoming feasibility (35.1s)
   Maintainer: 'Erika Gonzalez-Akre <GonzalezEB@si.edu>'

   New submission

   Version contains large components (0.0.0.9000)
√  checking package namespace information ...
√  checking package dependencies (2.3s)
√  checking if this is a source package ... 
√  checking if there is a namespace
√  checking for executable files (1.2s)
√  checking for hidden files and directories ... 
√  checking for portable file names ... 
√  checking whether package 'allodb' can be installed (11.2s)
√  checking installed package size ... 
√  checking package directory (894ms)
√  checking for future file timestamps (624ms)
√  checking 'build' directory ...
√  checking DESCRIPTION meta-information (718ms)
√  checking top-level files ...
√  checking for left-over files ... 
√  checking index information (353ms)
√  checking package subdirectories (392ms)
√  checking R files for non-ASCII characters ... 
√  checking R files for syntax errors ... 
√  checking whether the package can be loaded (355ms)
√  checking whether the package can be loaded with stated dependencies ... 
√  checking whether the package can be unloaded cleanly (336ms)
√  checking whether the namespace can be loaded with stated dependencies (360ms)
√  checking whether the namespace can be unloaded cleanly (459ms)
√  checking loading without being on the library search path (780ms)
√  checking use of S3 registration (2.5s)
√  checking dependencies in R code (2s)
√  checking S3 generic/method consistency (783ms)
√  checking replacement functions (335ms)
√  checking foreign function calls (802ms)
√  checking R code for possible problems (5.4s)
√  checking Rd files (681ms)
√  checking Rd metadata ... 
√  checking Rd line widths ... 
√  checking Rd cross-references (343ms)
√  checking for missing documentation entries (473ms)
√  checking for code/documentation mismatches (1.5s)
√  checking Rd \usage sections (1.1s)
√  checking Rd contents ... 
√  checking for unstated dependencies in examples ... 
√  checking contents of 'data' directory (696ms)
√  checking data for non-ASCII characters (1s)
√  checking data for ASCII and uncompressed saves (588ms)
√  checking R/sysdata.rda (478ms)
√  checking installed files from 'inst/doc' ... 
√  checking files in 'vignettes' ... 
√  checking examples (4.7s)
√  checking for unstated dependencies in 'tests' ... 
-  checking tests (429ms)
√  Running 'testthat.R' (8.7s)
√  checking for unstated dependencies in vignettes (9.1s)
√  checking package vignettes in 'inst/doc' ... 
√  checking re-building of vignette outputs (23.6s)
√  checking PDF version of manual (3.6s)
√  checking for detritus in the temp directory

   See
     'C:/Users/erikab/AppData/Local/Temp/Rtmp6dUAvx/allodb.Rcheck/00check.log'
   for details.

-- R CMD check results --------------------------- allodb 0.0.0.9000 ----
Duration: 2m 0.2s

> checking CRAN incoming feasibility ... NOTE
  Maintainer: 'Erika Gonzalez-Akre <GonzalezEB@si.edu>'

  New submission

  Version contains large components (0.0.0.9000)

0 errors √ | 0 warnings √ | 1 note x
adamhsparks commented 3 years ago

Hi @gonzalezeb, I'm getting a NOTE when I run R CMD check.

Note: found 156 marked UTF-8 strings

Can you check into this?

As it's just a note, not an error in the checks, I'll start looking for reviewers now.

adamhsparks commented 3 years ago

@ropensci-review-bot assign @jeffreyhanson

jeffreyhanson commented 3 years ago

Hi @gonzalezeb, I just had a quick look through this package it looks really awesome! Great work!

Before I do a through review, I have some broad suggestions for the code/doucmentation that are relevant for multiple functions. I also noticed a few specific things that might be good to fix before I do the reivew too. I've listed them below - what do you think? I think most of them could be addressed by polishing the code/documentation a bit more? I'm also happy to conduct a through review with the current version of the package if you prefer - it will just a lot of small issues (e.g. this term in this file should have code formatting) that might be annoying for you to have to respond to?

  1. The package website isn't working (https://forestgeo.github.io/allodb/). I think there might be a bug in the GitHub Actions code for generating the package website (e.g. see errors here).

  2. I feel like the function documentation could be improved by using code formatting (e.g. use "`character` vector" instead of "character vector"). This would help users understand that you are referring to a specific data type. This suggestion applies to multiple functions - what do you think?

  3. I feel like the function documentation could be improved by using more precise terminology. For example, in the documentation for function parameters, I think it would be helpful to specify the exact class for arguments (e.g. using "`numeric` vector" instead of "numerical vector", because "numerical" isn't a class in R). Additionally, I think it would be helpful to include classes in the documentation for function outputs (i.e. in @return sections). For example, this sentence currently says "A data frame of resampled DBHs and associated ..." and could be updated to say "A data frame (`tibble::tibble()` object) of resampled DBHs and associated ...". This suggestion applies to multiple functions - how does that sound?

    Also, I feel like the documentation for function parameters could be more consistent. I really like how the documentation for the dbh parameter in get_biomass function has the format "A [class-name] containing [...]". I wonder if you could update the parameter documentation for the other parameters -- e.g. the wna parameter -- so that they all follow this format (e.g. wna currently reads like "This parameter is used [...]" and doesn't contain class information).

  4. It's not really the best practice to incude a non-R script file in the /R folder. Perhaps the sysdata.rda file could be moved to a more apropriate location?

  5. I feel like the it might helpful to include some code to validate function arguments. This could help users understand why their code isn't working if they supply incorrect arguments (e.g. if they supply a numeric vector instead of a character vector). For example, I think this argument validation code is fantastic and will be really useful for users, and I wonder if you could add similar code to verify arguments to all parameters in each function? If you're interested in R packages to help with this, I personally use the assertthat R package (e.g. see this function) and I have heard good things about checkmate R package too. What do you think?

  6. Could you remove the unused Travis badge in REAMDE (because it seems that GitHub Actions is used for CI)?

  7. The .buildignore folder seems to contain random R script and meeting notes. I don't see the benefit for keeping this in the code repository? Perhaps you could move them to a wiki, issues, or a seperate repo? Sorry if I'm missing something?

  8. This function (and maybe others?) over-ride the global state. This is not ideal because it could cause issues for users when they do not expect such changes. For example, imagine a user is trying to use this package within an analysis, and they manually use set.seed() to handle random number generation for simulations in various parts of their analysis. If this function uses set.seed() and does not revert it afterwards, then it could mean that the user's subsequent simulations are all based on the same seed even though they used set.seed() to change the seed in their code. If you're interested in suggestions on how to address this, I personally like to use the withr R package for stuff like this. For example, instead of using the set.seed() function, you could use the withr::with_seed()` function. Does that help?

  9. eval(parse(....)) is rarely a good idea unless you have no alternatives. This is because it can introduces bugs that are hard to track down and it can also cause security problems. In this function, I wonder if you could you could use some combination of expression and substitute to acheive the same result?

  10. I feel like it would be easier for users to use this function (and maybe others too?) if the parameters were re-ordered such that all parameters without defaults came before those with defaults. For example, to run the get_biomass() function with default parameters and without using named arguments, I would have to do something like get_biomasss(x, y, , z) instead of get_biomass(x, y, z) (to specify arguments for dbh, genus, and coords). But maybe there's a reason for this particular order for parameters? Sorry if I'm missing something obvious.

Please let me know if anything I've written isn't clear or doesn't make sense, and I can follow up with more details? I could also be entirely wrong about some things too, so please do correct me if I'm missing something obvious :)

adamhsparks commented 3 years ago

I'll try again with the bot. I'm new here, please bear with me.

@ropensci-review-bot add @jeffreyhanson to reviewers.

adamhsparks commented 3 years ago

@ropensci-review-bot add @jeffreyhanson to reviewers.

adamhsparks commented 3 years ago

@ropensci-review-bot add @jeffreyhanson to reviewers

ropensci-review-bot commented 3 years ago

@jeffreyhanson added to the reviewers list. Review due date is 2021-06-14. Thanks @jeffreyhanson for accepting to review! Please refer to our reviewer guide.

gonzalezeb commented 3 years ago

Hello @jeffreyhanson, your suggestions are manageable, thanks for catching all that. We will work on those and will get back to you if we have questions. Will try to have things solved by 5/31 so you can proceed with a through review.

jeffreyhanson commented 3 years ago

Hi @gonzalezeb, ok, that sounds great - thank you!

adamhsparks commented 3 years ago

Just an update @gonzalezeb, I’m still looking for a second reviewer, sorry about any delays with this.

adamhsparks commented 3 years ago

@ropensci-review-bot add @jstillh to reviewers

ropensci-review-bot commented 3 years ago

@jstillh added to the reviewers list. Review due date is 2021-06-24. Thanks @jstillh for accepting to review!

adamhsparks commented 3 years ago

Thank you for accepting the review for allodb, Jonas. Please let me know if I can assist you with anything along the way.

gonzalezeb commented 3 years ago

Hi @jeffreyhanson, we solved many of the issues you pointed out and may justify a few later. Hope you can go ahead with reviewing the functionality of the package. Thanks!

jstillh commented 3 years ago

Hi @gonzalezeb
Here is my review. I really appreciate the package and the work you & the co-authors have put into developing it.
I think the package will be useful for a lot (if not every) forest ecologist... I reviewed the package looking at it as a regular user - sorry if i misunderstood some of the components of your package.

Package Review

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 9


Review Comments

The authors provide a package allowing to calculate biomass for individual trees of extratropical tree and shrub species. A package including biomass equations for extratropical species is a useful and timely complement to the BIOMASS package by Chave et al. for tropical species. The authors apparently developed the package focusing on the plots of the ForestGEO network. Given that the authors aim at providing a package complementing the BIOMASS package, i reviewed the functionality of the package not only for the use on single/multiple sites but for the use along latitudinal gradients and focused on European tree species that i am familiar with.
Although i appreciate the general idea of the package and the large effort that the authors put into its development, i have some concerns regarding the concept applied when calculating biomass and some general comments on code and documentation that i feel should be addressed:

Conceptual

The core function get_biomass provides the user with one estimate of aboveground biomass based on the diameter at breast height (dbh) of a tree. This function then applies one or several of the biomass equations based on climatic similarity and similarity between species. The estimate is based on a large set of biomass-equations compiled from a wide range of sources - i very much appreciate the work the authors have put into the compilation of this dataset.
I like the general framework of this package and the option to add additional equations via the function new_equations but i have concerns regarding the workflow implemented in the package. These mainly focus on the following points:

  1. Documentation of selection and weighting
    The function get_biomass returns a vector with an estimate of the biomass for each tree. The calculation of this value is somewhat opaque as long as the user does not actively change some of the arguments (which by default do not need to be provided) of the get_biomass function. Even if species / genus specific equations are available, this function includes additional, less specific equations. These are given a certain weight based on similarities using the weight_allom function and might contribute little to the final estimate. However, this is not obvious to the user, as the package only provides one value for each tree. I would appreciate if the function get_biomass would not only return a vector but more metadata on the steps that resulted in the calculation of the final result. This could be achieved by returning a list containing the final values, all values with the associated weights, the output of new_eqtable documenting the equations used and the output of weight_allom.
    I understand that the function illustrate_allodb should give the user an impression on the results of the resampling - but it does not provide the user with all information (e.g., no information on weights) and it requires the user to run an additional function.

  2. Inclusion of equations based on climatic simlilarity
    To check for climatic similarity, the package makes use of the Koeppen classification. This is a feasible approach - but it has some drawbacks in the implementation: Based on the Koeppen classification of the site (provided through the coords argument), a certain set of equations is used. This can result in the exclusion/ downweighting even of species specific biomass equations. The illustrative example below provides biomass estimates for Fagus sylvatica along a latitudinal gradient from Northern Italy to Northern Germany and the output of illustrate_allodb for two sites used in this latitudinal gradient. Obviously a completely different set of equations is used - in the second site, no species specific equations are used. (Btw: there seems to be a typo, i guess it should be Populus instead of Pupulus.)
    The differences between the sites observed here are mostly driven by a change in the Koeppen zones and a different set of equations applied due to the change in Koeppen zones. I am not convinced that this makes ecologically sense. Additionally, the Koeppen-classification for specific equations seems to be based on the value for longitude and latitude provided in the original publication of each biomass equation. However, for equations covering a wider range (e.g., the equations published in Forrester et al. 2017), the authors seem to have extracted the Koeppen value for the mean of both the lat and long range provided. This may result in a very specific classification not representative for the species and a subsequent down-weighting of this equation when applied in the midst of the species’s distribution range: For the ForestGeo Plot in Zofin, which is mainly covered with beech and lies in the center of the distribution range of this species, this results in a down-weighting of the equation provided in Forrester et al. 2017 as the value wE provided in the dataset for this specific combination in koppenMatrix is 0.7. I did not check if other equations have a higher weight, but i think this issue should be addressed by the authors.
    If the Koeppen classification is not included in the dataset koeppenMatrix or the specific combination if Koeppen zones is included with a weight of 0, this results in the exclusion of a specific equation and a subsequent error message (if only one equation is provided) or the inclusion of other, not species specific equations, see my comments further down.

In general, the weighting concept of the different biomass equations should be more transparent for the package user and probably requires conceptual changes.

data <- allodb::get_biomass(rep(50, 41), rep("Fagus", 41), rep("sylvatica", 41), coords = as.matrix(cbind(rep(10, 41), seq(44, 54, 0.25))))
rcoords_site <- round(as.matrix(cbind(rep(10, 41), seq(44, 54, 0.25))) * 2 - 0.5)/2 + 0.25
koeppen <- apply(rcoords_site, 1, function(xk) {
  subset(kgc::climatezones, Lon == xk[1] & Lat == xk[2])$Cls
})
plot(seq(44, 54, 0.25), data, xlab = "lat", ylab = "biomass [kg]",  ylim = c(0, 3000), pch = 21, cex = 1.5, bg = koeppen)

![unnamed-chunk-1-1](https://user-images.githubusercontent.com/31648011/121906533-7da22c00-cd2b-11eb-97fe-fe8d6a432e5f.png)

allodb::illustrate_allodb("Fagus" , coords = c(10, 44), "sylvatica")

unnamed-chunk-1-1

allodb::illustrate_allodb("Fagus" , coords = c(10, 46), "sylvatica")

data("koppenMatrix", package = "allodb")
data("sites_info", package = "allodb")

koppenMatrix$wE[koppenMatrix$zone1 == sites_info$koppen[sites_info$site == "zofin"] & koppenMatrix$zone2 == "Csb"]
## [1] 0.7

Code and documentation

Use of eval(parse())
As @jeffreyhanson already pointed out here, the use of eval(parse()) is not good practice for the reasons he laid out nicely in his comment. You seem to have replaced the eval(parse()) in this function but then changed it back according to this issue because the str2lang that you used to replace eval(parse()) is not available for users with R < 3.6.0.
I was not aware of the str2lang function and i am not convinced that using this function will solve the issue raised by @jeffreyhanson: The issue with tracing bugs will still exist even when changing to the str2lang function and the general security concerns associated with the use eval(parse()) can probably not be solved with replacing the parse function by str2lang as str2lang is only a special case of parse.
The concerns raised are associated with the combination of eval(parse()) and not with the use parse(). (btw: when you removed str2lang in this function, you only removed it on line 93 but not within the else statement on line 101).
I think working along the lines laid out by @jeffreyhanson (i.e. using expression and substitute) would be a good option. Another option would be to create internal functions for all forms of the equations in dfequation and then apply these.

Documentation
I think the vignette should be improved. It would be great if you could provide some more information to the user especially on the estimation of the parameters in est_param and what effect the weighting in weight_allom has on the result.
As @jeffreyhanson already pointed out here, consistently using code formatting throughout the documentation would be great (e.g., “numeric” in the description of the resample_agb function could be change to numeric).

Naming conventions
The use of the package could be improved if you consistently order the arguments and stick to one naming convention throughout the package. sitespecies does not follow the naming convention you used for the datasets in the package - all other datasets are named following the convention name1_name2. This applies for the dataset koeppenMatrix as well.

Performance
The calculation of the biomass is relatively slow if for every tree a unique coordinate is provided - i guess this is caused by the subset operations performed in the weight_allom function. Given that a user might provide individual coordinates for each tree on a site and the coarse resolution of the koeppen classification, it might be an option to first check if the coordinates vary in such a way that these need to be subset individually or if a single coordinate can be used.

Such performance considerations are necessary, as the calculations are rather slow at the moment. For example, the biomass calculation for 100 trees with tree level coordinates require around 9s.

system.time(allodb::get_biomass(dbh = c(1:100), genus = rep("Fagus", 100)
                                , species = rep("sylvatica", 100)
                                , coords = as.matrix(cbind(rep(10, 100), sample(seq(48.001, 48.1, 0.0001), 100)))))
##    user  system elapsed 
##    8.81    0.03    8.86

Specific comments

get_biomass
The get_biomass function creates a biomass estimate based on dbh, genus, coords and species. It allows the user to provide a selection of the biomass equations via the argument new_eqtable. This seems to be an important feature of this function. However, if the values provided in the column koppen do not appear in kgc::climatezones, the function throws an error and the error message does not point the user to the reason of the error - see my specific comment on the weight_allom function below. Given the coarse resolution of the kgc::climatezones dataset of 0.5 degrees, this might be problematic especially in mountainous regions with a large altitudinal gradient. Additionally, this is an issue which might prevent users from applying equations that could be applied for the respective region - see my general comment above.

equations_to_use <- allodb::new_equations(subset_taxa = "Fagus")

allodb::get_biomass(dbh = 20
                    , genus =  "Fagus"
                    , species = "sylvatica"
                    , coords = c(10, 46)
                    , new_eqtable = equations_to_use)
## Warning: Unknown or uninitialised column: `equation_id`.

## Error in if (any(nEQ <- vNms != make.names(vNms))) vNms[nEQ] <- paste0("`", : missing value where TRUE/FALSE needed
allodb::get_biomass(dbh = 20
                    , genus =  "Fagus"
                    , species = "sylvatica"
                    , coords = c(10, 46))
## [1] 217.0154

weight_allom

The weight_allom function creates weights for all biomass functions which will later be used in the calculation of the biomass.

Beyond the concerns i already mentioned above, i only have a minor comment on this function: The kgc::climatezones dataset does not cover large bodies of water. The values in dfequation$wE will therefore be -Inf for all coords that are not covered by kgc::climatezones The weight_allom function therefore returns -Inf causing the function to stop without a easily understandable error message. I would appreciate if you either add an error message like if(dfequation$wE == -Inf){stop('Error message')} or if you would just flag the respective values so they are omitted in later calculations.

new_equations
The function new equations allows the user to a) subset to a specific set of equations for one or several taxa and b) to add additional equations. I appreciate the latter option, as this adds a lot of flexibility to the package.

adamhsparks commented 3 years ago

Thanks, @jstillh, for your quick and comprehensive review!

@jeffreyhanson, are you able to provide your review soon? It was due on Monday. Please let me know if you need more time.

jeffreyhanson commented 3 years ago

Oh - I'm so sorry! I mistakenly thought my reivew would be due later since @gonzalezeb was working on updating the package - I'll submit review by the end of the week. I hope that doesn't cause too many problems for you.

adamhsparks commented 3 years ago

@jeffreyhanson, all good, I understand the confusion as I obviously interpreted it to be due on the original date. Take all the time that you need.

jeffreyhanson commented 3 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 4


Review Comments

The allodb R package provides a framework for estimating tree biomass based on allometric equations. Overall, my impression is that the package is very well designed. I think the authors have done a fantastic job of providing a highly flexible interface -- allowing users to customize calculations as needed -- that is easy to use. Since I have zero expertise in forestry science, I cannot comment on the novelty, scope, or methodological correctness of the package. As such, I have limited my review to focus on code and documentation.

I previously posted several suggestions for improving the package. Although I am glad that some of these suggestions were used to help improve the package, I would appreciate hearing why some of these suggestions were not incorporated? It's entirely possible that these suggestions were irrelevant, infeasible, or undesireable. For example, is there some particular benefit for not describing classes precisely, or not validating user inputs of which I am unaware? Below I've listed some specific comments. Please let me know if anything I have written is unclear, irrelevant, or incorrect?

Code

Documentation

Built-in datsets

Minor suggestions

gonzalezeb commented 3 years ago

@jstillh and @jeffreyhanson, thank you so much for your thoughtful and kind reviews. Your suggestions will improve the package 100%. I will contact you if we have questions, we truly appreciate your time on this.

adamhsparks commented 3 years ago

Hi @gonzalezeb, just following up, it's been four weeks since the reviews came in. If you are experiencing any difficulties or issues, feel free to flag them with the reviewers or me and we'll do our best to assist you through the process.

gonzalezeb commented 3 years ago

Hi @adamhsparks. Sorry, I am a little behind with this and have solved only minor issues. There are some comments on the conceptual aspects of the package (i.e. here) where I get the point but changing our approach will require some (lot's!) of work, so I am giving it a careful thought. How should I tackle issues like those?

adamhsparks commented 3 years ago

Hi @gonzalezeb, which specific aspect were you linking to and asking for guidance on? Your link only links to the whole issue here, not one of the reviewers' comments or issue that they may have opened in your repository?

adamhsparks commented 3 years ago

Hi @gonzalezeb, do you need more time to work on this? We can put a holding tag on it and revisit it after three months to see where things are for up to a year. Would that be useful to you?

gonzalezeb commented 3 years ago

Hi @adamhsparks, thanks for checking. We have made some substantial progress but I still need to work in some minor stuff. The plan is to finish this before Sep 15 as we want to submit our revised manuscript to MEE also by then. Will that be ok with you all?

adamhsparks commented 3 years ago

Sounds good, Erika! Thanks for updating me.

gonzalezeb commented 3 years ago

Hi @adamhsparks, I wasn’t sure how to respond to the multiples reviews so I hope the following is sufficient. We are very grateful for the thoughtful suggestions by reviewers, and have implemented most.

Initial suggestions by @jeffreyhanson (5/19/21), have been addressed:

  1. The package website is working (https://forestgeo.github.io/allodb/)
  2. Documentation for multiple functions has been improved by using code formatting (eg. using ‘numeric vector’ instead of ‘numerical vector’, added (tibble::tibble() object), etc).
  3. Typos have been corrected
  4. Unnecessary R and non-R script files have been removed (eg. R/sysdata.rda)
  5. We agree with the reviewer. If a user passes a bad argument, it is best to fail fast with an informative message so that the user can quickly correct their mistake and move on. But the number of possible mistakes may be infinite; so we have particularly focused on adding an internal function validate_equations()to tests that all equations in equations$equation_allometry are a valid R expression of the dbh or h variables (since this could be one of the main mistakes on calculating biomass). This function can be called from inside any function within allodb and allows users to pass a character vector of equations. We also added consistency checks in the get_biomass function of dbh and coordinates provided by the user.
  6. Unused Travis badge in REAMDE has been removed
  7. The .buildignore folder has been removed
  8. We changed set.seed with withr::with_seed in the resample_agb function.
  9. About eval(parse(…)), see next comment.
  10. Parameters for many functions have been reordered.

Comprehensive review by @jeffreyhanson (6/17/21), has also been addressed: About code:

  1. See note number 5 above. If the suggested change is truly necessary (validation of function arguments), then we may take the time to implement.
  2. All dataframes now have the tibble subclass. About documentation:
  3. Documentation has been improved for all functions (specially the get_biomass and weight_allom functions), added @seealso to help users navigate btw documentations, use quotation marks for character vectors, added example sections to the documentation for the built-in datasets, etc.
  4. We are now using \pkg{allodb} in documentation when referring to this package, and equations commands (eg. \deqn{}).
  5. Removed library(allodb) from all test scripts.

Suggestions by @jstillh (6/14/21), have been addressed:

gonzalezeb commented 3 years ago

The reviewers advice to avoid the call eval(parse(text = ...)) (item 9 here, and “Code and documentation” here), mainly because (a) it can be unsafe, and (b) error prone.

Instead they suggest storing the equations as objects such as expressions or functions, and manipulate them with substitute().

We thank the advice and, after considering it carefully, we would like to share our updated opinion. We first discuss the problems of eval(parse()) and then the alternative solutions.

To make our discussion concrete we offer some code examples.

library(dplyr, warn.conflicts = FALSE)
library(allodb)

tl;dr

I previously suggested that it would be good to avoid using eval(parse(...)) if possible. (…) However, [the alternatives] seem needlessly complex compared to the current implementation? Maybe this could be one instance where eval(parse(...)) is not bad idea?

Unsafe

Online we found that parsing text would be a risk if we hosted an application on a server we own and allow users to input arbitrary text. A malicious user could damage our server.

But this is not our case. The allodb package is to be used as an analysis tool, directly on the users’s computers. And the input text comes mainly not from users but from a dataset we built.

Error prone

We acknowledge that bugs can hide in the unevaluated text, as invalid code is still a valid text object.

valid <- "dbh + 1"
eval(parse(text = valid), envir = list(dbh = 1))
#> [1] 2

invalid <- "1 <- dbh"
try(eval(parse(text = invalid), envir = list(dbh = 1)))
#> Error in 1 <- dbh : invalid (do_set) left-hand side to assignment

But bugs can also hide in unevaluated expressions. Using substitute() to manipulate expressions does not ensure an expression is valid.

invalid <- quote(1 <- dbh)
substitute(x, list(x = invalid))
#> 1 <- dbh

try(eval(substitute(x, list(x = invalid))))
#> Error in 1 <- dbh : invalid (do_set) left-hand side to assignment

We are comfortable storing and manipulating text. We can inspect equations stored as text directly from the dataframe, and we can manipulate them with familiar tools.

# Storing text
equation <- "1.2927 * (dbh^2)^1.36723"
dbh <- "(sampled_dbh * 0.393701)"

# + We can inspect the equations directly from the dataframe
tibble(equation, dbh)
#> # A tibble: 1 x 2
#>   equation                 dbh                     
#>   <chr>                    <chr>                   
#> 1 1.2927 * (dbh^2)^1.36723 (sampled_dbh * 0.393701)

# - To manipulate the equations we use familiar tools
modified <- gsub("dbh", dbh, equation)
modified
#> [1] "1.2927 * ((sampled_dbh * 0.393701)^2)^1.36723"

eval(parse(text = modified), list(sampled_dbh = 1))
#> [1] 0.1010408

In contrast, we are not comfortable storing or manipulating expressions with substitute(). We found great help in the first edition of Advanced R but we still think the approach is difficult to understand, which makes us more likely to introduce bugs than with eval(parse()).

# We need to create an escape hatch because substitute() doesn't work if we
# already have an expression saved in a variable
# http://adv-r.had.co.nz/Computing-on-the-language.html#substitute
substitute_q <- function(x, env) {
  call <- substitute(substitute(y, env), list(y = x))
  eval(call)
}

equation <- quote(1.2927 * (dbh^2)^1.36723)
dbh <- quote((sampled_dbh * 0.393701))

# - We can't inspect the equations directly from the dataframe :-(
tibble(equation = list(equation), dbh = list(dbh))
#> # A tibble: 1 x 2
#>   equation   dbh       
#>   <list>     <list>    
#> 1 <language> <language>

# - To manipulate the equations we need unfamiliar tools
modified <- substitute_q(equation, list(dbh = dbh))
modified
#> 1.2927 * ((sampled_dbh * 0.393701)^2)^1.36723

eval(modified, list(sampled_dbh = 1))
#> [1] 0.1010408

As @jeffreyhanson advised, instead of using expressions and substitute() we prefer to ensure all text-equations are valid or throw an informative error message.


validate_equation <- function(text, envir = parent.frame()) {
  tryCatch(
    eval(parse(text = text), envir = envir),
    error = function(e) {
      stop("`text` must be valid R code.\nx Invalid: ", text, call. = FALSE)
    }
  )

  invisible(text)
}

dbh <- 99
validate_equation(valid)
try(validate_equation(invalid))
#> Error : `text` must be valid R code.
#> x Invalid: <-1dbh

f <- function(text) {
  validate_equation(text, envir = list(dbh = 1))
  # Some useful code

  text
}

f(valid)
#> [1] "dbh + 1"
try(f(invalid))
#> Error : `text` must be valid R code.
#> x Invalid: <-1dbh
adamhsparks commented 2 years ago

@gonzalezeb, thank you for your comprehensive overview of the changes that you have made to the allodb package.

@jeffreyhanson and @jstillh could you please let me know if you are satisfied with the changes to the package or have any further suggestions that may improve it?

jeffreyhanson commented 2 years ago

Awesome - thank you very much @gonzalezeb. @adamhsparks, I'll try and take get back to you next week.

jstillh commented 2 years ago

@gonzalezeb thanks a lot for the overview of the changes - i'll try to go through them in the next week as well and get back to you, @adamhsparks .

jeffreyhanson commented 2 years ago

Thank you @gonzalezeb for providing such detailed responses to the points I raised earlier. I especially found your discussion on the costs and benefits of eval(parse(...)) really useful. I think the R package is looking absolutely brilliant! Most of the points I raised earlier have been addressed. I've made a couple of suggestions below that I think could help improve package.1 All things considered, these are pretty minor - just some documentation tweaks. If those could be addressed, I'd say that all my suggestions have been satisfied. I've also listed several additional minor comments too. Based on my understanding of rOpenSci guidelines, they won't/shouldn't influence whether the package should be accepted or not (but please correct me if I am wrong). I've listed them just to be thorough, and in case the package authors think they are worth addressing.

Suggestions

Minor comments


1 Please note that I still (unsurprisingly) lack the forestry experience needed to evaluate the novelty and methodology of this package, and so continue to limit my reviews to code quality and documentation.

2 Just to be clear, I do think that -- where possible, feasible, and reasonable -- R package developers should ideally -- if only in principle -- minimize security risks, even if software licenses absolve developers of responsibility. It's just I think that eval(paste(..)) is not so terrible here, given the noted trade-offs that are specific to this package. Perhaps other will disagree, and that's fine. Please note that I do not have any experience with cybersecurity. If you want an informed view point on such matters, I suggest you talk to someone else :)

gonzalezeb commented 2 years ago

Thanks @jeffreyhanson, I have adopted most of your comments, they are very helpful!

jstillh commented 2 years ago

Thank you @gonzalezeb for the very detailed response to the reviews. As @jeffreyhanson, I like your detailed response regarding the use of eval(parse(...)). In general, i am very pleased with the revisions and the changes you implemented in the functions. I do not have any more suggestions beyond the ones by @jeffreyhanson.

Thanks for this contribution.