Closed gonzalezeb closed 2 years ago
@ropensci-review-bot assign @adamhsparks as editor
Assigned! @adamhsparks is now the editor
G'day, Erika, I'll be your guest editor for allodb.
The package passes all standard local checks and tests with flying colours. But checking the code syntax with `lintr::lint_package()` (for some reason I can't use `goodpractice::gp()` locally, so I'm falling back on this), I get several suggestions for improving the code. I suggest going through them, addressing them where appropriate and possible, and perhaps noting in your submission any flags that are not appropriate to address for allodb. This will make the review process easier for the reviewers.
Also, I note that your package's website is incomplete, but once the package is accepted to rOpenSci a website will be generated for you, and your current README contains the necessary information on how to install and use the package, so we're good there.
Once you've addressed the issues flagged by `goodpractice::gp()` or `lintr::lint_package()`, I'll move on to finding reviewers for you.
@ropensci-review-bot run goodpractice
Hi, @gonzalezeb, just checking in to see if there’s anything I can do to help you with your submission process?
Hi Adam, thanks for checking in.
I am getting some errors after addressing some of the issues flagged by `lintr::lint_package()` and `goodpractice::gp()`, but my collaborator is not, so I am not sure everything is good (at least not from my laptop). When I run `devtools::test()` to test functionality, I see that some tests are not running well. When I ran `goodpractice::goodpractice()` on 3/26 I got "Supreme package...", but now I get a few errors. Maybe if you run some checks again you can let me know whether you are still seeing issues.
Yes, it passes both the CRAN checks and `devtools::test()` when I run them, but the code style checks suggested by the rOpenSci guidelines still have some suggestions. This is what I see when I run `lint_package()`. Some of these can easily be addressed; others, like the first hit with names, probably have good reasons in your field to have the object named like that. That's fine, but it should be noted for the reviewers.
> lintr::lint_package()
................
R/est_params.R:46:24: style: Variable and function name style should be snake_case.
Nres = 1e4
^~~~
R/est_params.R:64:51: style: Commas should always have a space after.
coords = dfobs[i, c("long","lat")],
^
R/est_params.R:72:66: style: Put spaces around all infix operators.
if (length(unique(df$equation_id)) == 1) df[1, 3] <- df[1, 3]*1.01
~^~
R/get_biomass.R:56:25: style: Variable and function name style should be snake_case.
Nres = 1e4) {
^~~~
R/illustrate_allodb.R:48:31: style: Variable and function name style should be snake_case.
Nres = 1e4) {
^~~~
R/imports.R:6:1: style: Variable and function name style should be snake_case.
data.frame <- function(...) {
^~~~~~~~~~
R/new_equations.R:62:1: style: functions should have cyclomatic complexity of less than 15, this has 24.
new_equations <- function(subset_taxa = "all",
^
R/new_equations.R:71:27: style: Variable and function name style should be snake_case.
new_minDBH = NULL,
^~~~~~~~~~
R/new_equations.R:72:27: style: Variable and function name style should be snake_case.
new_maxDBH = NULL,
^~~~~~~~~~
R/new_equations.R:73:27: style: Variable and function name style should be snake_case.
new_sampleSize = NULL,
^~~~~~~~~~~~~~
R/new_equations.R:74:27: style: Variable and function name style should be snake_case.
new_unitDBH = "cm",
^~~~~~~~~~~
R/new_equations.R:75:27: style: Variable and function name style should be snake_case.
new_unitOutput = "kg",
^~~~~~~~~~~~~~
R/new_equations.R:76:27: style: Variable and function name style should be snake_case.
new_inputVar = "DBH",
^~~~~~~~~~~~
R/new_equations.R:77:27: style: Variable and function name style should be snake_case.
new_outputVar = "Total aboveground biomass",
^~~~~~~~~~~~~
R/new_equations.R:88:34: style: Variable and function name style should be snake_case.
suppressWarnings(new_equations$dbh_unit_CF <-
^~~~~~~~~~~
R/new_equations.R:90:34: style: Variable and function name style should be snake_case.
suppressWarnings(new_equations$output_units_CF <-
^~~~~~~~~~~~~~~
R/new_equations.R:112:5: style: Variable and function name style should be snake_case.
toMerge <- eq_jansen[, c("hsub", "equation_allometry")]
^~~~~~~
R/new_equations.R:113:64: style: Variable and function name style should be snake_case.
eq_jansen$equation_allometry <- apply(toMerge, 1, function(X) {
^
R/new_equations.R:181:30: style: There should be a space between right parenthesis and an opening curly brace.
if (is.matrix(new_coords)){
^~
R/new_equations.R:265:5: style: Variable and function name style should be snake_case.
equationID <- paste0("new", seq_len(length(new_taxa)))
^~~~~~~~~~
R/new_equations.R:266:5: style: Variable and function name style should be snake_case.
coordsEq <- cbind(long = new_coords[, 1],
^~~~~~~~
R/new_equations.R:268:5: style: Variable and function name style should be snake_case.
rcoordsEq <- round(coordsEq * 2 - 0.5) / 2 + 0.25
^~~~~~~~~
R/new_equations.R:270:5: style: Variable and function name style should be snake_case.
koppenZones <- apply(rcoordsEq, 1, function(X) {
^~~~~~~~~~~
R/new_equations.R:270:49: style: Variable and function name style should be snake_case.
koppenZones <- apply(rcoordsEq, 1, function(X) {
^
R/new_equations.R:273:5: style: Variable and function name style should be snake_case.
koppenZones <- as.character(unlist(koppenZones))
^~~~~~~~~~~
R/new_equations.R:301:5: style: Variable and function name style should be snake_case.
dbhCF <-
^~~~~
R/new_equations.R:303:5: style: Variable and function name style should be snake_case.
outputCF <-
^~~~~~~~
R/new_equations.R:305:28: style: Variable and function name style should be snake_case.
suppressWarnings(dbhCF$dbh_unit_CF <-
^~~~~~~~~~~
R/new_equations.R:307:31: style: Variable and function name style should be snake_case.
suppressWarnings(outputCF$output_units_CF <-
^~~~~~~~~~~~~~~
R/resample_agb.R:41:26: style: Variable and function name style should be snake_case.
Nres = 1e4) {
^~~~
R/resample_agb.R:78:47: style: Variable and function name style should be snake_case.
list_dbh <- apply(dfsub[, 1:3], 1, function(X) {
^
R/resample_agb.R:89:5: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh
^~~~~~~~~~~
R/resample_agb.R:97:7: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh[[j]]
^~~~~~~~~~~
R/weight_allom.R:45:31: style: Variable and function name style should be snake_case.
suppressWarnings(dfequation$wN <-
^~
R/weight_allom.R:53:3: style: Variable and function name style should be snake_case.
coordsSite <- t(as.numeric(coords))
^~~~~~~~~~
R/weight_allom.R:54:3: style: Variable and function name style should be snake_case.
rcoordsSite <- round(coordsSite * 2 - 0.5) / 2 + 0.25
^~~~~~~~~~~
R/weight_allom.R:56:3: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^~~~~~~~~
R/weight_allom.R:56:47: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^
R/weight_allom.R:64:14: style: Variable and function name style should be snake_case.
dfequation$wE <- vapply(dfequation$koppen, compare_koppen, FUN.VALUE = 0.9)
^~
R/weight_allom.R:86:14: style: Variable and function name style should be snake_case.
dfequation$wT <- 1e-6
^~
R/weight_allom.R:91:3: style: Variable and function name style should be snake_case.
eqtaxaG <- data.table::tstrsplit(dfequation$taxa1, " ")[[1]]
^~~~~~~
R/weight_allom.R:92:3: style: Variable and function name style should be snake_case.
eqtaxaS <- data.table::tstrsplit(dfequation$taxa1, " ")[[2]]
^~~~~~~
R/weight_allom.R:152:3: style: Variable and function name style should be snake_case.
vecW <- dfequation$w
^~~~
tests/testthat/test-est_params.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-est_params.R:10:39: style: Commas should always have a space after.
long = c(-78,-85),
^
tests/testthat/test-get_biomass.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:25:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:92:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:106:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-get_biomass.R:122:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-illustrate_allodb.R:10:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-illustrate_allodb.R:31:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-new_equations.R:4:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-new_equations.R:127:3: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-resample_agb.R:19:3: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
tests/testthat/test-weight_allom.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
{
^
There is also a note about UTF-8 strings that I'm seeing, which you'll need to address:

```
* checking data for non-ASCII characters ... NOTE
  Note: found 156 marked UTF-8 strings
```
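If it helps to track that NOTE down, one possible approach (just a sketch; it assumes the marked strings live in character columns of the package's data frames) is to scan each shipped data object for non-ASCII content:

```r
# Sketch: report which character columns of allodb's data objects
# contain non-ASCII (e.g. marked UTF-8) strings.
library(allodb)
for (obj in data(package = "allodb")$results[, "Item"]) {
  data(list = obj, package = "allodb")
  d <- get(obj)
  if (is.data.frame(d)) {
    for (col in names(d)[vapply(d, is.character, logical(1))]) {
      n_bad <- length(grep("[^\x01-\x7F]", d[[col]], useBytes = TRUE))
      if (n_bad > 0) {
        message(obj, "$", col, ": ", n_bad, " non-ASCII string(s)")
      }
    }
  }
}
```

Often such strings (e.g. accented author names in equation references) are legitimate, in which case the NOTE can simply be explained rather than "fixed".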
Ah, sorry, I wasn't paying attention and didn't realise you'd already applied the changes to the repo. I'll need to refresh my local branch and rerun. I thought you were still having issues identifying and applying fixes to these initial problems.
On 5 Apr 2021, at 09:47, Erika Gonzalez-Akre @.***> wrote:
Thanks @adamhsparks. Most of the suggestions after running `lint_package()` have already been applied, see for example here and here. I am not sure why they don't show up for you. I still need to work on some, though. Let me know if you still see those after installing the package again.
OK, this is what I get now.
> lintr::lint_package()
................
R/new_equations.R:62:1: style: functions should have cyclomatic complexity of less than 15, this has 24.
new_equations <- function(subset_taxa = "all",
^
R/new_equations.R:115:65: style: Variable and function name style should be snake_case.
eq_jansen$equation_allometry <- apply(to_merge, 1, function(X) {
^
R/new_equations.R:273:51: style: Variable and function name style should be snake_case.
koppen_zones <- apply(rcoords_eq, 1, function(X) {
^
R/resample_agb.R:78:47: style: Variable and function name style should be snake_case.
list_dbh <- apply(dfsub[, 1:3], 1, function(X) {
^
R/resample_agb.R:89:5: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh
^~~~~~~~~~~
R/resample_agb.R:97:7: warning: local variable ‘sampled_dbh’ assigned but may not be used
sampled_dbh <- list_dbh[[j]]
^~~~~~~~~~~
R/weight_allom.R:45:31: style: Variable and function name style should be snake_case.
suppressWarnings(dfequation$wN <-
^~
R/weight_allom.R:53:3: style: Variable and function name style should be snake_case.
coordsSite <- t(as.numeric(coords))
^~~~~~~~~~
R/weight_allom.R:54:3: style: Variable and function name style should be snake_case.
rcoordsSite <- round(coordsSite * 2 - 0.5) / 2 + 0.25
^~~~~~~~~~~
R/weight_allom.R:56:3: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^~~~~~~~~
R/weight_allom.R:56:47: style: Variable and function name style should be snake_case.
koppenObs <- apply(rcoordsSite, 1, function(X) {
^
R/weight_allom.R:64:14: style: Variable and function name style should be snake_case.
dfequation$wE <- vapply(dfequation$koppen, compare_koppen, FUN.VALUE = 0.9)
^~
R/weight_allom.R:86:14: style: Variable and function name style should be snake_case.
dfequation$wT <- 1e-6
^~
R/weight_allom.R:91:3: style: Variable and function name style should be snake_case.
eqtaxaG <- data.table::tstrsplit(dfequation$taxa1, " ")[[1]]
^~~~~~~
R/weight_allom.R:92:3: style: Variable and function name style should be snake_case.
eqtaxaS <- data.table::tstrsplit(dfequation$taxa1, " ")[[2]]
^~~~~~~
R/weight_allom.R:152:3: style: Variable and function name style should be snake_case.
vecW <- dfequation$w
^~~~
tests/testthat/test-est_params.R:9:39: style: Commas should always have a space after.
long = c(-78,-85),
Are there any that I can provide assistance with?
Hi @adamhsparks. Most issues flagged by `lint` have been addressed; only two remain, but I am a little stuck on them. Maybe you can give me some guidance.
In `R/new_equations`, the function contains a lot of 'if's because it does a lot of sanity checks on the user-provided data. We could remove these, at the risk of having undetected problems in the user-provided allometries, and it doesn't seem that moving the 'if's to another function would make much sense.
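For what it's worth, lintr measures cyclomatic complexity per function, so one common pattern is to group the sanity checks into a small validator that `new_equations()` calls once; no check is lost, but the main function's flagged value drops. A hypothetical sketch (the check bodies below are invented placeholders, not allodb's actual checks; only the parameter names come from the package):

```r
# Hypothetical sketch: collect input sanity checks in one helper so
# new_equations() itself has fewer branches.
validate_new_equation <- function(new_taxa, new_minDBH = NULL, new_maxDBH = NULL) {
  if (!is.character(new_taxa)) {
    stop("`new_taxa` must be a character vector.", call. = FALSE)
  }
  if (!is.null(new_minDBH) && !is.numeric(new_minDBH)) {
    stop("`new_minDBH` must be numeric.", call. = FALSE)
  }
  if (!is.null(new_minDBH) && !is.null(new_maxDBH) &&
      any(new_maxDBH <= new_minDBH)) {
    stop("`new_maxDBH` must be larger than `new_minDBH`.", call. = FALSE)
  }
  invisible(TRUE)
}

new_equations_sketch <- function(new_taxa, new_minDBH = NULL, new_maxDBH = NULL) {
  validate_new_equation(new_taxa, new_minDBH, new_maxDBH)
  # ... rest of the function, now with far fewer branches ...
}
```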
In `R/weight_allom`, the problem is with "wE", which is the original column name of the matrix (`koppenMatrix`) being used.
Thanks, @gonzalezeb, for addressing these changes. I think that both of these are acceptable given the reasons stated. I'll begin looking for reviewers now.
@gonzalezeb, can you please update your README badges for allodb using `rodev::use_review_badge(436)` and add a NEWS.md file with `usethis::use_news_md()`?
@gonzalezeb, can you also please address the errors that are encountered when running checks?
```
==> devtools::check()

ℹ Updating allodb documentation
ℹ Loading allodb
Writing NAMESPACE
Writing NAMESPACE

── Building ────────────────────────────────────────────────────────── allodb ──
Setting env vars:
● CFLAGS    : -Wall -pedantic -fdiagnostics-color=always
● CXXFLAGS  : -Wall -pedantic -fdiagnostics-color=always
● CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
────────────────────────────────────────────────────────────────────────────────
✓ checking for file ‘/Users/adamsparks/Development/GitHub/Reviews/allodb/DESCRIPTION’ ...
─ preparing ‘allodb’:
✓ checking DESCRIPTION meta-information ...
─ installing the package to build vignettes
E creating vignettes (5.3s)
--- re-building ‘allodb-vignette.Rmd’ using rmarkdown
Quitting from lines 81-87 (allodb-vignette.Rmd)
Error: processing vignette 'allodb-vignette.Rmd' failed with diagnostics:
arguments imply differing number of rows: 9991, 19
--- failed re-building ‘allodb-vignette.Rmd’

SUMMARY: processing the following file failed:
‘allodb-vignette.Rmd’

Error: Vignette re-building failed.
Execution halted
Error: 'col_ed' is not an exported object from 'namespace:cli'
Execution halted

Exited with status 1.
```
Once that's dealt with, I'll look for reviewers. 🙏
Hi @adamhsparks, allodb is now passing all checks; please let me know if you see the same (I hope!).
devtools::check(remote = TRUE, manual = TRUE)
i Updating allodb documentation
i Loading allodb
Writing NAMESPACE
Writing NAMESPACE
-- Building --------------------------------------------------- allodb --
Setting env vars:
* CFLAGS : -Wall -pedantic -fdiagnostics-color=always
* CXXFLAGS : -Wall -pedantic -fdiagnostics-color=always
* CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
-------------------------------------------------------------------------
√ checking for file 'C:\Users\erikab\Dropbox (Smithsonian)\GitHub\allodb/DESCRIPTION' (349ms)
- preparing 'allodb': (11.3s)
√ checking DESCRIPTION meta-information ...
- installing the package to build vignettes
√ creating vignettes (39.3s)
- checking for LF line-endings in source and make files and shell scripts (11.1s)
- checking for empty or unneeded directories
- looking to see if a 'data/datalist' file should be added
- building 'allodb_0.0.0.9000.tar.gz'
-- Checking --------------------------------------------------- allodb --
Setting env vars:
* _R_CHECK_CRAN_INCOMING_USE_ASPELL_: TRUE
* _R_CHECK_CRAN_INCOMING_REMOTE_ : TRUE
* _R_CHECK_CRAN_INCOMING_ : TRUE
* _R_CHECK_FORCE_SUGGESTS_ : FALSE
* NOT_CRAN : true
-- R CMD check ----------------------------------------------------------
- using log directory 'C:/Users/erikab/AppData/Local/Temp/Rtmp6dUAvx/allodb.Rcheck' (1s)
- using R version 3.6.3 (2020-02-29)
- using platform: x86_64-w64-mingw32 (64-bit)
- using session charset: ISO8859-1
- using option '--as-cran' (402ms)
√ checking for file 'allodb/DESCRIPTION'
- this is package 'allodb' version '0.0.0.9000'
- package encoding: UTF-8
N checking CRAN incoming feasibility (35.1s)
Maintainer: 'Erika Gonzalez-Akre <GonzalezEB@si.edu>'
New submission
Version contains large components (0.0.0.9000)
√ checking package namespace information ...
√ checking package dependencies (2.3s)
√ checking if this is a source package ...
√ checking if there is a namespace
√ checking for executable files (1.2s)
√ checking for hidden files and directories ...
√ checking for portable file names ...
√ checking whether package 'allodb' can be installed (11.2s)
√ checking installed package size ...
√ checking package directory (894ms)
√ checking for future file timestamps (624ms)
√ checking 'build' directory ...
√ checking DESCRIPTION meta-information (718ms)
√ checking top-level files ...
√ checking for left-over files ...
√ checking index information (353ms)
√ checking package subdirectories (392ms)
√ checking R files for non-ASCII characters ...
√ checking R files for syntax errors ...
√ checking whether the package can be loaded (355ms)
√ checking whether the package can be loaded with stated dependencies ...
√ checking whether the package can be unloaded cleanly (336ms)
√ checking whether the namespace can be loaded with stated dependencies (360ms)
√ checking whether the namespace can be unloaded cleanly (459ms)
√ checking loading without being on the library search path (780ms)
√ checking use of S3 registration (2.5s)
√ checking dependencies in R code (2s)
√ checking S3 generic/method consistency (783ms)
√ checking replacement functions (335ms)
√ checking foreign function calls (802ms)
√ checking R code for possible problems (5.4s)
√ checking Rd files (681ms)
√ checking Rd metadata ...
√ checking Rd line widths ...
√ checking Rd cross-references (343ms)
√ checking for missing documentation entries (473ms)
√ checking for code/documentation mismatches (1.5s)
√ checking Rd \usage sections (1.1s)
√ checking Rd contents ...
√ checking for unstated dependencies in examples ...
√ checking contents of 'data' directory (696ms)
√ checking data for non-ASCII characters (1s)
√ checking data for ASCII and uncompressed saves (588ms)
√ checking R/sysdata.rda (478ms)
√ checking installed files from 'inst/doc' ...
√ checking files in 'vignettes' ...
√ checking examples (4.7s)
√ checking for unstated dependencies in 'tests' ...
- checking tests (429ms)
√ Running 'testthat.R' (8.7s)
√ checking for unstated dependencies in vignettes (9.1s)
√ checking package vignettes in 'inst/doc' ...
√ checking re-building of vignette outputs (23.6s)
√ checking PDF version of manual (3.6s)
√ checking for detritus in the temp directory
See
'C:/Users/erikab/AppData/Local/Temp/Rtmp6dUAvx/allodb.Rcheck/00check.log'
for details.
-- R CMD check results --------------------------- allodb 0.0.0.9000 ----
Duration: 2m 0.2s
> checking CRAN incoming feasibility ... NOTE
Maintainer: 'Erika Gonzalez-Akre <GonzalezEB@si.edu>'
New submission
Version contains large components (0.0.0.9000)
0 errors √ | 0 warnings √ | 1 note x
Hi @gonzalezeb, I'm getting a NOTE when I run `R CMD check`:

```
Note: found 156 marked UTF-8 strings
```

Can you check into this? As it's just a note, not an error in the checks, I'll start looking for reviewers now.
@ropensci-review-bot assign @jeffreyhanson
Hi @gonzalezeb, I just had a quick look through this package and it looks really awesome! Great work!

Before I do a thorough review, I have some broad suggestions for the code/documentation that are relevant for multiple functions. I also noticed a few specific things that might be good to fix before I do the review too. I've listed them below - what do you think? I think most of them could be addressed by polishing the code/documentation a bit more? I'm also happy to conduct a thorough review with the current version of the package if you prefer - it will just flag a lot of small issues (e.g. this term in this file should have code formatting) that might be annoying for you to have to respond to?
The package website isn't working (https://forestgeo.github.io/allodb/). I think there might be a bug in the GitHub Actions code for generating the package website (e.g. see errors here).
I feel like the function documentation could be improved by using code formatting (e.g. use "`character` vector" instead of "character vector"). This would help users understand that you are referring to a specific data type. This suggestion applies to multiple functions - what do you think?

I also feel like the function documentation could be improved by using more precise terminology. For example, in the documentation for function parameters, I think it would be helpful to specify the exact class for arguments (e.g. using "`numeric` vector" instead of "numerical vector", because "numerical" isn't a class in R). Additionally, I think it would be helpful to include classes in the documentation for function outputs (i.e. in `@return` sections). For example, this sentence currently says "A data frame of resampled DBHs and associated ..." and could be updated to say "A data frame (`tibble::tibble()` object) of resampled DBHs and associated ...". This suggestion applies to multiple functions - how does that sound?
Also, I feel like the documentation for function parameters could be more consistent. I really like how the documentation for the `dbh` parameter in the `get_biomass` function has the format "A [class-name] containing [...]". I wonder if you could update the documentation for the other parameters -- e.g. the `wna` parameter -- so that they all follow this format (e.g. `wna` currently reads like "This parameter is used [...]" and doesn't contain class information).
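To make the suggested style concrete, a roxygen entry along these lines might look like the following (the wording, and in particular the `wna` description, is invented for illustration and is not allodb's actual documentation):

```r
#' @param dbh A `numeric` vector containing the diameter at breast
#'   height (DBH) of each tree, in cm.
#' @param wna A `numeric` value containing the weight given to
#'   equations with missing sample-size information.
#' @return A data frame (`tibble::tibble()` object) of resampled DBHs
#'   and associated biomass estimates.
```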
It's not really best practice to include a non-R script file in the /R folder. Perhaps the `sysdata.rda` file could be moved to a more appropriate location?
I feel like it might be helpful to include some code to validate function arguments. This could help users understand why their code isn't working if they supply incorrect arguments (e.g. if they supply a `numeric` vector instead of a `character` vector). For example, I think this argument validation code is fantastic and will be really useful for users, and I wonder if you could add similar code to verify the arguments to all parameters in each function? If you're interested in R packages to help with this, I personally use the assertthat R package (e.g. see this function) and I have heard good things about the checkmate R package too. What do you think?
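As a sketch of what such validation could look like with assertthat (the wrapper function, its messages, and the `coords` check below are illustrative, not allodb's code; only the parameter names come from `get_biomass()`):

```r
library(assertthat)

# Illustrative wrapper: fail fast with informative messages before
# doing any real work.
get_biomass_checked <- function(dbh, genus, coords) {
  assert_that(is.numeric(dbh), msg = "`dbh` must be a numeric vector.")
  assert_that(is.character(genus), msg = "`genus` must be a character vector.")
  assert_that(length(dbh) == length(genus),
              msg = "`dbh` and `genus` must have the same length.")
  assert_that(is.matrix(coords) || is.numeric(coords),
              msg = "`coords` must be a numeric vector or matrix.")
  # ... call the real computation here ...
}
```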
Could you remove the unused Travis badge in the README (because it seems that GitHub Actions is used for CI)?
The .buildignore folder seems to contain random R scripts and meeting notes. I don't see the benefit of keeping these in the code repository? Perhaps you could move them to a wiki, issues, or a separate repo? Sorry if I'm missing something?
This function (and maybe others?) overrides the global state. This is not ideal because it could cause issues for users when they do not expect such changes. For example, imagine a user is trying to use this package within an analysis, and they manually use `set.seed()` to handle random number generation for simulations in various parts of their analysis. If this function uses `set.seed()` and does not revert it afterwards, then it could mean that the user's subsequent simulations are all based on the same seed even though they used `set.seed()` to change the seed in their code. If you're interested in suggestions on how to address this, I personally like to use the withr R package for stuff like this. For example, instead of using the `set.seed()` function, you could use the `withr::with_seed()` function. Does that help?
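A small sketch of the `withr` approach (the resampling body is a placeholder, not allodb's actual code):

```r
# Any seeded work inside withr::with_seed() leaves the caller's RNG
# state exactly as it was before the call.
resample_sketch <- function(n, seed = 1L) {
  withr::with_seed(seed, sample(seq_len(n), n, replace = TRUE))
}

set.seed(42)
state_before <- .Random.seed
invisible(resample_sketch(10))
identical(state_before, .Random.seed)  # TRUE: the user's seed is untouched
```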
`eval(parse(...))` is rarely a good idea unless you have no alternatives. This is because it can introduce bugs that are hard to track down, and it can also cause security problems. In this function, I wonder if you could use some combination of `expression` and `substitute` to achieve the same result?
I feel like it would be easier for users to use this function (and maybe others too?) if the parameters were re-ordered such that all parameters without defaults came before those with defaults. For example, to run the `get_biomass()` function with default parameters and without using named arguments, I would have to do something like `get_biomass(x, y, , z)` instead of `get_biomass(x, y, z)` (to specify arguments for `dbh`, `genus`, and `coords`). But maybe there's a reason for this particular order of parameters? Sorry if I'm missing something obvious.
Please let me know if anything I've written isn't clear or doesn't make sense, and I can follow up with more details? I could also be entirely wrong about some things too, so please do correct me if I'm missing something obvious :)
I'll try again with the bot. I'm new here, please bear with me.
@ropensci-review-bot add @jeffreyhanson to reviewers.
@ropensci-review-bot add @jeffreyhanson to reviewers
@jeffreyhanson added to the reviewers list. Review due date is 2021-06-14. Thanks @jeffreyhanson for accepting to review! Please refer to our reviewer guide.
Hello @jeffreyhanson, your suggestions are manageable, thanks for catching all that. We will work on those and will get back to you if we have questions. We'll try to have things solved by 5/31 so you can proceed with a thorough review.
Hi @gonzalezeb, ok, that sounds great - thank you!
Just an update @gonzalezeb, I’m still looking for a second reviewer, sorry about any delays with this.
@ropensci-review-bot add @jstillh to reviewers
@jstillh added to the reviewers list. Review due date is 2021-06-24. Thanks @jstillh for accepting to review!
Thank you for accepting the review for allodb, Jonas. Please let me know if I can assist you with anything along the way.
Hi @jeffreyhanson, we solved many of the issues you pointed out and may justify a few later. Hope you can go ahead with reviewing the functionality of the package. Thanks!
Hi @gonzalezeb,

Here is my review. I really appreciate the package and the work you & the co-authors have put into developing it. I think the package will be useful for most (if not all) forest ecologists... I reviewed the package looking at it as a regular user - sorry if I misunderstood some of the components of your package.
The package includes all the following forms of documentation: `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).

Estimated hours spent reviewing: 9
The authors provide a package for calculating biomass of individual trees of extratropical tree and shrub species. A package including biomass equations for extratropical species is a useful and timely complement to the `BIOMASS` package by Chave et al. for tropical species. The authors apparently developed the package focusing on the plots of the ForestGEO network. Given that the authors aim at providing a package complementing the `BIOMASS` package, I reviewed the functionality of the package not only for use on single/multiple sites but also for use along latitudinal gradients, and focused on European tree species that I am familiar with.

Although I appreciate the general idea of the package and the large effort that the authors put into its development, I have some concerns regarding the concept applied when calculating biomass, and some general comments on code and documentation that I feel should be addressed:
The core function `get_biomass` provides the user with one estimate of aboveground biomass based on the diameter at breast height (dbh) of a tree. This function applies one or several of the biomass equations based on climatic similarity and similarity between species. The estimate is based on a large set of biomass equations compiled from a wide range of sources - I very much appreciate the work the authors have put into the compilation of this dataset.
I like the general framework of this package and the option to add additional equations via the function `new_equations`, but I have concerns regarding the workflow implemented in the package. These mainly focus on the following points:

**Documentation of selection and weighting**

The function `get_biomass` returns a `vector` with an estimate of the biomass for each tree. The calculation of this value is somewhat opaque as long as the user does not actively change some of the arguments (which by default do not need to be provided) of the `get_biomass` function. Even if species- or genus-specific equations are available, this function includes additional, less specific equations. These are given a certain weight based on similarities using the `weight_allom` function and might contribute little to the final estimate. However, this is not obvious to the user, as the package only provides one value for each tree. I would appreciate it if `get_biomass` did not only return a vector but also more metadata on the steps that resulted in the calculation of the final result. This could be achieved by returning a list containing the final values, all values with the associated weights, the output of `new_eqtable` documenting the equations used, and the output of `weight_allom`.
I understand that the function `illustrate_allodb` should give the user an impression of the results of the resampling - but it does not provide the user with all the information (e.g., no information on weights), and it requires the user to run an additional function.
Inclusion of equations based on climatic simlilarity
To check for climatic similarity, the package makes use of the Koeppen classification. This is a feasible approach - but it has
some drawbacks in the implementation: Based on the Koeppen classification of the site (provided through the coords
argument), a certain set of equations is used. This can result in the exclusion/ downweighting even of species specific biomass equations. The illustrative example below provides biomass estimates for Fagus sylvatica along a latitudinal gradient from Northern Italy to Northern Germany and the output of illustrate_allodb
for two sites used in this latitudinal gradient. Obviously a completely different set of equations is used - in the second site, no species specific
equations are used. (Btw: there seems to be a typo, i guess it should be Populus instead of Pupulus.)
The differences between the sites observed here are mostly driven by a change in the Koeppen zones and a different set of equations applied due to that change. I am not convinced that this makes ecological sense. Additionally, the Koeppen classification for specific equations seems to be based on the longitude and latitude values provided in the original publication of each biomass equation. However, for equations covering a wider range (e.g., the equations published in Forrester et al. 2017), the authors seem to have extracted the Koeppen value for the mean of both the lat and long range provided. This may result in a very specific classification not representative for the species and a subsequent down-weighting of this equation when applied in the midst of the species' distribution range: for the ForestGEO plot in Zofin, which is mainly covered with beech and lies in the center of the distribution range of this species, this results in a down-weighting of the equation provided in Forrester et al. 2017, as the value wE provided in the dataset koppenMatrix for this specific combination is 0.7. I did not check if other equations have a higher weight, but I think this issue should be addressed by the authors.
If the Koeppen classification is not included in the dataset koppenMatrix, or the specific combination of Koeppen zones is included with a weight of 0, this results in the exclusion of a specific equation and a subsequent error message (if only one equation is provided) or the inclusion of other, not species-specific equations; see my comments further down. In general, the weighting concept for the different biomass equations should be more transparent for the package user and probably requires conceptual changes.
data <- allodb::get_biomass(rep(50, 41), rep("Fagus", 41), rep("sylvatica", 41), coords = as.matrix(cbind(rep(10, 41), seq(44, 54, 0.25))))
rcoords_site <- round(as.matrix(cbind(rep(10, 41), seq(44, 54, 0.25))) * 2 - 0.5)/2 + 0.25
koeppen <- apply(rcoords_site, 1, function(xk) {
subset(kgc::climatezones, Lon == xk[1] & Lat == xk[2])$Cls
})
plot(seq(44, 54, 0.25), data, xlab = "lat", ylab = "biomass [kg]", ylim = c(0, 3000), pch = 21, cex = 1.5, bg = koeppen)
![unnamed-chunk-1-1](https://user-images.githubusercontent.com/31648011/121906533-7da22c00-cd2b-11eb-97fe-fe8d6a432e5f.png)
allodb::illustrate_allodb("Fagus" , coords = c(10, 44), "sylvatica")
allodb::illustrate_allodb("Fagus" , coords = c(10, 46), "sylvatica")
data("koppenMatrix", package = "allodb")
data("sites_info", package = "allodb")
koppenMatrix$wE[koppenMatrix$zone1 == sites_info$koppen[sites_info$site == "zofin"] & koppenMatrix$zone2 == "Csb"]
## [1] 0.7
Use of eval(parse())
As @jeffreyhanson already pointed out here, the use of eval(parse())
is not good practice for the reasons he laid out nicely in his comment. You seem to have replaced the eval(parse())
in this function but then changed it back according to this issue because the str2lang
that you used to replace eval(parse())
is not available for users with R < 3.6.0.
I was not aware of the str2lang function and I am not convinced that using it will solve the issue raised by @jeffreyhanson: the issue with tracing bugs will still exist even when changing to str2lang, and the general security concerns associated with the use of eval(parse()) can probably not be solved by replacing parse with str2lang, as str2lang is only a special case of parse. The concerns raised are associated with the combination of eval() and parse(), not with the use of parse() alone. (Btw: when you removed str2lang in this function, you only removed it on line 93 but not within the else statement on line 101.)
I think working along the lines laid out by @jeffreyhanson (i.e. using expression and substitute) would be a good option. Another option would be to create internal functions for all forms of the equations in dfequation and then apply these.
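A minimal sketch of what such an internal-functions approach could look like (hypothetical registry; allodb stores its equations as text, so this is only an illustration):

```r
# Each supported equation form becomes a named internal function; the
# equation table would then reference a form plus its coefficients.
equation_funs <- list(
  power   = function(dbh, a, b) a * dbh^b,
  exp_log = function(dbh, a, b) exp(a + b * log(dbh))
)
# Applying a "power" form equation to a 20 cm dbh tree:
agb <- equation_funs[["power"]](dbh = 20, a = 1.2927, b = 2.73446)
round(agb, 1)
```

This avoids parsing text at run time, at the cost of enumerating every equation form up front.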
Documentation
I think the vignette should be improved. It would be great if you could provide some more information to the user, especially on the estimation of the parameters in est_params and on what effect the weighting in weight_allom has on the result.
As @jeffreyhanson already pointed out here, consistently using code formatting throughout the documentation would be great (e.g., "numeric" in the description of the resample_agb function could be changed to `numeric`).
Naming conventions
The use of the package could be improved if you consistently order the arguments and stick to one naming convention throughout the package. sitespecies does not follow the naming convention you used for the other datasets in the package - all other datasets are named following the convention name1_name2. This applies to the dataset koppenMatrix as well.
Performance
The calculation of the biomass is relatively slow if a unique coordinate is provided for every tree - I guess this is caused by the subset operations performed in the weight_allom function. Given that a user might provide individual coordinates for each tree on a site, and given the coarse resolution of the Koeppen classification, it might be an option to first check whether the coordinates vary in such a way that they need to be subset individually or whether a single coordinate can be used.
Such performance considerations are necessary, as the calculations are rather slow at the moment. For example, the biomass calculation for 100 trees with tree-level coordinates requires around 9 s.
system.time(allodb::get_biomass(dbh = c(1:100), genus = rep("Fagus", 100)
, species = rep("sylvatica", 100)
, coords = as.matrix(cbind(rep(10, 100), sample(seq(48.001, 48.1, 0.0001), 100)))))
## user system elapsed
## 8.81 0.03 8.86
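One possible sketch of the suggested check (hypothetical pre-processing step, not part of allodb): snap the tree-level coordinates to the 0.5-degree grid used by kgc::climatezones and perform the expensive lookup only once per unique cell.

```r
# Tree-level coordinates for 100 trees, all within ~0.1 degree of latitude
coords <- cbind(long = rep(10, 100), lat = seq(48.001, 48.1, length.out = 100))
# Snap to the cell centres of the 0.5-degree kgc grid
cells <- round(coords * 2 - 0.5) / 2 + 0.25
key <- paste(cells[, 1], cells[, 2])
# One expensive Koeppen lookup per unique cell, recycled back to each tree
unique_cells <- cells[!duplicated(key), , drop = FALSE]
idx <- match(key, unique(key))
nrow(unique_cells)  # here all 100 trees fall into a single 0.5-degree cell
#> [1] 1
```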
get_biomass
The get_biomass
function creates a biomass estimate based on dbh
, genus
, coords
and species
. It allows the user to provide a selection of the biomass equations via the argument new_eqtable
. This seems to be an important feature of this function. However, if the values provided in the column koppen do not appear in kgc::climatezones, the function throws an error, and the error message does not point the user to the reason for the error - see my specific comment on the weight_allom function below. Given the coarse resolution of the kgc::climatezones dataset of 0.5 degrees, this might be problematic especially in mountainous regions with a large altitudinal gradient. Additionally, this issue might prevent users from applying equations that could be applied for the respective region - see my general comment above.
equations_to_use <- allodb::new_equations(subset_taxa = "Fagus")
allodb::get_biomass(dbh = 20
, genus = "Fagus"
, species = "sylvatica"
, coords = c(10, 46)
, new_eqtable = equations_to_use)
## Warning: Unknown or uninitialised column: `equation_id`.
## Error in if (any(nEQ <- vNms != make.names(vNms))) vNms[nEQ] <- paste0("`", : missing value where TRUE/FALSE needed
allodb::get_biomass(dbh = 20
, genus = "Fagus"
, species = "sylvatica"
, coords = c(10, 46))
## [1] 217.0154
weight_allom
The weight_allom
function creates weights for all biomass functions which will later be used in the calculation of the biomass.
Beyond the concerns I already mentioned above, I only have a minor comment on this function: the kgc::climatezones dataset does not cover large bodies of water. The values in dfequation$wE will therefore be -Inf for all coords that are not covered by kgc::climatezones. The weight_allom function therefore returns -Inf, causing the function to stop without an easily understandable error message. I would appreciate it if you either added an error message like if (any(dfequation$wE == -Inf)) stop('Error message') or flagged the respective values so that they are omitted in later calculations.
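A minimal sketch of the "flag instead of fail" variant (hypothetical code; wE stands in for the climate weights computed by weight_allom):

```r
wE <- c(0.7, -Inf, 1)  # -Inf marks coords not covered by kgc::climatezones
if (any(!is.finite(wE))) {
  warning("Dropping ", sum(!is.finite(wE)),
          " equation(s) whose coordinates are not covered by kgc::climatezones")
  wE[!is.finite(wE)] <- 0  # zero weight removes them from later calculations
}
wE
#> [1] 0.7 0.0 1.0
```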
new_equations
The function new_equations allows the user to a) subset a specific set of equations for one or several taxa and b) add additional equations. I appreciate the latter option, as it adds a lot of flexibility to the package.
Thanks, @jstillh, for your quick and comprehensive review!
@jeffreyhanson, are you able to provide your review soon? It was due on Monday. Please let me know if you need more time.
Oh - I'm so sorry! I mistakenly thought my review would be due later since @gonzalezeb was working on updating the package - I'll submit my review by the end of the week. I hope that doesn't cause too many problems for you.
@jeffreyhanson, all good, I understand the confusion as I obviously interpreted it to be due on the original date. Take all the time that you need.
The package includes all the required forms of documentation, including URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
Estimated hours spent reviewing: 4
The allodb R package provides a framework for estimating tree biomass based on allometric equations. Overall, my impression is that the package is very well designed. I think the authors have done a fantastic job of providing a highly flexible interface -- allowing users to customize calculations as needed -- that is easy to use. Since I have zero expertise in forestry science, I cannot comment on the novelty, scope, or methodological correctness of the package. As such, I have limited my review to focus on code and documentation.
I previously posted several suggestions for improving the package. Although I am glad that some of these suggestions were used to help improve the package, I would appreciate hearing why some of these suggestions were not incorporated? It's entirely possible that these suggestions were irrelevant, infeasible, or undesirable. For example, is there some particular benefit for not describing classes precisely, or not validating user inputs, of which I am unaware? Below I've listed some specific comments. Please let me know if anything I have written is unclear, irrelevant, or incorrect?
I mentioned earlier that I thought it would be helpful to include some code to validate function arguments. From what I can tell, the code hasn't been updated to include such validation - is that correct? I'm sorry, was my previous comment unclear, or did I misunderstand something? To help show what I mean, here's some example code that could be used to validate arguments for the est_params()
function. Please note that I'm not saying you have to use this specific code, I'm just trying to provide an example of what the code could look like (e.g. you might prefer to use the checkmate R package instead):
est_params <- function(genus,
                       coords,
                       species = NULL,
                       new_eqtable = NULL,
                       wna = 0.1,
                       w95 = 500,
                       nres = 1e4) {
  # assert arguments are valid
  assertthat::assert_that(
    ## genus
    is.character(genus),
    assertthat::noNA(genus),
    ## coords
    is.numeric(coords),
    assertthat::noNA(coords),
    length(coords) == 2,
    ### ensure valid longitude
    isTRUE(coords[1] >= -180),
    isTRUE(coords[1] <= 180),
    ### ensure valid latitude
    isTRUE(coords[2] >= -90),
    isTRUE(coords[2] <= 90),
    ## species
    inherits(species, c("character", "NULL")),
    ## new_eqtable
    inherits(new_eqtable, c("data.frame", "NULL")),
    ## wna
    assertthat::is.number(wna),
    assertthat::noNA(wna),
    ## w95
    assertthat::is.number(w95),
    assertthat::noNA(w95),
    ## nres
    assertthat::is.count(nres),
    assertthat::noNA(nres)
  )
  # ... rest of the function ...
}
I previously suggested that it would be good to avoid using eval(parse(...))
if possible. Looking through the commit history, it seems like an alternative function was attempted (i.e. str2lang(...)
) but this was later removed to avoid compatibility issues with older R versions. Thanks for looking into this! I appreciate the effort to get it working. I was trying to think of alternative strategies to avoid eval(parse(...))
, but I think the over-arching issue is that some equivalent is needed because the equations are (ultimately) stored as character
objects (e.g. something like "log(dbh) * 5"
). I suppose some alternatives would be to update the equations
data frame to replace the character
objects with function
objects, or replacing the equations
data frame with a list
of purpose-built equation objects that can be used to perform the calculations. However, these seem needlessly complex compared to the current implementation? Maybe this could be one instance where eval(parse(...))
is not a bad idea?
To help avoid potential issues with eval(parse(...))
, maybe it would be worth (1) writing a function that can validate if a given character
is a valid equation or not (it could give a useful error message if the function is not valid), (2) adding this validation function to the other functions that accept equation tables as inputs, and (3) adding unit tests to validate all built-in equation (i.e. specified in the equation
table) using this new function? I'm sorry that I can't be of more help.
The package contains functions that output different data frame classes. For example, the resample_agb()
function outputs a tibble
(well, tbl_df
object), and the est_params()
function outputs a data.table
object. Since these functions output different classes, the user will need to remember exactly which data frame class is output from each function -- increasing the cognitive load required to use the package. In other words, this behavior breaks the "law of least surprise". Is there a particular reason for this? I'm sorry if I'm missing something obvious?
To help make the package easier to use, may I suggest picking one data frame class (e.g. data.frame
, tibble
or data.table
) and using that as the basis for all user-facing functions? This way the user can simply remember something like "The allodb package works with data.table
objects, so I need to make sure my input data are in that format, and convert any outputs from that format as needed". Indeed, if there are certain reasons for using certain data frame classes in certain functions (such as for performance), perhaps different classes could be used internally (e.g. a function gets a data.frame
as input, converts it to a data.table
for performance, does some processing, and then outputs a data.frame
)? What do you think? I'm just trying to help suggest ways to make the package easier to use.
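A minimal sketch of this "convert internally, return one class" pattern (hypothetical function, not part of allodb):

```r
# The user always supplies and receives a plain data.frame; the internal
# representation (here a matrix, for speed) never leaks out.
scale_biomass <- function(df) {
  stopifnot(is.data.frame(df))
  m <- as.matrix(df)   # internal representation chosen for performance
  out <- m * 2         # ...some processing...
  as.data.frame(out)   # convert back before returning
}
class(scale_biomass(data.frame(agb = c(1.5, 2.5))))
#> [1] "data.frame"
```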
The nls()
function used in est_params()
is exceptionally useful for fitting nonlinear models. Since it can experience convergence issues (e.g. see here, here, here, and here), it is sometimes necessary to customize the default settings to obtain reliable outputs (e.g. change optimizer, increase number of iterations). Given this, would it be possible to update the est_params()
function so that users can customize the optimization parameters (e.g. specify the algorithm
and control
parameters for nls()
, see here for details)? It might also be useful if the function could somehow output information on the performance of the models too (e.g. model goodness of fit statistics)? But maybe I'm missing something and these concerns are not relevant here? What do you think?
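A sketch of what such a pass-through could look like (hypothetical fit_agb helper; est_params() does not currently expose these arguments):

```r
# The caller can override the nls() optimizer and iteration limits,
# which helps when the default Gauss-Newton algorithm fails to converge.
fit_agb <- function(dbh, agb, algorithm = "port",
                    control = stats::nls.control(maxiter = 200)) {
  stats::nls(agb ~ a * dbh^b, start = list(a = 1, b = 1),
             algorithm = algorithm, control = control)
}
dbh <- seq(5, 50, by = 5)
agb <- 2 * dbh^1.5 * (1 + rep(c(0.01, -0.01), 5))  # near-exact power law
round(coef(fit_agb(dbh, agb)), 2)
```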
I mentioned earlier that I thought the documentation could be improved by using more precise terminology to describe function parameters and outputs. For example, using "`numeric` vector" instead of "numerical vector" (because "numerical" isn't a class in R), and specifying exactly which data frame class (e.g. data.frame or data.table) when referring to R objects (e.g. here). From what I can tell, the documentation hasn't been updated based on this suggestion - is that right? I'm sorry if my previous comment wasn't clear? Or maybe I'm misunderstanding something? If so, could you please explain why the current terminology is preferable?
It would be difficult for me to understand exactly how the get_biomass
, est_params
, and resample_agb
functions work based only on reading the documentation. I wonder if it is possible to provide more details on the underlying calculations in a Details section? This suggestion may not be relevant though, because I might lack the required experience in forestry science to understand the documentation in its current form?
I mentioned earlier that I thought the documentation could be improved by using code tags to specify that certain words correspond to object classes. From what I can tell, it seems these tags are still missing from a few places (e.g. like "NULL" here). Could you please double check the documentation and add in any missing tags? Also, when documenting character objects, I would suggest including quotation marks (e.g. using "all"
instead of all
here) so it's clear the documentation is referring to a character object.
Have you thought about including more links to help the user navigate the documentation (e.g. here where the documentation refers the reader to another manual entry, or here where the documentation refers to another function)? Perhaps the @seealso
section could also be used to help users navigate between related functions?
Have you considered using equation commands (e.g. \eqn{}, \deqn{}) in the documentation to format equations (e.g. such as this equation; see here for an example)?
It might be helpful to use the \pkg{}
command when referring to the name of packages in the documentation (e.g. such as \pkg{allodb}
in here)?
Is it possible to explicitly state the type of object in the documentation for each dataset? I appreciate that the documentation says that the objects are a "data frame", but it would be useful to specify which implementation (e.g. data.frame, data.table, table, and tibble objects are all kinds of "data frame" objects).
It might be helpful to add example sections to the documentation for the built-in datasets so that the reader can see a preview of each dataset. What do you think? For example, in data.R, you could insert some code like this:
#' @examples
#' # preview the dataset
#' print(head(equations))
#'
I also noticed some potential typos in the documentation: here, here.
I also noticed that some of the built-in datasets are tibble objects (e.g. equations) and others are data.frame objects (e.g. genus_family). I was wondering if it would be useful to have all built-in datasets represented using the same class (e.g. all as tibble objects)? This is because tibble and data.frame objects have markedly different behaviors -- even though a tibble inherits from a data.frame -- and these differences can cause errors when the user/code expects a different behavior. This is related to the "law of least surprise" that I mentioned earlier.
@jstillh and @jeffreyhanson, thank you so much for your thoughtful and kind reviews. Your suggestions will improve the package 100%. I will contact you if we have questions; we truly appreciate your time on this.
Hi @gonzalezeb, just following up, it's been four weeks since the reviews came in. If you are experiencing any difficulties or issues, feel free to flag them with the reviewers or me and we'll do our best to assist you through the process.
Hi @adamhsparks. Sorry, I am a little behind with this and have solved only minor issues. There are some comments on the conceptual aspects of the package (i.e. here) where I get the point, but changing our approach will require some (lots!) of work, so I am giving it careful thought. How should I tackle issues like those?
Hi @gonzalezeb, which specific aspect were you linking to and asking for guidance on? Your link only points to the whole issue, not to one of the reviewers' comments or an issue that they may have opened in your repository.
Hi @gonzalezeb, do you need more time to work on this? We can put a holding tag on it and revisit it after three months to see where things are for up to a year. Would that be useful to you?
Hi @adamhsparks, thanks for checking. We have made some substantial progress but I still need to work on some minor things. The plan is to finish this before Sep 15 as we want to submit our revised manuscript to MEE also by then. Will that be ok with you all?
Sounds good, Erika! Thanks for updating me.
Hi @adamhsparks, I wasn’t sure how to respond to the multiple reviews, so I hope the following is sufficient. We are very grateful for the thoughtful suggestions by the reviewers, and have implemented most of them.
Initial suggestions by @jeffreyhanson (5/19/21), have been addressed:

- Object classes are now described more precisely in the documentation (e.g. as a tibble::tibble() object), etc.
- We added validate_equations() to test that all equations in equations$equation_allometry are a valid R expression of the dbh or h variables (since this could be one of the main sources of mistakes in calculating biomass). This function can be called from inside any function within allodb and allows users to pass a character vector of equations. We also added consistency checks in the get_biomass function on the dbh and coordinates provided by the user.
- We replaced set.seed with withr::with_seed in the resample_agb function.
- We kept eval(parse(…)), see next comment.

Comprehensive review by @jeffreyhanson (6/17/21), has also been addressed.

About code:

- User-facing functions now consistently return a tibble subclass.

About documentation:

- We improved the documentation (in particular for the get_biomass and weight_allom functions), added @seealso to help users navigate between documentation pages, use quotation marks for character vectors, added example sections to the documentation for the built-in datasets, etc.

Suggestions by @jstillh (6/14/21), have been addressed:

- The get_biomass and weight_allom core functions have been improved so the weighting concept is more transparent; specifically, we give more details in the Details section. Regarding adding additional outputs to the get_biomass function, we think that would make the function too slow.
- We revised the weight_allom function (i.e. it now deals with coordinates that are not associated with any Koppen climate zone) to address the specific comment that the get_biomass function throws an error when the equation table does not contain equations corresponding to the Koppen climate at the provided coordinates (argument coords). The proposed solution is to change koppenMatrix, replacing climate combinations with weight we = 0 (two completely different climates) with we = 1e-6, similar to what is done with the taxonomic weight. This way, equations with we = 1e-6 will only be significantly used in the recalibrated equations when no better climate combination is found, thereby avoiding an error. In regards to the Forrester et al. 2017 equations, we revised them and now many coordinates correspond to the max or min values given in the original publication (not the mean, except for one site).

The reviewers advise avoiding the call eval(parse(text = ...)) (item 9 here, and “Code and documentation” here), mainly because (a) it can be unsafe, and (b) it is error prone.
Instead they suggest storing the equations as objects such as expressions or functions, and manipulating them with substitute().
We appreciate the advice and, after considering it carefully, we would like to share our updated opinion. We first discuss the problems of eval(parse()) and then the alternative solutions.
To make our discussion concrete we offer some code examples.
library(dplyr, warn.conflicts = FALSE)
library(allodb)
In short, we prefer to keep eval(parse()), as its problems seem to be balanced by the benefits we see in this particular use case, and the alternative solutions seem too complex. This is in line with @jeffreyhanson’s comment on Jun 17:

> I previously suggested that it would be good to avoid using eval(parse(...)) if possible. (…) However, [the alternatives] seem needlessly complex compared to the current implementation? Maybe this could be one instance where eval(parse(...)) is not a bad idea?
Online we found that parsing text would be a risk if we hosted an application on a server we own and allow users to input arbitrary text. A malicious user could damage our server.
But this is not our case. The allodb package is to be used as an analysis tool, directly on the users' computers. And the input text comes mainly not from users but from a dataset we built.
We acknowledge that bugs can hide in the unevaluated text, as invalid code is still a valid text object.
valid <- "dbh + 1"
eval(parse(text = valid), envir = list(dbh = 1))
#> [1] 2
invalid <- "1 <- dbh"
try(eval(parse(text = invalid), envir = list(dbh = 1)))
#> Error in 1 <- dbh : invalid (do_set) left-hand side to assignment
But bugs can also hide in unevaluated expressions. Using substitute()
to manipulate expressions does not ensure an expression is valid.
invalid <- quote(1 <- dbh)
substitute(x, list(x = invalid))
#> 1 <- dbh
try(eval(substitute(x, list(x = invalid))))
#> Error in 1 <- dbh : invalid (do_set) left-hand side to assignment
We are comfortable storing and manipulating text. We can inspect equations stored as text directly from the dataframe, and we can manipulate them with familiar tools.
# Storing text
equation <- "1.2927 * (dbh^2)^1.36723"
dbh <- "(sampled_dbh * 0.393701)"
# + We can inspect the equations directly from the dataframe
tibble(equation, dbh)
#> # A tibble: 1 x 2
#> equation dbh
#> <chr> <chr>
#> 1 1.2927 * (dbh^2)^1.36723 (sampled_dbh * 0.393701)
# - To manipulate the equations we use familiar tools
modified <- gsub("dbh", dbh, equation)
modified
#> [1] "1.2927 * ((sampled_dbh * 0.393701)^2)^1.36723"
eval(parse(text = modified), list(sampled_dbh = 1))
#> [1] 0.1010408
In contrast, we are not comfortable storing or manipulating expressions with substitute()
. We found great help in the first edition of Advanced R but we still think the approach is difficult to understand, which makes us more likely to introduce bugs than with eval(parse())
.
# We need to create an escape hatch because substitute() doesn't work if we
# already have an expression saved in a variable
# http://adv-r.had.co.nz/Computing-on-the-language.html#substitute
substitute_q <- function(x, env) {
call <- substitute(substitute(y, env), list(y = x))
eval(call)
}
equation <- quote(1.2927 * (dbh^2)^1.36723)
dbh <- quote((sampled_dbh * 0.393701))
# - We can't inspect the equations directly from the dataframe :-(
tibble(equation = list(equation), dbh = list(dbh))
#> # A tibble: 1 x 2
#> equation dbh
#> <list> <list>
#> 1 <language> <language>
# - To manipulate the equations we need unfamiliar tools
modified <- substitute_q(equation, list(dbh = dbh))
modified
#> 1.2927 * ((sampled_dbh * 0.393701)^2)^1.36723
eval(modified, list(sampled_dbh = 1))
#> [1] 0.1010408
As @jeffreyhanson advised, instead of using expressions and substitute()
we prefer to ensure all text-equations are valid or throw an informative error message.
validate_equation <- function(text, envir = parent.frame()) {
tryCatch(
eval(parse(text = text), envir = envir),
error = function(e) {
stop("`text` must be valid R code.\nx Invalid: ", text, call. = FALSE)
}
)
invisible(text)
}
dbh <- 99
validate_equation(valid)
try(validate_equation(invalid))
#> Error : `text` must be valid R code.
#> x Invalid: <-1dbh
f <- function(text) {
validate_equation(text, envir = list(dbh = 1))
# Some useful code
text
}
f(valid)
#> [1] "dbh + 1"
try(f(invalid))
#> Error : `text` must be valid R code.
#> x Invalid: <-1dbh
@gonzalezeb, thank you for your comprehensive overview of the changes that you have made to the allodb package.
@jeffreyhanson and @jstillh could you please let me know if you are satisfied with the changes to the package or have any further suggestions that may improve it?
Awesome - thank you very much @gonzalezeb. @adamhsparks, I'll try to get back to you next week.
@gonzalezeb thanks a lot for the overview of the changes - I'll try to go through them in the next week as well and get back to you, @adamhsparks.
Thank you @gonzalezeb for providing such detailed responses to the points I raised earlier. I especially found your discussion on the costs and benefits of eval(parse(...))
really useful. I think the R package is looking absolutely brilliant! Most of the points I raised earlier have been addressed. I've made a couple of suggestions below that I think could help improve the package.1 All things considered, these are pretty minor - just some documentation tweaks. If those could be addressed, I'd say that all my suggestions have been satisfied. I've also listed several additional minor comments too. Based on my understanding of the rOpenSci guidelines, they won't/shouldn't influence whether the package should be accepted or not (but please correct me if I am wrong). I've listed them just to be thorough, and in case the package authors think they are worth addressing.
Package checks throw the following warning on my computer (i.e. running devtools::check()
). Could you please fix it? Let me know if you're unable to reproduce this error and I can try to provide more information?
Undocumented code objects: ‘equations’ Undocumented data sets: ‘equations’ All user-level objects in a package should have documentation entries. See chapter ‘Writing R documentation files’ in the ‘Writing R Extensions’ manual.
Thank you very much for looking into potential issues with using eval(parse(...))
and potential alternatives. I agree that the main issue would be if the code was run inside a web application that allowed individuals to enter arbitrary text. Although additional issues could arise on standard desktop/laptop computers too (e.g. a malicious actor could potentially update the library files), I guess at this point malicious actors would have easier ways to accomplish their goals than modifying R library files. So, given the costs and benefits outlined by @gonzalezeb, I think it's perfectly reasonable for the package to use eval(parse(...)). Maybe it would be worth adding some warning text to the documentation for the function(s) that use eval(parse(...)) so that users are aware of potential issues? This way it is clearly the responsibility of the user to ensure that they take appropriate steps to avoid security issues.2
Also, I noticed one of the links provided by @gonzalezeb mentioned that RAppArmor::eval.secure() can prevent system calls with administrator privileges. If it's not too much hassle, would it be worth using that where possible? It seems that this R package is only available on Linux though - so perhaps it's not worth the hassle (because then some additional code would be needed to conditionally use RAppArmor::eval.secure() if it's available, and otherwise use eval(parse(...))). I thought I'd mention it just in case.
The README.md file might need re-knitting because the output displayed for est_params()
is a data.table
object (instead of a tibble()
object per the current version).
The documentation doesn't always format code correctly (i.e. using backticks). For example, NULL is sometimes referred to as "NULL" instead of `NULL` (e.g., see here, or here). Personally, I find syntax formatting helpful when reading documentation (e.g., https://readr.tidyverse.org/reference/read_delim.html). I don't think the rOpenSci guidelines have style mandates for code in package documentation, so I don't think this is critically important.
The main functions (e.g. est_params(), get_biomass(), resample_agb()) don't contain code to validate all arguments. Although @gonzalezeb mentioned that "the number of possible mistakes may be infinite", I would suggest that the number of expectations (e.g. properties or characteristics) for a valid input is much lower. For example, in order to verify that an argument is a single integer (e.g. 5), one does not need to verify that the parameter is (i) not a character, (ii) not NULL, (iii) not a double, (iv) not Inf, etc. Instead, one could verify that it conforms to a set of expectations (e.g. inherits(x, "integer") && identical(length(x), 1L) && all(is.finite(x))). This is made much easier using existing R packages developed for this purpose. However, I don't think rOpenSci or CRAN require argument validation for published packages, so I don't think this issue is critically important. I'm just making this comment to be thorough :)
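To illustrate the expectation-based approach, a runnable sketch (hypothetical helper, similar in spirit to assertthat::is.count()):

```r
# A small, finite set of expectations identifies "a single positive count"
# without enumerating everything the input is *not*.
is_single_count <- function(x) {
  is.numeric(x) && length(x) == 1L && is.finite(x) && x == trunc(x) && x > 0
}
is_single_count(5)     # TRUE
is_single_count(1:2)   # FALSE: not a single value
is_single_count(Inf)   # FALSE: not finite
```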
This is a super minor suggestion. The equation in the documentation for the est_params()
function has the text " with " formatted as a maths term (rather than text), which makes it slightly difficult to read on the package website. To address this, I think you could change this line to define the equation as:
\deqn{AGB = a * dbh ^ b + e, \space \mathit{with} \space e ~ N(0, sigma^2)}{AGB = a * dbh ^ b + e, with e ~ N(0, sigma^2)}
1 Please note that I still (unsurprisingly) lack the forestry experience needed to evaluate the novelty and methodology of this package, and so continue to limit my reviews to code quality and documentation.
2 Just to be clear, I do think that -- where possible, feasible, and reasonable -- R package developers should ideally -- if only in principle -- minimize security risks, even if software licenses absolve developers of responsibility. It's just that I think eval(parse(...)) is not so terrible here, given the noted trade-offs that are specific to this package. Perhaps others will disagree, and that's fine. Please note that I do not have any experience with cybersecurity. If you want an informed viewpoint on such matters, I suggest you talk to someone else :)
Thanks @jeffreyhanson, I have adopted most of your comments, they are very helpful!
Thank you @gonzalezeb for the very detailed response to the reviews. Like @jeffreyhanson, I appreciate your detailed response regarding the use of eval(parse(...)). In general, I am very pleased with the revisions and the changes you implemented in the functions. I do not have any more suggestions beyond the ones by @jeffreyhanson.
I particularly like the changes to weight_allom
and the get_biomass
function, and that you implemented an informative error message if coordinates are not covered by the koppen data. Thanks for this contribution.
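A minimal sketch of such an informative coordinate check (illustrative only; the function name and message are assumptions, not the actual allodb implementation):

```r
# Illustrative check: fail early with a clear message when coordinates
# fall outside valid longitude/latitude bounds, instead of returning a
# cryptic downstream error.
check_coords <- function(long, lat) {
  bad <- long < -180 | long > 180 | lat < -90 | lat > 90
  if (any(bad)) {
    stop(
      "Coordinates not covered by the climate data: ",
      paste0("(", long[bad], ", ", lat[bad], ")", collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

check_coords(long = -78.15, lat = 38.89)  # valid coordinates pass silently
```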
Date accepted: 2021-09-27
Submitting Author: Erika Gonzalez-Akre (@gonzalezeb)
Other Authors: Camille Piponiot (@cpiponiot), Mauro Lepore (@maurolepore), Kristina Teixeira (@teixeirak)
Repository: https://github.com/forestgeo/allodb
Version submitted: 0.0.0.9000
Editor: @adamhsparks
Reviewers: @jeffreyhanson, @jstillh
Due date for @jeffreyhanson: 2021-06-14
Due date for @jstillh: 2021-06-24
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
allodb
integrates compiled data and built-in functions to calculate tree above-ground biomass (AGB), a critical value used by forest ecologists to estimate the amount of carbon sequestered and released by forests. allodb
falls in the "field and laboratory reproducibility tools" category, as our package intends to improve and standardize the calculation of a widely used forest metric.

Who is the target audience and what are scientific applications of this package?
allodb
is designed to be used by forest ecologists with basic or advanced knowledge of R. This scientific application is intended for individuals interested in calculating forest metrics such as AGB in extratropical forests (subtropical, temperate, and boreal).

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

The BIOMASS package is designed to estimate biomass in tropical forests. The main difference between allodb
and BIOMASS is the set of equations used to estimate AGB. There is an existing calibrated equation used globally for tropical environments, but not for areas outside the tropics. Our goal is to provide a set of curated equations and the ability to add new equations in order to obtain a more reliable value of AGB outside the tropics. We do not duplicate any functionality of the BIOMASS package and overlap only in ecological scope.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research? N/A
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted. @melvidoni or issue 427
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[x] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [x] The package is novel and will be of interest to the broad readership of the journal.
- [x] The manuscript describing the package is no longer than 3000 words.
- [x] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct