ropensci / software-review

rOpenSci Software Peer Review.
286 stars 104 forks source link

fellingdateR: Estimate, report and combine felling dates of historical tree-ring series #618

Closed hanecakr closed 2 months ago

hanecakr commented 7 months ago

Date accepted: 2024-04-08

Submitting Author Name: Kristof Haneca Submitting Author Github Handle: !--author1-->@hanecakr<!--end-author1-- Other Package Authors Github handles: (comma separated, delete if none) Repository: https://github.com/hanecakr/fellingdateR Version submitted: Submission type: Standard Editor: !--editor-->@maelle<!--end-editor-- Reviewers: @njtierney, @ajpelu

Archive: TBD Version accepted: TBD Language: en


Package: fellingdateR
Title: Estimate, report and combine felling dates of historical tree-ring 
    series
Version: 0.0.0.9003
Authors@R: c(
    person("Kristof", "Haneca", , "Kristof.Haneca@vlaanderen.be",role = c("aut", "cre"),
        comment = c(ORCID = "0000-0002-7719-8305")),
    person("Koen", "Van Daele", role = "ctb",
        comment = c(ORCID = "0000-0002-8153-2978")),
    person("Ronald", "Visser", role = "ctb",
        comment = c(ORCID = "0000-0001-6966-1729"))    
    )
Description: `fellingdateR` is an R package that aims to facilitate the 
  analysis and interpretation of tree-ring data from wooden 
  cultural heritage objects and structures. The package standarizes the process
  of computing and combining felling date estimates, both for individual and 
  groups of related tree-ring series.
URL: https://github.com/hanecakr/fellingdateR
BugReports: https://github.com/hanecakr/fellingdateR/issues
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Depends: 
    R (>= 2.10)
LazyData: true
Imports: 
    dplyr,
    ggplot2,
    ggtext,
    MASS,
    plyr,
    tidyr,
    utils,
    dplR
Suggests: 
    covr,
    knitr,
    rmarkdown,
    testthat (>= 3.0.0)
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

The package aims to facilitate and standardize the computation of felling dates from dated tree-ring series. It covers all steps of processing measured ring-width series from raw data to the reporting of felling date estimates, for single series and for groups of related tree-ring series.

Maily professional tree-ring scientists (dendrochronologists) that work on historical timbers and wooden objects (archaeology, architectural history, sculptures, panel paintings, etc.)

No, no other R-packages do the same job.

yes

All checks pass

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

ropensci-review-bot commented 7 months ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 7 months ago

:rocket:

The following problem was found in your submission template:

:wave:

mpadge commented 7 months ago

@hanecakr The error was because your URL field was markdown-style: "[url](url)", and should just be plain "url". I've edited to fix; you should now be able to call @ropensci-review-bot check package yourself to restart checks. Sorry for any inconvencience.

hanecakr commented 7 months ago

@ropensci-review-bot check package

ropensci-review-bot commented 7 months ago

Thanks, about to send the query.

ropensci-review-bot commented 7 months ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 7 months ago

Checks for fellingdateR (v0.0.0.9003)

git hash: 05c449fb

(Checks marked with :eyes: may be optionally addressed.)

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate. |type |package | ncalls| |:----------|:------------|------:| |internal |base | 363| |internal |stats | 33| |internal |fellingdateR | 26| |internal |grDevices | 7| |internal |graphics | 2| |imports |ggplot2 | 32| |imports |utils | 15| |imports |dplyr | 6| |imports |ggtext | 3| |imports |plyr | 2| |imports |tidyr | 2| |imports |dplR | 2| |imports |MASS | 1| |suggests |covr | NA| |suggests |knitr | NA| |suggests |rmarkdown | NA| |suggests |testthat | NA| |linking_to |NA | NA| Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

sub (32), rep (31), length (21), c (17), nrow (17), apply (15), matrix (15), max (14), ncol (13), data.frame (10), seq (10), which (10), range (9), for (8), paste0 (8), as.data.frame (7), attributes (7), numeric (7), as.numeric (6), by (6), drop (5), is.na (5), mean (5), min (5), list (4), summary (4), diff (3), floor (3), format (3), rownames (3), setdiff (3), sum (3), T (3), unique (3), any (2), character (2), get (2), grep (2), lapply (2), names (2), round (2), sapply (2), scale (2), seq_len (2), abs (1), all (1), as.character (1), as.integer (1), ceiling (1), colnames (1), colSums (1), complex (1), cumsum (1), dim (1), file.exists (1), gettext (1), Im (1), intersect (1), lengths (1), logical (1), match (1), merge (1), message (1), Re (1), readLines (1), regexpr (1), seq_along (1), sort (1), sqrt (1), substring (1), table (1), vapply (1)

stats

df (18), spline (7), end (3), sd (2), start (2), sigma (1)

ggplot2

element_blank (13), element_text (5), ggplot (3), aes (2), arrow (2), geom_line (2), unit (2), geom_area (1), scale_x_continuous (1), theme (1)

fellingdateR

hdi (7), sw_model (6), sw_data_overview (3), d.count (2), d.dens (2), fd_report (2), sw_interval (2), strip.comment (1), sw_combine_plot (1)

utils

data (13), citation (1), tail (1)

grDevices

pdf (7)

dplyr

left_join (1), mutate (1), pull (1), relocate (1), select (1), summarize (1)

ggtext

element_markdown (3)

dplR

rwl.stats (2)

graphics

title (2)

plyr

round_any (2)

tidyr

pivot_longer (2)

MASS

fitdistr (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 16 files) and - 1 authors - 1 vignette - 13 internal data files - 8 imported packages - 15 exported functions (median 80 lines of code) - 21 non-exported functions in R (median 50 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 16| 74.9| | |files_vignettes | 1| 68.4| | |files_tests | 14| 93.8| | |loc_R | 2855| 89.8| | |loc_vignettes | 193| 48.1| | |loc_tests | 480| 74.1| | |num_vignettes | 1| 64.8| | |data_size_total | 6023| 68.2| | |data_size_median | 465| 60.0| | |n_fns_r | 36| 45.9| | |n_fns_r_exported | 15| 58.5| | |n_fns_r_not_exported | 21| 42.1| | |n_fns_per_file_r | 1| 20.7| | |num_params_per_fn | 4| 67.3| | |loc_per_fn_r | 65| 93.5| | |loc_per_fn_r_exp | 80| 89.2| | |loc_per_fn_r_not_exp | 50| 90.3| | |rel_whitespace_R | 11| 80.9| | |rel_whitespace_vignettes | 68| 73.3| | |rel_whitespace_tests | 11| 58.1| | |doclines_per_fn_exp | 36| 43.7| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 30| 55.5| | ---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check.yaml](https://github.com/hanecakr/fellingDateR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/hanecakr/fellingdateR/actions) [![pkgcheck](https://github.com/hanecakr/fellingDateR/workflows/pkgcheck/badge.svg)](https://github.com/hanecakr/fellingdateR/actions) **GitHub Workflow Results** | id|name |conclusion |sha | run_number|date | |----------:|:-------------|:----------|:------|----------:|:----------| | 6932735657|pkgcheck |success |05c449 | 5|2023-11-20 | | 6932735666|R-CMD-check |success |05c449 | 9|2023-11-20 | | 6932735656|test-coverage |success |05c449 | 12|2023-11-20 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following note: 1. checking data for non-ASCII characters ... NOTE Note: found 11 marked UTF-8 strings R CMD check generated the following check_fails: 1. cyclocomp 2. rcmdcheck_non_ascii_characters_in_data #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 77.6 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) The following functions have cyclocomplexity >= 15: function | cyclocomplexity --- | --- read_fh | 150 fd_report | 42 sw_sum | 32 cor_table | 31 sw_combine | 27 sw_interval | 17 movAv | 16 sw_combine_plot | 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 170 potential issues: message | number of times --- | --- Avoid 1:length(...) expressions, use seq_len. | 6 Avoid library() and require() calls in packages | 1 Avoid using sapply, consider vapply instead, that's type safe | 2 Lines should not be more than 80 characters. | 154 Use <-, not =, for assignment. | 7


4. Other Checks

Details of other checks (click to open)

:heavy_multiplication_x: The following 3 function names are duplicated in other packages: - - `get_header` from eventr - - `hdi` from bayestestR, ggdist, hdi, HDInterval - - `movAv` from berryFunctions


Package Versions

|package |version | |:--------|:--------| |pkgstats |0.1.3.9 | |pkgcheck |0.1.2.11 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

jhollist commented 7 months ago

@hanecakr Thank you for the submission. All looks good to me. Will pass on the handling editor momentarily.

jhollist commented 7 months ago

@ropensci-review-bot assign @maelle as editor

ropensci-review-bot commented 7 months ago

Assigned! @maelle is now the editor

maelle commented 7 months ago

Thanks for your submission @hanecakr! I have some comments before I can proceed with the reviewer search. :deciduous_tree:

Happy to discuss the comments below! :smile_cat:

Editor checks:

Editor comments

Installation instructions

For brevity you can keep only the installation instructions with pak.

Dependencies

I wonder whether the dependency on plyr for very few functions could be avoided?

Metadata

I don't think it's allowed to have backticks in a DESCRIPTION file, I see some in the Description field. Or if it's allowed, at least it's unusual?

Docs

I'd recommend creating a pkgdown website. In the vignette and README it'd make sense to remove the description of functions and datasets, instead pointing to the reference index of the pkgdown website. It'd make it easier to skim the README, and to see what functions/datasets there are at once (for reviewers but also users!).

Data

I see the example datasets are saved as internal datasets however you use them in the documentation (in the README) so why not make them exported datasets? https://r-pkgs.org/data.html#sec-data-data

Rather than using the names with "dummy" I'd recommend informative names. (Beside, some might consider "dummy" to be an offensive word related to ableism).

 Tests

I see some top-level code in for instance https://github.com/hanecakr/fellingdateR/blob/master/tests/testthat/test-fd_report.R I'd recommend instead having a function defined in a helper file like tests/testthat/helper-testdata

test_data <- function() {
data.frame(series = c("aaa", "bbb", "ccc", "no_last", "no_sapwood"),
                       n_sapwood = c(10, 11, 12, 10, NA),
                       waneyedge = c(FALSE, FALSE, TRUE, FALSE, FALSE),
                       last = c(123, 456, 1789, NA, 1978))
}

that you'd then call from each test (not each test file) to create the data. Even better, it could have a more informative name. For more context: https://blog.r-hub.io/2020/11/18/testthat-utility-belt/#code-called-in-your-tests and https://r-pkgs.org/testing-advanced.html#sec-testing-advanced-fixture-helper With such a strategy, each test is easier to read and to run, as it's self-contained.

I think some test lines could be more condensed, e.g. https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/tests/testthat/test-fd_report.R#L105 could go on the previous line, and https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/tests/testthat/test-fd_report.R#L96-L98 or https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/tests/testthat/test-fd_report.R#L120-L122 might be a single line, to see more of the test file at once on the screen?

You shouldn't namespace fellingdateR::: data in test files.

For test-sw_combine_plot.R you might want to look into snapshot testing rather than using ggplot2's internal data structure: https://testthat.r-lib.org/articles/snapshotting.html#whole-file-snapshotting (other relevant source: https://www.tidyverse.org/blog/2022/09/playing-on-the-same-team-as-your-dependecy/#testing-testing)

Why is there a test that is commented out? https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/tests/testthat/test-sw_interval.R#L2 (as opposed to, say, completely removed)

Instead of the regular expressions for testing errors, you might be interested in this approach of classed errors: https://www.mm218.dev/posts/2023-11-07-classed-errors/

Code

Please read the output from lintr in the automatic checks above, and fix accordingly (or update this thread to explain why you disagree).

Lines such as https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/cor_table.R#L117 should use inherits()

if (!inherits(x, "rwl")) {

In for instance cor_table.R, I think the code might be more readable with explaining variables. For instance

if (!all(diff(as.numeric(row.names(x))) == 1)) {

would be

increasing_consecutive_years <- (all(diff(as.numeric(row.names(x))) == 1))
if (!increasing_consecutive_years) {

Same comments as for test files, I think there could be fewer new lines in scripts such as https://github.com/hanecakr/fellingdateR/blob/master/R/cor_table.R, to facilitate reading / vertical scrolling. I guess this conflicts a bit with the huge indentations in the files (which you are obviously allowed to prefer!).

For the else if in https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/sw_interval.R#L236, why not use the switch() function? I must confess I like using it but have to look up the docs every time to remember how to input the arguments. But it'll decrease the complexity of the current logic.

In https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/sw_interval_plot.R#L137 what warnings are suppressed? In the same post mentioned previously https://www.mm218.dev/posts/2023-11-07-classed-errors/, it's shown that one can selectively suppress warnings of a given class.


hanecakr commented 7 months ago

Thanks a lot @maelle for this first round of comments. Will try to tackle most of these issues in the next few days.

hanecakr commented 7 months ago

Dear @maelle. I've been working on the package and think I'm able now to address most of your comments. Thanks a lot for your time, valuable insights and recommendations!

I'll list the answers to your comments here:

Installation instructions

For brevity you can keep only the installation instructions with pak.

Dependencies

I wonder whether the dependency on plyr for very few functions could be avoided?

Metadata

I don't think it's allowed to have backticks in a DESCRIPTION file, I see some in the Description field. Or if it's allowed, at least it's unusual?

Docs

I'd recommend creating a pkgdown website.

Data

I see the example datasets are saved as internal datasets however you use them in the documentation (in the README) so why not make them exported datasets? https://r-pkgs.org/data.html#sec-data-data

Rather than using the names with "dummy" I'd recommend informative names. (Beside, some might consider "dummy" to be an offensive word related to ableism).

Tests

I see some top-level code in for instance https://github.com/hanecakr/fellingdateR/blob/master/tests/testthat/test-fd_report.R I'd recommend instead having a function defined in a helper file like tests/testthat/helper-testdata ...

I think some test lines could be more condensed, e.g. ...

You shouldn't namespace fellingdateR::: data in test files.

For test-sw_combine_plot.R you might want to look into snapshot testing rather than using ggplot2's internal data structure: https://testthat.r-lib.org/articles/snapshotting.html#whole-file-snapshotting (other relevant source: https://www.tidyverse.org/blog/2022/09/playing-on-the-same-team-as-your-dependecy/#testing-testing)

Why is there a test that is commented out? https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/tests/testthat/test-sw_interval.R#L2 (as opposed to, say, completely removed)

Instead of the regular expressions for testing errors, you might be interested in this approach of classed errors: https://www.mm218.dev/posts/2023-11-07-classed-errors/

Code

Please read the output from lintr in the automatic checks above, and fix accordingly (or update this thread to explain why you disagree).

Lines such as https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/cor_table.R#L117 should use inherits() if (!inherits(x, "rwl")) {

In for instance cor_table.R, I think the code might be more readable with explaining variables. For instance if (!all(diff(as.numeric(row.names(x))) == 1)) { would be increasing_consecutive_years <- (all(diff(as.numeric(row.names(x))) == 1)) if (!increasing_consecutive_years) {

Same comments as for test files, I think there could be fewer new lines in scripts such as https://github.com/hanecakr/fellingdateR/blob/master/R/cor_table.R, to facilitate reading / vertical scrolling. I guess this conflicts a bit with the huge indentations in the files (which you are obviously allowed to prefer!).

For the else if in https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/sw_interval.R#L236, why not use the switch() function? I must confess I like using it but have to look up the docs every time to remember how to input the arguments. But it'll decrease the complexity of the current logic.

In https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/sw_interval_plot.R#L137 what warnings are suppressed? In the same post mentioned previously https://www.mm218.dev/posts/2023-11-07-classed-errors/, it's shown that one can selectively suppress warnings of a given class.

 Warning: Removed 52 rows containing non-finite values (stat_align()).  Warning: Removed 43 rows containing missing values (geom_line()).

I'm sure there are more elegant and effective ways to improve the scripts, functions and their documentation, but I hope that these initial revisions have elevated the quality of the package sufficiently, making it amenable for the review process. Of course, I'm open for additional insights and recommendations to further improved the quality and performance of the package.

-- Kristof

maelle commented 7 months ago

Thanks a ton @hanecakr!

A small note, in GitHub Markdown if you type - [x] the checkbox appears as a ticked checkbox, whereas - [x ] does not.

In the README when mentioning the reference index, you could rephrase the link so as not to use "here" (https://webaccess.berkeley.edu/ask-pecan/click-here), for instance "You can find the [list of functions]".

When I would make the made-up date also external, that would mix real-world data and made-up data. And would also complicate the code for sw_data_overview(). Therefore I would like to keep the made-up data for examples and tests as internal data.

Ok, but in that case would it work to not load sysdata (which might look confusing) but instead use data(dataname, package = "fellingdateR")? Also to avoid using ::: when using the data in examples.

Should have done that before submission…

There's already a lot of stuff to do before submission!

Don't hesitate to ask if you need help with snapshot testing, but it's maybe not useful here anyway.

A small non compulsory thing, I see your default branch is named master, you could change it to the new community standard "main" using the usethis function for instance. See also https://www.tidyverse.org/blog/2021/10/renaming-default-branch/

Another Git thing, your repository contains the .Rproj.user folder, should it be removed and added to .gitignore? See https://usethis.r-lib.org/reference/git_vaccinate.html for a handy way to gitignore it globally.

I remember I tried this before, but failed to implement switch() for this function. Would prefer to leave it as it is now.

Fair enough! Happy to try and make a PR if you change your mind (maybe this time I'd remember how to use it straight away :joy: )

I'll now look for reviewers! Thanks again for all your work!

maelle commented 7 months ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 7 months ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/618_status.svg)](https://github.com/ropensci/software-review/issues/618)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

hanecakr commented 7 months ago

@maelle Thanks a lot! Meanwhile, I will change the branch name to main, use the data(...) function to load internal data, and have a look at the .Rproj.user folder and try to find a way to add it to .gitignore. The badge will be added to README soon.

maelle commented 7 months ago

@ropensci-review-bot add @njtierney to reviewers

ropensci-review-bot commented 7 months ago

@njtierney added to the reviewers list. Review due date is 2023-12-22. Thanks @njtierney for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 7 months ago

@njtierney: If you haven't done so, please fill this form for us to update our reviewers records.

maelle commented 7 months ago

Thank you @njtierney for accepting to review this package!

hanecakr commented 7 months ago

@njtierney: with a little delay, I decided to follow the advice of @maelle and moved the example datasets from internal to external data, so they become easily available for testing and examples, both for end-uses as during code review. The latest commit implements the necessary (small) changes. Looking forward to your comments and suggestions. -- Kristof.

njtierney commented 6 months ago

Thanks, team! I've got a few deadlines this week but will take a look next week and aim to have this done by the 22nd.

maelle commented 6 months ago

Thank you @njtierney and good luck with the deadlines!

maelle commented 6 months ago

@ropensci-review-bot add @ajpelu to reviewers

ropensci-review-bot commented 6 months ago

@ajpelu added to the reviewers list. Review due date is 2023-12-30. Thanks @ajpelu for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 6 months ago

@ajpelu: If you haven't done so, please fill this form for us to update our reviewers records.

maelle commented 6 months ago

Thank you @ajpelu for accepting to review this package!

ropensci-review-bot commented 6 months ago

:calendar: @njtierney you have 2 days left before the due date for your review (2023-12-22).

njtierney commented 6 months ago

Hi team!

I'm still working through this review. I've done a first pass, but I am just finishing up collecting together my points and polishing up the review so it can be easy to follow. Should be done later this afternoon! :)

njtierney commented 6 months ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

The opening paragraphs of the README are good, and I think that this R package solves a challenging problem, so firstly, well done! I think could be made a little bit clearer in terms of the problem it solves, and the input it takes. While I find the photos useful, it initially made me think that this software takes images as input. I would suggest something more like what is in the vignette to start:

fellingdateR offers a set of functions that assist in inferring felling date estimates from dated tree-ring series.

Then, describe the problem you want to solve, which I think is estimating when the timber was cut down. Then show the data, explain what the columns mean, and how this might be a typical example of dated tree-ring series data.

Then show a short example of the output, clearly demonstrating the problem the package solves.

The rest of the first paragraph:

The presence of (partially) preserved sapwood or waney edge allows to estimate a range for the actual felling date, for individual series as well as for a group of timbers. Furthermore, an additional function provides a tool to sum sapwood probability distributions, comparable to ‘summed probability densities’ commonly applied to sets of radiocarbon (14C) dates.

Is important, but I think could go into more of a methods/general introduction part of the README, perhaps further down.

I’m not sure what the images show me, and so to communicate this effectively I think they should contain a caption.

I think the target audience could be more clearly stated in the README. Perhaps at the end of the first paragraph.

All installed well for me!

It did run successfully locally! T and F should be specified as TRUE and FALSE.

The examples ran without error, using:

devtools::run_examples()

There are no community guidelines in the README, I see them in the file: .github/CONTRIBUTING.md, but these are not linked to in the README. Once these are linked, e.g., by writing something like:

## Code of Conduct

Please note that the visdat project is released with a [Contributor Code of Conduct](https://github.com/hanecakr/fellingdateR/blob/main/.github/CONTRIBUTING.md). By contributing to this project, you agree to abide by its terms.

Functionality

All tests pass - unit tests seem quite good coverage, evaluated using devtools::test_coverage().

Estimated hours spent reviewing: 5


Review Comments

I wanted to open by saying that while I have a lot of feedback, I think that this is a great piece of software that helps solve a tough problem, so well done on the author for writing this! I hope that the feedback is useful 😄 . Please let me know if something is not clear or if you need help implementing these, or further information. Thank you for submitting this software, I enjoyed reviewing it.

General comments

There are a fair few examples from the rOpenSci packaging guide, which I don’t think are followed, I have gone through the guide and written some examples here. After the author makes these changes, I would recommend they double check the guide.

  ✖ write short and simple
    functions. These functions
    have high cyclomatic
    complexity (>50): read_fh
    (150). You can make them
    easier to reason about by
    encapsulating distinct steps
    of your function into
    subfunctions.
  ✖ use '<-' for
    assignment instead of '='.
    '<-' is the standard, and R
    users and developers are used
    it and it is easier to read
    your code for them if you use
    '<-'.
  ✖ avoid long code lines,
    it is bad for readability.
    Also, many people prefer
    editor windows that are about
    80 characters wide. Try make
    your lines shorter than 80
    characters
  ✖ avoid sapply(), it is
    not type safe. It might return
    a vector, or a list, depending
    on the input data. Consider
    using vapply() instead.
  ✖ avoid 1:length(...),
    1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...)
    expressions. They are error
    prone and result 1:0 if the
    expression on the right hand
    side is zero. Use seq_len() or
    seq_along() instead.
  ✖ avoid 'T' and 'F', as
    they are just variables which
    are set to the logicals 'TRUE'
    and 'FALSE' by default, but
    are not reserved words and
    hence can be overwritten by
    the user.  Hence, one should
    always use 'TRUE' and 'FALSE'
    for the logicals.
if (is.null(y)) {
     y = x
     noRef = TRUE
}
else {
     noRef = FALSE
     y_ori <- y
}

any(pdf_matrix[, 2:length(keycodes) + 1] == 1, na.rm = TRUE))

Input checking

I would recommend writing small helpers for input checking, and considering using cli to help write error messages, as it means you could transform this:

if (!inherits(x, "rwl"))  {
     warning("'x' is not class rwl")
}
if (!inherits(y, "rwl"))  {
     warning("'y' is not class rwl")
}

Into:

warn_if_not_rwl(x)
warn_if_not_rwl(y)

And that code could look like this:

warn_if_not_rwl <- function(x,
                            arg = rlang::caller_arg(x),
                            call = rlang::caller_env()){
     cli::cli_warn(
          c("{arg}' is not of class {.cls rwl}")
     )
}

Similarly,

increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)
if (!increasing_consecutive_years) {
     stop(
          "The tree-ring series 'x' have/has no consecutive years in increasing order as rownames."
     )
}

Could be written as a function:

check_if_increasing_consecutive_years(x)
check_if_increasing_consecutive_years(y)

Admittedly, I do have a strong preference for writing these types of functions, having written about it recently, but I do think that at least using explaining variables, which you’ve already done in places like:

increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)

Are a great idea, and there are a few notable places where that would help make the code a bit easier to read, e.g.,

any(
length(min_overlap) != 1 |
!is.numeric(min_overlap) |
min_overlap %% 1 != 0 |
min_overlap < 3
)

cor_table.R

Refactoring values argument of cor_table. There is a lot of input checking for the values argument. I think that things such as :

if ("glk" %in%  values) {

And so on indicate to me that these could be written up as separate functions, which could return a list of their inputs, perhaps. These could then be delivered using switch, which I often forget how to use, but it would be something like:

values_output <- switch(values,
       "glk" = values_glk(inputs),
       "pearson" = values_pearson(inputs))

Examples should demonstrate all types of the inputs for the function arguments.

data.R

I would recommend standardising the dataset names to be all lowercase, so that they are easier to remember. E.g., Sohar_2012_FWE_c becomes: sohar_2012_fwe_c

fd_report.R

I think that fd_report could be renamed felling_report or felling_date_report or similar. While fd is concise, I think it doesn’t help facilitate discoverability of the functions.

Similar to cor_table.R, I think that:

if (!series %in% names(df)) {
      stop("--> 'series' does not exist")
}
if (!last %in% names(df)) {
      stop("--> 'last' does not exist")
}
if (!n_sapwood %in% names(df)) {
      stop("--> 'n_sapwood' does not exist")
}
if (!waneyedge %in% names(df)) {
      stop("--> 'waneyedge' does not exist")
}

Could be rewritten as check_if_variable_exists(). Something like:

check_if_variable_exists <- function(x,
                                     df,
                                     arg = rlang::caller_arg(x),
                                     call = rlang::caller_env()){
     arg_in_data <- x %in% names(df)
     if (!arg_in_data) {
          cli::cli_abort(
               c("{.arg {arg}} does not exist")
          )
     }
}

example_checker <- function(x, 
                            series = "series", 
                            last = "last"){
     check_if_variable_exists(series, x)
     check_if_variable_exists(last, x)
}

example_checker(mtcars, 
                series = "wrong")

## Error in `check_if_variable_exists()`:
## ! `series` does not exist

get_header.R

This function should move the cat message up the top - and should not use cat, instead using one of the cli functions, like cli_abort.

I think you could use structure instead of setting attributes to NULL:

attr(rwl, "row.names") <- NULL
attr(rwl, "po") <- NULL
attr(rwl, "class") <- NULL
attr(rwl, "names") <- NULL

## becomes

rwl <- structure(
          rwl,
          row.names = NULL,
          po = NULL,
          class = NULL,
          names = NULL
     )

Although I think that they are functionally the same, so feel free to ignore!

hdi

This function uses = and <- - suggest sticking to just <-

movAv

I think this starting chunk would be clearer if only if and not else is used.

The stop error can move to the top of this, so we clearly capture if align is not “center” or “right” or “left”. This makes it easier to understand the conditions of error.

if (align == "center") {
     before <- floor((w - 1) / 2)
     after  <- ceiling((w - 1) / 2)
} else if (align == "right") {
     before <- w - 1
     after  <- 0
} else if (align == "left") {
     before <- 0
     after  <- w - 1
} else {
     stop("'align' should be 'center', 'left' or 'right'")
}

I suggest using another explaining variable inside mean:

mean(x[max(0, (i - before)):(i + after)], na.rm = TRUE)

## to something like:

earliest_to_latest <- x[max(0, (i - before)):(i + after)]
mean(earliest_to_latest, na.rm = TRUE)

## or given that this is repeated later
## potentially write this up as a function for reuse?
mean_earliest_latest(x, i, before, after)

As that mean statement is a bit involved to unfurl.

Similarly, the pattern, if (edges == "fill") { and } else if (edges == "nofill") { should be bundled up into a function and applied with switch

read_fh.R

        # NEW: verbose = TRUE, header = FALSE
        inp <- readLines(fname, ok = TRUE, warn = FALSE)
        # NEW: removes empty lines in .fh file
        inp <- inp[nchar(inp) != 0]
        ## Get start and end positions of headers and data blocks
        header.begin <- grep("^HEADER:$", inp)
        # NEW: Quadro => chrono
        # NEW: Double => half chrono
lengths <- numeric(n) # commit Ronald Visser

I have found that moving comments either into documentation or into issues to help track them is helpful, but I appreciate that sometimes it is best to leave them in the code, but just something that might be worth thinking about :)

Tidying up the error messages in this function would make some of these nested if/else clauses easier to understand.

This is a pretty massive function, a bit over 1200 lines of code. I would recommend breaking down the steps inside this into smaller functions, as this will make the code easier to reason with and maintain in the future.

sw_combin_plot.R

This is the first time I’ve seen ############ comment blocks - I’m all for stylistic choices but I am not sure this is needed, especially if this isn’t used in other functions.

I’ve not seen this pattern to avoid R CMD Check notes before

   # to avoid notes in CMD check
   year <-
      p <-
      lower <-
      upper <- COMB <- last <- n_sapwood <- A_i <- agreement <- NULL

My tactic has always been to have a separate definition of these, as answered by Carson Sievert on the posit community paage. I don’t think there’s anything inherently wrong with that, but I could imagine that in some cases this could accidentally erase inputs. Something to be aware of, perhaps?

I am all for using the new base R pipe |> - however you need to update your Depends in your DESCRIPTION like so in order to use it, since it only came out in R 4.1.0:

Depends: 
    R (>= 4.1.0)

This comment should probably live in a github issue or just be removed:

      # NEXT LINE TRIGGERS WARNING
      # Warning message:
      # Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
      # ℹ Please use one dimensional logical vectors instead.
      # ℹ The deprecated feature was likely used in the fellingdateR package.
      # Please report the issue to the authors.
      # { if (nrow(summary |> dplyr::filter(agreement == "poor")) != 0)
      # replaced by:

sw_combine.R

This error should check each of the conditions separately - either it has missing values, or it is not numeric.

if (any(is.na(endDate)) | !is.numeric(endDate)) {
     stop(
          "--> Please check the column with 'end dates'.
Some values are possibly missing or the values are not numeric"
     )
}

sw_data_info.R

I think these error messages would benefit from using cli, as discussed above.

sw_data_overview.R

This is a nice function to include to facilitate data discovery

sw_interval_plot.R

This code

if (all(
     !(attributes(x)$names) %in% c(
          "year",
          "n_sapwood",
          "p")
))
     stop("Input differs from output sw_interval()")

Could be rewritten as an error function or the condition in if could be expressed as a function.

sw_interval.R

In the final line of documentation for this function there is a hanging sentence:

#' @return Depends on the value of `hdi`.
#'
#'  * If `hdi = TRUE`, a `numeric vector` reporting the upper and lower limit
#'   of the hdi (attributes provide more detail on `credMass` and the applied
#'   sapwood model (`sw_data`)).
#'  * If `hdi = FALSE`, a `matrix` with scaled p values for each number of
#'   observed sapwood rings. This matrix

sw_model.R

Great to see input checking at the top of the function - I do think these should be rewritten as check input functions.

Helper function d.count I think should be put into a separate R file called utils.R or helpers.R

d.count should use switch pattern and pass functions rather than using if controls.

d.count should be d_count

sw_sum_plot.R

indentation in this code is not consistent - recommend applying a style guide.

Examples should show different variations possible for function arguments. E.g., bar_col, spline_col, dot_col, and dot_size should all be specified in the examples so the user can see what the input should/could be.

sw_sum.R

See note above on including plots.

tests

maelle commented 6 months ago

Thanks @njtierney for your very thorough review! :pray:

maelle commented 6 months ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/618#issuecomment-1867354502 time 5

ropensci-review-bot commented 6 months ago

Logged review for njtierney (hours: 5)

hanecakr commented 6 months ago

Thanks a lot for all your time and valuable feedback. I will start processing your comments soon. But will probably need some further advice from you or the team to tackle all issues raised. But first ... a Christmas break 🎄

ropensci-review-bot commented 6 months ago

:calendar: @ajpelu you have 2 days left before the due date for your review (2023-12-30).

ajpelu commented 6 months ago

Sorry for the slight delay in the review. Here it is.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

I think the target audience of the pkg would be indicated more explicitly.

The images are not very illustrative of what the package does, they simply seem to indicate that they are wooden tokens. Perhaps it would be interesting to think of a schematic or similar, but the photos do not seem to provide much.

In the installation instructions, it might be more effective to provide a link to the GitHub repository of the package rather than displaying the raw GitHub link itself.

Why does the author prefer to use the package pak rather than devtools to install the development version of the pkg? I see that in the vignette he indicated the two ways (pak and devtools).

The vignette outlines the primary functions but lacks a straightforward workflow for beginners.

There are missing citations in the vignette. Please add a reference list at the end.

In the vignette of the website it appears:

sw_sum(fellingdateR:::trs_example7, plot = TRUE)

but when I run locally i got:

sw_sum(fellingdateR:::trs_example7, plot = TRUE)
Error in as.data.frame(x) : object 'trs_example7' not found

Please consider change the three ::: by ::

No community guidelines found, only in DESCRIPTION there is a BugReports linked.

Functionality

All test pass. The coverage is 77.44 % (using devtools::test_coverage())

Estimated hours spent reviewing: 13


Review Comments

Thank you for your contribution with the fellingdateR package. It provides a valuable suite of functions for estimating, reporting, and combining felling dates of historical tree-ring series. The comprehensive set of tools makes it easier to infer felling date estimates, considering factors like preserved sapwood or waney edge.

Given my expertise and interest in dendrochronology, I thoroughly enjoyed reviewing this package. The functions enable users to estimate felling date ranges for both individual series and groups of timbers, offering a valuable feature for summing sapwood probability distributions.

This being my first software review for Ropensci, I apologize for any potential errors or unclear comments. While I've provided various comments (a lot of them), feel free to focus on those that are most relevant or useful. I commend you for effectively addressing a challenging problem with this package. If you need clarification or assistance implementing the suggestions, please reach out. Thank you for submitting this software; the review process was a pleasure!

General comments

Coverage

Please consider review the test coverage of the movAv.R function (~57.14%)

Goodpractices

After run the goodpractices::gp() several code lines does not pass the check:

✖ write short and simple functions. These functions have high cyclomatic complexity (>50): read_fh (150). You can make them easier to reason about by encapsulating distinct steps of your function into subfunctions. ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used it and it is easier to read your code for them if you use '<-'.

R/cor_table.R:108:18
R/cor_table.R:109:22
R/cor_table.R:112:22
R/hdi.R:48:12
R/hdi.R:49:14
... and 2 more lines

✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters

data-raw/DATASETS.Rmd:17:81
data-raw/DATASETS.Rmd:19:81
data-raw/DATASETS.Rmd:35:81
data-raw/DATASETS.Rmd:51:81
data-raw/DATASETS.Rmd:68:81
... and 153 more lines

✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.

R/cor_table.R:192:20
R/cor_table.R:195:20

✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on the right hand side is zero. Use seq_len() or seq_along() instead.

R/fd_report.R:162:19
R/hdi.R:54:16
R/sw_combine.R:123:27
R/sw_combine.R:180:27
R/sw_combine.R:361:35
... and 1 more lines

✖ fix this R CMD check NOTE: Note: found 11 marked UTF-8 strings

✖ avoid 'T' and 'F', as they are just variables which are set to the logicals 'TRUE' and 'FALSE' by default, but are not reserved words and hence can be overwritten by the user. Hence, one should always use 'TRUE' and 'FALSE' for the logicals.

R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
... and 12 more lines

read_fh.R

get_header.R

if (!is.data.frame(rwl) || !inherits(rwl, "rwl")) {
    stop("Input should be a data.frame of class 'rwl'")
  }

cor_table.R

sw_model.R

s <- sw_model("Hollstein_1980", plot  = TRUE)
s + theme(panel.grid = element_blank())

So if there is an specific function to plot the data, this function would include the ... in the arguments, to allow include other ggplot2 arguments.

I suggest to use .data[[1]]

ggplot2::geom_segment(
  data = hdi_model,
  ggplot2::aes(
    x = .data[[1]],
    xend = .data[[2]],
    y = 0,
    yend = 0
  ),
  colour = "grey30",
  linewidth = .8,
  alpha = 0.8
)

sw_interval.R

sw_interval_plot.R

if (!all(c("year", "n_sapwood", "p") %in% names(attributes(x)))) {
  stop("Input structure differs from the expected output of sw_interval()")
}

sw_combine.R

sw_combine_plot.R

sw_sum.R

sw_sum_plot.R

Other comments

Please be consistent along the package.

                ggplot2::geom_area(
                        ggplot2::aes(x = ifelse(year >= lower, year, NA),
                                     y = p.x),
                        fill = "tomato3",
                        color = "tomato3",
                        alpha = 0.3,
                        linewidth = 1

could be indicated as arguments, e.g. area_fill and area_color.

hanecakr commented 6 months ago

Thank you for your time investment and extensive review report. I hope to find some time next week to start working on all comments and suggestions. Might need some further assistance to tackle all issues raised. Happy new year! 🎇🎇🎇

maelle commented 5 months ago

@ajpelu thanks a ton for your review!

maelle commented 5 months ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/618#issuecomment-1872633358 time 13

ropensci-review-bot commented 5 months ago

Logged review for ajpelu (hours: 13)

ropensci-review-bot commented 5 months ago

@hanecakr: please post your response with @ropensci-review-bot submit response <url to issue comment> if you haven't done so already (this is an automatic reminder).

Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html

maelle commented 4 months ago

:wave: @hanecakr, any update? :smile_cat:

hanecakr commented 4 months ago

Dear Maëlle,

I very sorry for the horrible delay in my response to the review reports. Especially given the timely reports of the reviewers. Every year, jan-feb are months at work where most of my time is consumed by strickt deadlines. But in a week or two, sky’s look bright again 😉 and I will start working on the review!

have a nice day, Kristof.

Kristof Haneca Erfgoedonderzoeker | dendrochronoloog

Agentschap Onroerend Erfgoed M +32 (0)474 44 81 96 @.https://orcid.org/0000-0002-7719-8305 @. https://twitter.com/KristofHaneca @.*** https://www.researchgate.net/profile/Kristof_Haneca

From: Maëlle Salmon @.> Sent: Friday, February 9, 2024 7:54 AM To: ropensci/software-review @.> Cc: Haneca Kristof @.>; Mention @.> Subject: Re: [ropensci/software-review] fellingdateR: Estimate, report and combine felling dates of historical tree-ring series (Issue #618)

👋 @hanecakrhttps://github.com/hanecakr, any update? 😸

— Reply to this email directly, view it on GitHubhttps://github.com/ropensci/software-review/issues/618#issuecomment-1935419730, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABYHJXJO4KYPMYSIAYQKB5DYSXBXZAVCNFSM6AAAAAA7THBDQWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZVGQYTSNZTGA. You are receiving this because you were mentioned.Message ID: @.**@.>>

maelle commented 4 months ago

Thanks for the update @hanecakr, and good luck with the deadlines!

hanecakr commented 3 months ago

Again a big thank you @njtierney for your review report and time. I was not able to address all issues raised earlier, but have now found some time to work on the package and to provide an answer to your comments and suggestions. I've copy-pasted the review report and inserted my replies below.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

I have added you as a reviewer in de DESCRIPTION

Documentation

The package includes all the following forms of documentation:

The opening paragraphs of the README are good, and I think that this R package solves a challenging problem, so firstly, well done! I think could be made a little bit clearer in terms of the problem it solves, and the input it takes. While I find the photos useful, it initially made me think that this software takes images as input. I would suggest something more like what is in the vignette to start:

fellingdateR offers a set of functions that assist in inferring felling date estimates from dated tree-ring series.

Then, describe the problem you want to solve, which I think is estimating when the timber was cut down. Then show the data, explain what the columns mean, and how this might be a typical example of dated tree-ring series data.

Then show a short example of the output, clearly demonstrating the problem the package solves.

The rest of the first paragraph:

The presence of (partially) preserved sapwood or waney edge allows to estimate a range for the actual felling date, for individual series as well as for a group of timbers. Furthermore, an additional function provides a tool to sum sapwood probability distributions, comparable to 'summed probability densities' commonly applied to sets of radiocarbon (14C) dates.

Is important, but I think could go into more of a methods/general introduction part of the README, perhaps further down.

I'm not sure what the images show me, and so to communicate this effectively I think they should contain a caption.

I think the target audience could be more clearly stated in the README. Perhaps at the end of the first paragraph.

README has been rewritten according to comments of both reviews.

The 'Get started' vignette provides more detail and examples.

All installed well for me!

It did run successfully locally! T and F should be specified as TRUE and FALSE.

Now TRUE and FALSE are used consistently

The examples ran without error, using:

devtools::run_examples()

There are no community guidelines in the README, I see them in the file: .github/CONTRIBUTING.md, but these are not linked to in the README. Once these are linked, e.g., by writing something like:

## Code of Conduct

Please note that the visdat project is released with a [Contributor Code of Conduct](https://github.com/hanecakr/fellingdateR/blob/main/.github/CONTRIBUTING.md). By contributing to this project, you agree to abide by its terms.

Community guidelines and code of conduct have been added

Functionality

All tests pass - unit tests seem quite good coverage, evaluated using devtools::test_coverage().

Not sure what is the best way to do this. Any practical guidelines? UPDATE: I was a bit afraid to initiate a Git inferno, but apparently it's not that difficult. So the package name had been changed to fellingdater.

Estimated hours spent reviewing: 5

Review Comments

I wanted to open by saying that while I have a lot of feedback, I think that this is a great piece of software that helps solve a tough problem, so well done on the author for writing this! I hope that the feedback is useful 😄 . Please let me know if something is not clear or if you need help implementing these, or further information. Thank you for submitting this software, I enjoyed reviewing it.

General comments

There are a fair few examples from the rOpenSci packaging guide, which I don't think are followed, I have gone through the guide and written some examples here. After the author makes these changes, I would recommend they double check the guide.

All code is in snake_case now

Except for the `read_fh()` function. In the fellingdateR package I build upon the originale code of the read.fh function from the dplR package. I would prefer to stay a close as possible to the original code in the dplR package in order to facilitate future cooperation and possible integration of both functions.

cat() no longer used (except for read_fh - see comment above)

Code has been restyled using the styler-package

Code has been restyled using the styler-package

  • = is sometimes used over <- - I recommend using <- consistently.

    <- is now used consistently as assignment operator

This has been added

see https://hanecakr.github.io/fellingdater

These function now resided in helper-functions.R with #' @noRD

Most examples now include all arguments

sw = [s]{.underline}ap[w]{.underline}ood, fd = [f]{.underline}elling [d]{.underline}ate. I've made this more obvious in the README introduction.

  • Some of the documentation uses reversed backticks, which I haven't seen before, e.g.: ´n_sapwood´ and ´count´

corrected

  ✖ write short and simple
    functions. These functions
    have high cyclomatic
    complexity (>50): read_fh
    (150). You can make them
    easier to reason about by
    encapsulating distinct steps
    of your function into
    subfunctions.
  ✖ use '<-' for
    assignment instead of '='.
    '<-' is the standard, and R
    users and developers are used
    it and it is easier to read
    your code for them if you use
    '<-'.
  ✖ avoid long code lines,
    it is bad for readability.
    Also, many people prefer
    editor windows that are about
    80 characters wide. Try make
    your lines shorter than 80
    characters
  ✖ avoid sapply(), it is
    not type safe. It might return
    a vector, or a list, depending
    on the input data. Consider
    using vapply() instead.
  ✖ avoid 1:length(...),
    1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...)
    expressions. They are error
    prone and result 1:0 if the
    expression on the right hand
    side is zero. Use seq_len() or
    seq_along() instead.
  ✖ avoid 'T' and 'F', as
    they are just variables which
    are set to the logicals 'TRUE'
    and 'FALSE' by default, but
    are not reserved words and
    hence can be overwritten by
    the user.  Hence, one should
    always use 'TRUE' and 'FALSE'
    for the logicals.

all codes has been styled with the styler-package, <- , TRUE and FALSE now used consistently, and length of some functions reduced by implementing some helper-functions, e.g. for checking input. Use of sapply and 1:length() has been avoided.

= no longer used as assignment operator

if (is.null(y)) {
     y = x
     noRef = TRUE
}
else {
     noRef = FALSE
     y_ori <- y
}

any(pdf_matrix[, 2:length(keycodes) + 1] == 1, na.rm = TRUE))

All numerical information needed to build the plots can be found in the output of the sw_model(), sw_interval(), sw_combine() and sw_sum() functions. Their plot argument defaults to plot = FALSE. So the output of e.g sw_combine(trs_example1) can be used as the input for sw_combine_plot()

```{r}
tmp <- sw_combine(trs_example1) 
tmp
sw_combine_plot(tmp)
```

all snake_case now

Defensive programming has been implemented now, avoiding the use of `else` statements followed by a stop/error-message.

Input checking

I would recommend writing small helpers for input checking, and considering using cli to help write error messages, as it means you could transform this:

if (!inherits(x, "rwl"))  {
     warning("'x' is not class rwl")
}
if (!inherits(y, "rwl"))  {
     warning("'y' is not class rwl")
}

Into:

warn_if_not_rwl(x)
warn_if_not_rwl(y)

And that code could look like this:

warn_if_not_rwl <- function(x,
                            arg = rlang::caller_arg(x),
                            call = rlang::caller_env()){
     cli::cli_warn(
          c("{arg}' is not of class {.cls rwl}")
     )
}

Similarly,

increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)
if (!increasing_consecutive_years) {
     stop(
          "The tree-ring series 'x' have/has no consecutive years in increasing order as rownames."
     )
}

Could be written as a function:

check_if_increasing_consecutive_years(x)
check_if_increasing_consecutive_years(y)

Admittedly, I do have a strong preference for writing these types of functions, having written about it recently, but I do think that at least using explaining variables, which you've already done in places like:

increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)

Are a great idea, and there are a few notable places where that would help make the code a bit easier to read, e.g.,

any(
length(min_overlap) != 1 |
!is.numeric(min_overlap) |
min_overlap %% 1 != 0 |
min_overlap < 3
)
  • check_input() is now one of the helper functions in helper-functions.R

  • smaller checks for input values are now available as a helper-function.

  • Most examples you give above is from the read_fh() function. see my previous motivation why I would like to stay close to the original dplR::read.fh() code.

cor_table.R

Refactoring values argument of cor_table. There is a lot of input checking for the values argument. I think that things such as :

if ("glk" %in%  values) {

And so on indicate to me that these could be written up as separate functions, which could return a list of their inputs, perhaps. These could then be delivered using switch, which I often forget how to use, but it would be something like:

values_output <- switch(values,
       "glk" = values_glk(inputs),
       "pearson" = values_pearson(inputs))

Examples should demonstrate all types of the inputs for the function arguments.

parameter `values` was removed from the function. Looking back, this is not an option that would be used frequently., and is certainly not required. Removing it from the function allows to shorten the code a bit, and avoids a lot of the necessary checks.

data.R

I would recommend standardising the dataset names to be all lowercase, so that they are easier to remember. E.g., Sohar_2012_FWE_c becomes: sohar_2012_fwe_c

The datasets include names of authors. The names of the datasets can be easily copied from sw_data_overview()

fd_report.R

I think that fd_report could be renamed felling_report or felling_date_report or similar. While fd is concise, I think it doesn't help facilitate discoverability of the functions.

Similar to cor_table.R, I think that:

if (!series %in% names(df)) {
      stop("--> 'series' does not exist")
}
if (!last %in% names(df)) {
      stop("--> 'last' does not exist")
}
if (!n_sapwood %in% names(df)) {
      stop("--> 'n_sapwood' does not exist")
}
if (!waneyedge %in% names(df)) {
      stop("--> 'waneyedge' does not exist")
}

Could be rewritten as check_if_variable_exists(). Something like:

check_if_variable_exists <- function(x,
                                     df,
                                     arg = rlang::caller_arg(x),
                                     call = rlang::caller_env()){
     arg_in_data <- x %in% names(df)
     if (!arg_in_data) {
          cli::cli_abort(
               c("{.arg {arg}} does not exist")
          )
     }
}

example_checker <- function(x, 
                            series = "series", 
                            last = "last"){
     check_if_variable_exists(series, x)
     check_if_variable_exists(last, x)
}

example_checker(mtcars, 
                series = "wrong")

## Error in `check_if_variable_exists()`:
## ! `series` does not exist

The check_input function is now part of helper-functions.R

get_header.R

This function should move the cat message up the top - and should not use cat, instead using one of the cli functions, like cli_abort.

I think you could use structure instead of setting attributes to NULL:

attr(rwl, "row.names") <- NULL
attr(rwl, "po") <- NULL
attr(rwl, "class") <- NULL
attr(rwl, "names") <- NULL

## becomes

rwl <- structure(
          rwl,
          row.names = NULL,
          po = NULL,
          class = NULL,
          names = NULL
     )

Although I think that they are functionally the same, so feel free to ignore!

cat() no longer used

hdi

This function uses = and <- - suggest sticking to just <-

=no longer used, in favour of <-

movAv

I think this starting chunk would be clearer if only if and not else is used.

The stop error can move to the top of this, so we clearly capture if align is not "center" or "right" or "left". This makes it easier to understand the conditions of error.

if (align == "center") {
     before <- floor((w - 1) / 2)
     after  <- ceiling((w - 1) / 2)
} else if (align == "right") {
     before <- w - 1
     after  <- 0
} else if (align == "left") {
     before <- 0
     after  <- w - 1
} else {
     stop("'align' should be 'center', 'left' or 'right'")
}

I suggest using another explaining variable inside mean:

mean(x[max(0, (i - before)):(i + after)], na.rm = TRUE)

## to something like:

earliest_to_latest <- x[max(0, (i - before)):(i + after)]
mean(earliest_to_latest, na.rm = TRUE)

## or given that this is repeated later
## potentially write this up as a function for reuse?
mean_earliest_latest(x, i, before, after)

As that mean statement is a bit involved to unfurl.

Similarly, the pattern, if (edges == "fill") { and } else if (edges == "nofill") { should be bundled up into a function and applied with switch

Checks for edges and fill are now on top of the script. Else statements have been avoided.

read_fh.R

        # NEW: verbose = TRUE, header = FALSE
        inp <- readLines(fname, ok = TRUE, warn = FALSE)
        # NEW: removes empty lines in .fh file
        inp <- inp[nchar(inp) != 0]
        ## Get start and end positions of headers and data blocks
        header.begin <- grep("^HEADER:$", inp)
        # NEW: Quadro => chrono
        # NEW: Double => half chrono
lengths <- numeric(n) # commit Ronald Visser

I have found that moving comments either into documentation or into issues to help track them is helpful, but I appreciate that sometimes it is best to leave them in the code, but just something that might be worth thinking about :)

Tidying up the error messages in this function would make some of these nested if/else clauses easier to understand.

This is a pretty massive function, a bit over 1200 lines of code. I would recommend breaking down the steps inside this into smaller functions, as this will make the code easier to reason with and maintain in the future.

In the fellingdateR package I build upon the original code of the read.fh function from the dplR package. I would prefer to stay a close as possible to the original code in the dplR package in order to facilitate future cooperation and possible integration of both functions.

I removed all unnecessary comments as they were highlighting sections where I've made changes to the original code.

dplR::read.fh() concentrates on extracting the measurement data. The fellingdateR::read_fh() function extracts also the descriptive (meta-)data from the HEADER fields in a .fh file. This is not possible with the dplR::read.fh function.

Furthermore the fellingdateR::read_fh function allows to read data in CHRON or HALF-CHRONO format.

read.fh() also throws errors when header fields include Capital letters (depends on the software used to produce the .fh files: TSAP, PAST, ...). read_fh() is case-insensitive

sw_combin_plot.R

This is the first time I've seen ############ comment blocks - I'm all for stylistic choices but I am not sure this is needed, especially if this isn't used in other functions.

comment blocks with #### removed

I've not seen this pattern to avoid R CMD Check notes before

   # to avoid notes in CMD check
   year <-
      p <-
      lower <-
      upper <- COMB <- last <- n_sapwood <- A_i <- agreement <- NULL

My tactic has always been to have a separate definition of these, as answered by Carson Sievert on the posit community paage. I don't think there's anything inherently wrong with that, but I could imagine that in some cases this could accidentally erase inputs. Something to be aware of, perhaps?

When I run devtools::check() I get

❯ checking R code for possible problems ... NOTE  
x: no visible binding for global variable ‘p’ 

assigning NULL to these variables avoids the notes., as described in R Packages (2e) https://r-pkgs.org/package-within.html#delta-a-failed-attempt-at-making-a-package

I am all for using the new base R pipe |> - however you need to update your Depends in your DESCRIPTION like so in order to use it, since it only came out in R 4.1.0:

Depends: 
    R (>= 4.1.0)

Thanks! Corrected.

This comment should probably live in a github issue or just be removed:

      # NEXT LINE TRIGGERS WARNING
      # Warning message:
      # Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
      # ℹ Please use one dimensional logical vectors instead.
      # ℹ The deprecated feature was likely used in the fellingdateR package.
      # Please report the issue to the authors.
      # { if (nrow(summary |> dplyr::filter(agreement == "poor")) != 0)
      # replaced by:

these comments are removed

sw_combine.R

This error should check each of the conditions separately - either it has missing values, or it is not numeric.

if (any(is.na(endDate)) | !is.numeric(endDate)) {
     stop(
          "--> Please check the column with 'end dates'.
Some values are possibly missing or the values are not numeric"
     )
}

A check_input() function (in helper-functions.R) now takes care of the input

sw_data_info.R

I think these error messages would benefit from using cli, as discussed above.

sw_data_overview.R

This is a nice function to include to facilitate data discovery

sw_interval_plot.R

This code

if (all(
     !(attributes(x)$names) %in% c(
          "year",
          "n_sapwood",
          "p")
))
     stop("Input differs from output sw_interval()")

Could be rewritten as an error function or the condition in if could be expressed as a function.

sw_interval.R

In the final line of documentation for this function there is a hanging sentence:

#' @return Depends on the value of `hdi`.
#'
#'  * If `hdi = TRUE`, a `numeric vector` reporting the upper and lower limit
#'   of the hdi (attributes provide more detail on `credMass` and the applied
#'   sapwood model (`sw_data`)).
#'  * If `hdi = FALSE`, a `matrix` with scaled p values for each number of
#'   observed sapwood rings. This matrix

Well spotted! Corrected.

sw_model.R

Great to see input checking at the top of the function - I do think these should be rewritten as check input functions.

Helper function d.count I think should be put into a separate R file called utils.R or helpers.R

d.count should use switch pattern and pass functions rather than using if controls.

d.count should be d_count

check_input() and d_dens() (instead of d.count) are now part of helper-functions.R

sw_sum_plot.R

indentation in this code is not consistent - recommend applying a style guide.

Examples should show different variations possible for function arguments. E.g., bar_col, spline_col, dot_col, and dot_size should all be specified in the examples so the user can see what the input should/could be.

examples have been updated with more visibility for the different parameters.

sw_sum.R

See note above on including plots.

tests

_Originally posted by @njtierney in https://github.com/ropensci/software-review/issues/618#issuecomment-1867354502_

hanecakr commented 3 months ago

Dear @ajpelu. You comments and suggestions in your review report have been most helpfull to improve the package. Thank you for your time investment! I've copy-pasted the review report and inserted my replies below.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

I serve at a government agency and mostly work from behind a firewall. Installing packages with devtool::install_packages() often fails (timeout) but miraculously works fine with pak::pak(). I have no idea why and it is probably related to the settings of the firewall, but therefore I prefer pak over devtools now.

  • [ ] Vignette(s): demonstrating major functionality that runs successfully locally

The vignette outlines the primary functions but lacks a straightforward workflow for beginners.

There are missing citations in the vignette. Please add a reference list at the end.

references added as url's to the publication

In the vignette of the website it appears:

 sw_sum(fellingdateR:::trs_example7, plot = TRUE)

but when I run locally i got:

 sw_sum(fellingdateR:::trs_example7, plot = TRUE)
 Error in as.data.frame(x) : object 'trs_example7' not found

Please consider change the three ::: by ::

redundant use of ::: has been removed from the examples in the vignette

  • [x] Function Documentation: for all exported functions
  • [x] Examples: (that run successfully locally) for all exported functions
  • [ ] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

No community guidelines found, only in DESCRIPTION there is a BugReports linked.

Guidelines have been added to th README

Functionality

    snake_case and lowercase naming have been implemented
    consistently
  • Not all the functions follow the object_verb() naming scheme. For instance, the functions read_fh and get_header. I would suggest to author rename them to fh_read and fh_header

    Changed get_header to fh_header, but to highlight the close
    similarity to the original read.fh() function in the dplR
    package, I would like to keep read_fh as the function name.
  • The movAv function's name deviates from the consistent use of snake_case observed in the rest of the functions. To maintain uniformity, it would be preferable to rename it in accordance with the snake_case as indicated in the packaging guidelines

    Changed to mov_av()
  • In the get_header, movAv, read_fh functions please consider change cat() or print() for message or warning, as packaging guidelines indicated.

    cat() no longer used in any of the functions

Estimated hours spent reviewing: 13

I've included you as a reviewer in the DESCRIPTON

Review Comments

Thank you for your contribution with the fellingdateR package. It provides a valuable suite of functions for estimating, reporting, and combining felling dates of historical tree-ring series. The comprehensive set of tools makes it easier to infer felling date estimates, considering factors like preserved sapwood or waney edge.

Given my expertise and interest in dendrochronology, I thoroughly enjoyed reviewing this package. The functions enable users to estimate felling date ranges for both individual series and groups of timbers, offering a valuable feature for summing sapwood probability distributions.

This being my first software review for Ropensci, I apologize for any potential errors or unclear comments. While I've provided various comments (a lot of them), feel free to focus on those that are most relevant or useful. I commend you for effectively addressing a challenging problem with this package. If you need clarification or assistance implementing the suggestions, please reach out. Thank you for submitting this software; the review process was a pleasure!

General comments

This is because some of de provided sapwood datasets do not
cover a specific country, but rather a region, or are composed of
observations of timber from e.g. Norway but the timbers themselves
have been found in the Netherlands (van Daalen_Norway).*** ***So I
named the sapwood datasets according to their authors (and
publication date if applicable), and a few letters to describe the
region/country they cover or method used.

The names of the data sets can be easily copied from the output
of `sw-data_overview()`
  • Some code and documentation overpass the 80 characters (particularly data.R). Please consider adjust to this recommended length.

    All code has been styled using the stylerpackage , and should now adhere to to the Tidyverse styleguide.

  • The README file does not contain any information about how to cite. I found that one in the CITATION file and also in the website of the package. According to Ropensci packaging guidelines the README need to include information about how to cite.

    Citation guidelines have been added tot the README

  • Please consider generate a theme for all plots of your package in a utils.R function

    theme_fdr() is now part of helper-functions.R and implemented in all plotting functions.

  • I found very useful the /paper/paper.md, and I think much of this information would improve also the documentation and the website of the package. For instance, the fellingdateR_workflow is very useful to understand how the package works. Would the author consider include it in the REAMDE or in the DESCRIPTION of the package instead of the current pictures? I think the former is more informative than the latter

    I've inserted the workflow in the README. The paper.md file is a first draft for a paper to be submitted to JOSS, after software review..

  • Why is there no auxiliary function to sw_model (eg. sw_model_plot) like the rest of the functions in the package (eg. sw_interval and sw_interval_plot)? This could lighten the function size and also would be coherent with the rest of the package.

    There is now a new function named sw_model_plot() as suggested.

  • The code style isn't consistent. Please consider use the styler package.

    I've used the styler package to reformat all code.

  • I didn't find top-level documentation:

    I've added a fellingdateR-package.R file with top-level documentation

 ?fellingdateR
 #No documentation for ‘fellingdateR’ in specified packages and libraries:
 #you could try ‘??fellingdateR’```
 Pleas consider generate ?fellingdateR or `?fellingdateR-package`. See  https://devguide.ropensci.org/building.html#general 

The README has been rewritten, including some parts of the paper.md.

  • The vignette "getting started" includes some citations (eg. Hollstein 1980) but not a full reference for this citations. Please include them at the end of the vignette.

    Thanks! Bibliographic references are included as url's/doi's now.

  • Please be consistent with the use of = or \<-, but avoid mixed them. For instance in hdi.R

    Now there is a consistent use of <- as an assignment operator

  • spell check: DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'. Please consider to specify.

    en-GB is now set as 'Language'

  • Is there code duplication in the package that should be reduced? Yes for instance the ggthemes within the plot functions

    theme_fdr() is now part of helper-functions.R and implemented in all plotting functions

Coverage

Please consider review the test coverage of the movAv.R function (\~57.14%)

test coverage increased for mov_av()

Goodpractices

After run the goodpractices::gp() several code lines does not pass the check:

✖ write short and simple functions. These functions have high cyclomatic complexity (>50): read_fh (150). You can make them easier to reason about by encapsulating distinct steps of your function into subfunctions.

This is particularly true for the read_fh function. Since the .fh file structure is quite flexible and combines both meta-data and measurement data into one format, the code is quite lengthy. Furthermore, in the fellingdateR package I build upon the original code of the read.fh function from the dplR package. I would prefer to stay a close as possible to the original code in the dplR package in order to facilitate future cooperation and possible integration of both functions

✖ use '\<-' for assignment instead of '='. '\<-' is the standard, and R users and developers are used it and it is easier to read your code for them if you use '\<-'.

Now there is a consistent use of \<- as an assignment operator

R/cor_table.R:108:18
R/cor_table.R:109:22
R/cor_table.R:112:22
R/hdi.R:48:12
R/hdi.R:49:14
... and 2 more lines

✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters

styler package was used to reformat all code

data-raw/DATASETS.Rmd:17:81
data-raw/DATASETS.Rmd:19:81
data-raw/DATASETS.Rmd:35:81
data-raw/DATASETS.Rmd:51:81
data-raw/DATASETS.Rmd:68:81
... and 153 more lines

✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.

 R/cor_table.R:192:20
 R/cor_table.R:195:20

sapply() no longer used

✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on the right hand side is zero. Use seq_len() or seq_along() instead.

 R/fd_report.R:162:19
 R/hdi.R:54:16
 R/sw_combine.R:123:27
 R/sw_combine.R:180:27
 R/sw_combine.R:361:35
 ... and 1 more lines

seq_len() is now used instead of 1:length(...)

✖ fix this R CMD check NOTE: Note: found 11 marked UTF-8 strings

✖ avoid 'T' and 'F', as they are just variables which are set to the logicals 'TRUE' and 'FALSE' by default, but are not reserved words and hence can be overwritten by the user. Hence, one should always use 'TRUE' and 'FALSE' for the logicals.

TRUE/FALSE now used consistently

R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
... and 12 more lines

read_fh.R

dplR::read.fh() concentrates on extracting the measurement
data. The fellingdateR::read_fh() function extracts also the
descriptive (meta-)data from the HEADER fields in a .fh file. This
is not possible with the dplR::read.fh function.

Furthermore the fellingdateR::read_fh function allows to read
data in CHRON or HALF-CHRONO format.

read.fh() also throws errors when header fields include Capital
letters (depends on the software used to produce the .fh files:
TSAP, PAST, ...). read_fh() is case-insensitive.
  • There are several code commented that could be removed. eg. # NEW: verbose = TRUE, header = FALSE Please revise them

    Indeed. The comments highlight what sections have been added compared to the original dplR::read.fh function. I've removed the `NEW:` comments in the current version of read_fh()

  • The nomenclature of the objetcs exhibits a mix of snake_case and dot.case. Let's ensure consistency in the naming convention.

    In order to stay as close as possible to the original dplR::read.fh function (and to facilitate possible future integration of both functions) I haven't changed any of the variable names in the code. For all other functions in the package snake_case was implemented.

  • The function appears to be extensive. Has the author explored the possibility of breaking it down into several auxiliary functions to enhance readability and maintainability?

    See reply on bullet point 3

  • In the function documentation, the author mentions the Heidelberg format file. It would be beneficial to include a link to the specifications of this format file for reference, eg. https://www.treeringsociety.org/resources/SOM/Brewer_Murphy_SupplementaryMaterial.pdf

    Perfect! I've added the link in the description

get_header.R

The function now starts with a check of the input and no longer
requires cat (or equivalent).

The get_header function was renamed to `fh_header`.

cor_table.R

Certainly considered, but this would require a full rewrite of
the current function. Then it should also include the possibility
to use the separate correlation values for crossdating a series on
all positions along a reference series. For this package however,
the goal is to check the date assigned to the tree-ring series and
to find the chronology that matches best in order to decide which
sapwood model to use (depending on timber provenance). So I would
like to keep the function as it is, and have the breakdown into
modular components for each correlation value on the todo-list for
a future project/package.
  • The nomenclature of arguments exhibits a mix of snake_case and dot.case. Let's ensure consistency in the naming convention.

    now all snake_case

  • Remove code comments in the argument section, eg.

    Removed

...
output = "table",
#c("matrix", "table")
... 

Added to the documentation

  • Ensure uniformity in assignment by using the same operator consistently---avoid mixing <- and =

    = no longer used as assignment operator

  • Consider avoid the use of sapply. See this https://adv-r.hadley.nz/functionals.html. For instance, consider

 y <- y[, apply(!is.na(y), 2, sum) > 3, drop = FALSE]

instead of

 y <-
                y[, sapply(y, function(col)
                     sum(!is.na(col))) > 3, drop = FALSE]

sapply() no longer used in this, or other functions

sw_model.R

`=` no longer used as assignment operator
  • I'm not sure if incorporating the reading of a csv is optimal in a function. What if a user wants to input a dataframe that isn't from a CSV file? I propose modifying the function to accept a dataframe or tibble with mandatory n_sapwood and count columns. Additionally, allowing users to import from a CSV could be achieved by either creating a small auxiliary function or simply utilizing utils::read.csv(). This also reduce the size of the sw_model function. Please consider this for all those functions with the same issue.

    You are right! I might have over complicated the sw-functions with this. So I removed this option from this and all other sw_functions. Now a user-defined custom data set - adata.frame with columns n_sapwood and count - can be supplied to the function.

  • I think it is better to have a separate function for plotting, which could be called after the computation is completed by sw_model, for instance, sw_model_plot. This also allow to user modify the plot generated. For instance if the user doesn`t like the grid within the plot. In the current form to do that the user need to type:

 s <- sw_model("Hollstein_1980", plot  = TRUE)
 s + theme()panel.grid = element_blank()

sw_model_plot() was added to the package

So if there is an specific function to plot the data, this function would include the ... in the arguments, to allow include other ggplot2 arguments.

no longer needed in the new function sw_model_plot()

  • In the documentation of sw_model, the values of the densfun parameter are not in italics and are enclosed in double quotation marks. It doesn't matter, but please maintain consistency with the rest of the documentation. For example, in sw_interval.R, they are in italics.

    densfun documentation in italic now

sw_interval.R

comments have been removed
  • In this function, the separation of the plot part from the computation one is appreciated. A further step would be to remove the plotting option completely from the function, and specified in the documentation (and also in the vignette) to use the sw_interval_plot function to obtain the plot.

    plot argument is now set to plot = FALSE as default. The numeric output of the function can then be supplied to the plotting functions to create the visualization.

sw_interval_plot.R

assigning NULL to these variables avoids the notes., as
described in R Packages (2e)
<https://r-pkgs.org/package-within.html#delta-a-failed-attempt-at-making-a-package>
# to avoid notes in CMD check
p.x <- upper <- year <- NULL

¿any alternative to solve them?

OK. I've replaced this section in the code

  • You mixed snake_case with dot.case in the naming of objects. I would suggest to choose one and be consistent.

    should_be_consistent_now

  • What this code snippet do? If I understand correctly, the sw_interval_plot requires the output of sw_interval, so is a .csv format an output of the sw_interval?

    The following was deleted because the option to use a .csv file as input was removed from all functions.

if (grepl("\\.csv$", sw_data.p))
sw_data.p <- basename(sw_data.p)
p = probability. documented now in \@return

sw_combine.R

More defensive programming is now implemented. check_input() is now a helper function (helper-functions.R).

  • Camel case and snake_case styles are mixed in the function. Please be consistent.

    all snake_case now

  • There are some duplications in the function, for instance pdf_matrix[, 1] <- timeAxis

    now only mentioned once

  • Please revise the comments within the function. Are they all necessary? Example:

       # when multiple exact felling dates are listed that do no correspond
       # --> COMB = 0 and after scaling NaN (division by 0)
       # check rowwise if there is any p-value == 1,
       # and replace COMB at that position with 1

Most comments have been removed

  • Perhaps this function could be modularized into several simpler functions. Initially, a function could evaluate whether the dataset contains all series with preserved sapwood or if any series has an exact felling date, among other conditions (as indicated in the examples of the function). Subsequently, based on the initial evaluation, the function could call auxiliary functions.

    A check_input() function (in helper-functions.R) now takes care of the input

  • In any case, the function should provide detailed information about the approaches used for each case.

    This is explained in the Get started vignette.

  • Please, see my comment about the data input in csv format for the sw_model.R

    Input from .csv file no longer supported.

  • The elements of the output list generated by this function are not described. Please add to the documentation.

    Added to the documentation (\@return)

sw_combine_plot.R

`rescale()`is now part of helper-functions.R
  • A lot of mixing T, F with TRUE and FALSE. Please be consistent.

    only TRUE and FALSE are used now

  • Camel case and snake_case styles are mixed in the function. Please be consistent..

    all snake_case

  • I would suggest to allow the user customize the color used. See my comment in the below section (Plot functions).

    now possible to choose colors for fill and line.

  • Remove the commented codes inside the function:

    comments removed

# NEXT LINE TRIGGERS WARNING
# Warning message:
# Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
# ℹ Please use one dimensional logical vectors instead.
# ℹ The deprecated feature was likely used in the fellingdateR package.
# Please report the issue to the authors.
# { if (nrow(summary |> dplyr::filter(agreement == "poor")) != 0)
# replaced by:
Correct. Label added to X-axis.
  • Any way to indicate the "A_i" as real subscript? maybe parse = TRUE within geom_text?

    Finally found out how to do this ;-) Done.

  • Please add information in the documentation about the agregation index

    Added to the documentation

  • In the second example:

 combo <- fellingdateR::sw_combine(trs_example2)
 fellingdateR::sw_combine_plot(combo)

I removed ::from the example.

  • What do the arrow and the dot refer? Please include in the documentation.

sw_sum.R

Thanks! Corrected
  • In the documentation the values of the densfun parameter are not in italics and are enclosed in double quotation marks. It doesn't matter, but please maintain consistency with the rest of the documentation. For example, in sw_interval.R, they are in italics.

    italic and consistent now

  • Regarding defensive programming idem comment that for sw_combine. This function requires a dataframe, as indicated in the documentation. However, there is no check for that; instead, the first line forces x to be a data.frame. I would suggest to add a defensive programming line such as:

    defensive programming with helper function check_input() implemented

    if (!is.data.frame(x)) {
     stop("Input 'x' must be a dataframe.")
   }

sw_sum_plot.R

Done for sw_model_plot, sw_interval_plot and sw_sum
  • Can a user change the width of the moving average window used within the sw_sum_plot()? It seems that the sw_sum_plot() uses always the same.

    Extra parameter added to tweak the window width of the smoother

  • Idem with the splines used.

    See previous bullet

  • What do the blue dots, blue bars and red line mean? Please indicate in the documentation

    The dots represent exact felling dates (with waney edge). Probability of 1 assigned to 1 specific year.

    Added to documentation

Other comments

Now same approach in all functions.

  • Please review the naming conventions, including CamelCase, snake_case, and dot.case, used for objects throughout the package. It is recommended to choose a consistent style, and I would suggest using snake_case. Ensure uniformity across the package for improved clarity and maintainability.

    Done. All snake_case now.

  • I'm not sure if incorporating the reading of a csv is optimal in a function. What if a user wants to input a dataframe that isn't from a CSV file? I propose modifying the function to accept a dataframe or tibble with mandatory n_sapwood and count columns. Additionally, allowing users to import from a CSV could be achieved by either creating a small auxiliary function or simply utilizing utils::read.csv(). This also reduce the size of the sw_model function.

    This option has been removed.

  • In the different plot functions, the authors uses a common theme. It might be useful to create an auxiliary function with a ggplot2::theme() specification. Then, each plot function could used this auxiliary function by default or also allow to the user to specify a custom theme.

    theme_fdr() is now part of helper-functions.R and implemented in all plotting functions.

  • I would suggest changing the specified colors in each plot function to custom arguments, allowing users to choose their preferred colors. For instance, in th sw_interval_plot the colors of the area specified by

                 ggplot2::geom_area(
                         ggplot2::aes(x = ifelse(year >= lower, year, NA),
                                      y = p.x),
                         fill = "tomato3",
                         color = "tomato3",
                         alpha = 0.3,
                         linewidth = 1

could be indicated as arguments, e.g. area_fill and area_color.

Done for sw_sum_plot, sw_interval_plot and sw_model.plot

Code style

All code/functions/files have been styled according to the tidyverse styleguide using styler::style_file(…) to avoid inconsistencies in indentation and to avoid code goes over 80 characters

Now <- is used consistently throughout the package (some code in the previous version included = for assignment.`

TRUE instead of T

README

The package description in the README.md was revised according to the comments of both reviewers.

The original illustration was removed, and replaced by an image of a piece of historical timber with sapwood. The question mark highlights what this package basically does: estimate the missing number of sapwood rings.

Now README.md has a final paragraph with a link to the Contributor Code of Conduct

## Comments and contributions

cor_table

parameter `values` was removed from the function. Looking back, this is not an option that would be used frequently., and is certainly not required. Removing it from the function allows to shorten the code a bit.

hanecakr commented 3 months ago

It was suggested to rename the package to fellingdater instead of fellingdateR. Any tips on how to do this without breaking links and initializing git issues? @maelle @ajpelu @njtierney

hanecakr commented 3 months ago

I have a first draft of a paper ["paper/paper.md"] that I want to submit to JOSS if/when I manage to address all issues raised in the review proces. Can this paper remain a .md file under the paper-folder, should I remove it from the package until after review by JOSS, or should I already reformat it to a vignette?

maelle commented 3 months ago

@hanecakr thank you! In your answer, it is hard for me to see what is quoted from the reviews, and what is your answer. Could you please consistently use quote formatting > for the parts that come from reviewers please? Thank you!