Closed: hanecakr closed this issue 2 months ago.
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help
for help.
:rocket:
The following problem was found in your submission template:
@ropensci-review-bot check package
:wave:
@hanecakr The error was because your URL field was markdown-style: "[url](url)", and should just be plain "url". I've edited to fix; you should now be able to call @ropensci-review-bot check package
yourself to restart checks. Sorry for any inconvenience.
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
Editor check started
:wave:
git hash: 05c449fb
(Checks marked with :eyes: may be optionally addressed.)
Package License: MIT + file LICENSE
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:------------|------:|
|internal |base | 363|
|internal |stats | 33|
|internal |fellingdateR | 26|
|internal |grDevices | 7|
|internal |graphics | 2|
|imports |ggplot2 | 32|
|imports |utils | 15|
|imports |dplyr | 6|
|imports |ggtext | 3|
|imports |plyr | 2|
|imports |tidyr | 2|
|imports |dplR | 2|
|imports |MASS | 1|
|suggests |covr | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
base: sub (32), rep (31), length (21), c (17), nrow (17), apply (15), matrix (15), max (14), ncol (13), data.frame (10), seq (10), which (10), range (9), for (8), paste0 (8), as.data.frame (7), attributes (7), numeric (7), as.numeric (6), by (6), drop (5), is.na (5), mean (5), min (5), list (4), summary (4), diff (3), floor (3), format (3), rownames (3), setdiff (3), sum (3), T (3), unique (3), any (2), character (2), get (2), grep (2), lapply (2), names (2), round (2), sapply (2), scale (2), seq_len (2), abs (1), all (1), as.character (1), as.integer (1), ceiling (1), colnames (1), colSums (1), complex (1), cumsum (1), dim (1), file.exists (1), gettext (1), Im (1), intersect (1), lengths (1), logical (1), match (1), merge (1), message (1), Re (1), readLines (1), regexpr (1), seq_along (1), sort (1), sqrt (1), substring (1), table (1), vapply (1)

stats: df (18), spline (7), end (3), sd (2), start (2), sigma (1)

ggplot2: element_blank (13), element_text (5), ggplot (3), aes (2), arrow (2), geom_line (2), unit (2), geom_area (1), scale_x_continuous (1), theme (1)

fellingdateR: hdi (7), sw_model (6), sw_data_overview (3), d.count (2), d.dens (2), fd_report (2), sw_interval (2), strip.comment (1), sw_combine_plot (1)

utils: data (13), citation (1), tail (1)

grDevices: pdf (7)

dplyr: left_join (1), mutate (1), pull (1), relocate (1), select (1), summarize (1)

ggtext: element_markdown (3)

dplR: rwl.stats (2)

graphics: title (2)

plyr: round_any (2)

tidyr: pivot_longer (2)

MASS: fitdistr (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 16 files)
- 1 author
- 1 vignette
- 13 internal data files
- 8 imported packages
- 15 exported functions (median 80 lines of code)
- 21 non-exported functions in R (median 50 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |    16|       74.9|           |
|files_vignettes          |     1|       68.4|           |
|files_tests              |    14|       93.8|           |
|loc_R                    |  2855|       89.8|           |
|loc_vignettes            |   193|       48.1|           |
|loc_tests                |   480|       74.1|           |
|num_vignettes            |     1|       64.8|           |
|data_size_total          |  6023|       68.2|           |
|data_size_median         |   465|       60.0|           |
|n_fns_r                  |    36|       45.9|           |
|n_fns_r_exported         |    15|       58.5|           |
|n_fns_r_not_exported     |    21|       42.1|           |
|n_fns_per_file_r         |     1|       20.7|           |
|num_params_per_fn        |     4|       67.3|           |
|loc_per_fn_r             |    65|       93.5|           |
|loc_per_fn_r_exp         |    80|       89.2|           |
|loc_per_fn_r_not_exp     |    50|       90.3|           |
|rel_whitespace_R         |    11|       80.9|           |
|rel_whitespace_vignettes |    68|       73.3|           |
|rel_whitespace_tests     |    11|       58.1|           |
|doclines_per_fn_exp      |    36|       43.7|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    30|       55.5|           |

---
#### 3. `goodpractice` and other checks

#### 3a. Continuous Integration Badges

[![R-CMD-check.yaml](https://github.com/hanecakr/fellingDateR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/hanecakr/fellingdateR/actions) [![pkgcheck](https://github.com/hanecakr/fellingDateR/workflows/pkgcheck/badge.svg)](https://github.com/hanecakr/fellingdateR/actions)

**GitHub Workflow Results**

|         id|name          |conclusion |sha    | run_number|date       |
|----------:|:-------------|:----------|:------|----------:|:----------|
| 6932735657|pkgcheck      |success    |05c449 |          5|2023-11-20 |
| 6932735666|R-CMD-check   |success    |05c449 |          9|2023-11-20 |
| 6932735656|test-coverage |success    |05c449 |         12|2023-11-20 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following note:

1. checking data for non-ASCII characters ... NOTE
   Note: found 11 marked UTF-8 strings

R CMD check generated the following check_fails:

1. cyclocomp
2. rcmdcheck_non_ascii_characters_in_data

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 77.6

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
read_fh | 150
fd_report | 42
sw_sum | 32
cor_table | 31
sw_combine | 27
sw_interval | 17
movAv | 16
sw_combine_plot | 15

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 170 potential issues:

message | number of times
--- | ---
Avoid 1:length(...) expressions, use seq_len. | 6
Avoid library() and require() calls in packages | 1
Avoid using sapply, consider vapply instead, that's type safe | 2
Lines should not be more than 80 characters. | 154
Use <-, not =, for assignment. | 7
:heavy_multiplication_x: The following 3 function names are duplicated in other packages:

- `get_header` from eventr
- `hdi` from bayestestR, ggdist, hdi, HDInterval
- `movAv` from berryFunctions
|package  |version  |
|:--------|:--------|
|pkgstats |0.1.3.9  |
|pkgcheck |0.1.2.11 |
This package is in top shape and may be passed on to a handling editor
@hanecakr Thank you for the submission. All looks good to me. Will pass on the handling editor momentarily.
@ropensci-review-bot assign @maelle as editor
Assigned! @maelle is now the editor
Thanks for your submission @hanecakr! I have some comments before I can proceed with the reviewer search. :deciduous_tree:
Happy to discuss the comments below! :smile_cat:
[ ] Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
[ ] Installation instructions: Are installation instructions clear enough for human users?
[ ] Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
[x] Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
[x] License: The package has a CRAN or OSI accepted license.
For brevity you can keep only the installation instructions with pak.
I wonder whether the dependency on plyr for very few functions could be avoided?
I don't think it's allowed to have backticks in a DESCRIPTION file, I see some in the Description field. Or if it's allowed, at least it's unusual?
I'd recommend creating a pkgdown website. In the vignette and README it'd make sense to remove the description of functions and datasets, instead pointing to the reference index of the pkgdown website. It'd make it easier to skim the README, and to see what functions/datasets there are at once (for reviewers but also users!).
I see the example datasets are saved as internal datasets however you use them in the documentation (in the README) so why not make them exported datasets? https://r-pkgs.org/data.html#sec-data-data
Rather than using the names with "dummy" I'd recommend informative names. (Beside, some might consider "dummy" to be an offensive word related to ableism).
I see some top-level code in for instance https://github.com/hanecakr/fellingdateR/blob/master/tests/testthat/test-fd_report.R I'd recommend instead having a function defined in a helper file like tests/testthat/helper-testdata
test_data <- function() {
data.frame(series = c("aaa", "bbb", "ccc", "no_last", "no_sapwood"),
n_sapwood = c(10, 11, 12, 10, NA),
waneyedge = c(FALSE, FALSE, TRUE, FALSE, FALSE),
last = c(123, 456, 1789, NA, 1978))
}
that you'd then call from each test (not each test file) to create the data. Even better, it could have a more informative name. For more context: https://blog.r-hub.io/2020/11/18/testthat-utility-belt/#code-called-in-your-tests and https://r-pkgs.org/testing-advanced.html#sec-testing-advanced-fixture-helper With such a strategy, each test is easier to read and to run, as it's self-contained.
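As a concrete sketch of that setup (using testthat): the fixture builder below is the helper shown above, repeated here so the example is self-contained, and the assertions are illustrative rather than taken from the package's actual tests.

```r
library(testthat)

# tests/testthat/helper-testdata.R would define the fixture builder:
test_data <- function() {
  data.frame(series = c("aaa", "bbb", "ccc", "no_last", "no_sapwood"),
             n_sapwood = c(10, 11, 12, 10, NA),
             waneyedge = c(FALSE, FALSE, TRUE, FALSE, FALSE),
             last = c(123, 456, 1789, NA, 1978))
}

# ... and each test then builds its own self-contained fixture:
test_that("the test data has the expected shape", {
  trs <- test_data()
  expect_identical(nrow(trs), 5L)
  expect_true(all(c("series", "last", "n_sapwood", "waneyedge") %in% names(trs)))
})
```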
I think some test lines could be more condensed, e.g. https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/tests/testthat/test-fd_report.R#L105 could go on the previous line, and https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/tests/testthat/test-fd_report.R#L96-L98 or https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/tests/testthat/test-fd_report.R#L120-L122 might be a single line, to see more of the test file at once on the screen?
You shouldn't namespace `fellingdateR:::` data in test files.
For test-sw_combine_plot.R you might want to look into snapshot testing rather than using ggplot2's internal data structure: https://testthat.r-lib.org/articles/snapshotting.html#whole-file-snapshotting (other relevant source: https://www.tidyverse.org/blog/2022/09/playing-on-the-same-team-as-your-dependecy/#testing-testing)
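For reference, whole-file snapshotting of a plot could look roughly like this. This is a sketch following the linked testthat article; `save_png()` is a helper defined for the test, not a package function, and the commented `test_that()` call is schematic because the actual `sw_combine_plot()` inputs are not shown here.

```r
# Helper: render a plot to a temporary PNG file and return its path.
save_png <- function(p, width = 400, height = 400) {
  path <- tempfile(fileext = ".png")
  grDevices::png(path, width = width, height = height)
  on.exit(grDevices::dev.off())
  print(p)
  path
}

# A whole-file snapshot test would then compare the rendered file
# against the stored snapshot (shown schematically):
# test_that("sw_combine_plot() renders consistently", {
#   expect_snapshot_file(save_png(sw_combine_plot(...)), "sw_combine_plot.png")
# })
```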
Why is there a test that is commented out? https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/tests/testthat/test-sw_interval.R#L2 (as opposed to, say, completely removed)
Instead of the regular expressions for testing errors, you might be interested in this approach of classed errors: https://www.mm218.dev/posts/2023-11-07-classed-errors/
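A minimal sketch of that approach; the condition class `fellingdater_error_missing_column` is a hypothetical name chosen for illustration, not something that exists in the package.

```r
library(testthat)

# Signal a classed error with rlang::abort() ...
fd_abort_missing_column <- function(col) {
  rlang::abort(
    message = sprintf("Column '%s' does not exist.", col),
    class = "fellingdater_error_missing_column"  # hypothetical class
  )
}

# ... and test on the class instead of matching the message text:
test_that("missing columns raise a classed error", {
  expect_error(
    fd_abort_missing_column("last"),
    class = "fellingdater_error_missing_column"
  )
})
```

Testing on the class keeps the tests stable even if the wording of the message changes later.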
Please read the output from lintr in the automatic checks above, and fix accordingly (or update this thread to explain why you disagree).
Lines such as https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/cor_table.R#L117 should use inherits()
if (!inherits(x, "rwl")) {
In for instance cor_table.R, I think the code might be more readable with explaining variables. For instance
if (!all(diff(as.numeric(row.names(x))) == 1)) {
would be
increasing_consecutive_years <- (all(diff(as.numeric(row.names(x))) == 1))
if (!increasing_consecutive_years) {
Same comments as for test files, I think there could be fewer new lines in scripts such as https://github.com/hanecakr/fellingdateR/blob/master/R/cor_table.R, to facilitate reading / vertical scrolling. I guess this conflicts a bit with the huge indentations in the files (which you are obviously allowed to prefer!).
For the else if in https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/sw_interval.R#L236, why not use the `switch()` function? I must confess I like using it but have to look up the docs every time to remember how to input the arguments. But it'll decrease the complexity of the current logic.
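For illustration, the general shape of replacing an `else if` chain with `switch()` looks like this. The function name and branch labels are made up for the example and are not the actual ones in `sw_interval()`.

```r
# Each label maps to a value; an unnamed final element acts as the default,
# so unmatched input falls through to the stop() call.
describe_interval <- function(type) {
  switch(type,
    hdi      = "highest posterior density interval",
    quantile = "equal-tailed quantile interval",
    stop("unknown interval type: ", type, call. = FALSE)
  )
}

describe_interval("hdi")
```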
In https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/sw_interval_plot.R#L137 what warnings are suppressed? In the same post mentioned previously https://www.mm218.dev/posts/2023-11-07-classed-errors/, it's shown that one can selectively suppress warnings of a given class.
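A sketch of class-selective suppression with base R; the class name `removed_rows_warning` is hypothetical (ggplot2's row-removal warnings would need to carry such a class for this to catch them specifically).

```r
# Muffle only warnings inheriting from `class`; all others propagate.
suppress_classed_warning <- function(expr, class) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (inherits(w, class)) invokeRestart("muffleWarning")
    }
  )
}

# A warning carrying the targeted class is silenced:
w <- warningCondition("Removed 52 rows", class = "removed_rows_warning")
suppress_classed_warning(warning(w), "removed_rows_warning")
```

Unlike a blanket `suppressWarnings()`, this leaves any genuinely unexpected warnings visible.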
Thanks a lot @maelle for this first round of comments. Will try to tackle most of these issues in the next few days.
Dear @maelle. I've been working on the package and think I'm able now to address most of your comments. Thanks a lot for your time, valuable insights and recommendations!
I'll list the answers to your comments here:
> For brevity you can keep only the installation instructions with pak.

> I wonder whether the dependency on plyr for very few functions could be avoided?

`plyr::round_any()` is now removed from `sw_interval_plot()`, `sw_combine_plot()` and `sw_sum_plot()`; `floor/ceiling(x/10)*10` now does the job.

> I don't think it's allowed to have backticks in a DESCRIPTION file, I see some in the Description field. Or if it's allowed, at least it's unusual?
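As a sketch, the base-R replacement for `plyr::round_any()` with an accuracy of 10, mentioned above, could look like this (the helper names are made up for illustration):

```r
# Round down / up to the nearest multiple of 10, base R only.
floor_to_10   <- function(x) floor(x / 10) * 10
ceiling_to_10 <- function(x) ceiling(x / 10) * 10

floor_to_10(1987)    # 1980
ceiling_to_10(1987)  # 1990
```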
> I'd recommend creating a pkgdown website.

> I see the example datasets are saved as internal datasets however you use them in the documentation (in the README) so why not make them exported datasets? https://r-pkgs.org/data.html#sec-data-data

With `sw_data_overview()` users get an overview of what sapwood datasets are available in the package, and with `sw_data_info()` basic descriptives of those datasets. Making the made-up example data external as well would mix real-world and made-up data, and would complicate the code for `sw_data_overview()`. Therefore I would like to keep the made-up data for examples and tests as internal data.

> Rather than using the names with "dummy" I'd recommend informative names. (Beside, some might consider "dummy" to be an offensive word related to ableism).
> I see some top-level code in for instance https://github.com/hanecakr/fellingdateR/blob/master/tests/testthat/test-fd_report.R I'd recommend instead having a function defined in a helper file like tests/testthat/helper-testdata ...

> I think some test lines could be more condensed, e.g. ...

> You shouldn't namespace `fellingdateR:::` data in test files.

> For test-sw_combine_plot.R you might want to look into snapshot testing rather than using ggplot2's internal data structure: https://testthat.r-lib.org/articles/snapshotting.html#whole-file-snapshotting (other relevant source: https://www.tidyverse.org/blog/2022/09/playing-on-the-same-team-as-your-dependecy/#testing-testing)

> Why is there a test that is commented out? https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/tests/testthat/test-sw_interval.R#L2 (as opposed to, say, completely removed)

> Instead of the regular expressions for testing errors, you might be interested in this approach of classed errors: https://www.mm218.dev/posts/2023-11-07-classed-errors/

> Please read the output from lintr in the automatic checks above, and fix accordingly (or update this thread to explain why you disagree).

> Lines such as https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/cor_table.R#L117 should use `inherits()`: `if (!inherits(x, "rwl")) {`

Fixed, also in `cor_table()`.

> In for instance cor_table.R, I think the code might be more readable with explaining variables. For instance `if (!all(diff(as.numeric(row.names(x))) == 1)) {` would be `increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)` followed by `if (!increasing_consecutive_years) {`
> Same comments as for test files, I think there could be fewer new lines in scripts such as https://github.com/hanecakr/fellingdateR/blob/master/R/cor_table.R, to facilitate reading / vertical scrolling. I guess this conflicts a bit with the huge indentations in the files (which you are obviously allowed to prefer!).

> For the else if in https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/sw_interval.R#L236, why not use the switch() function? I must confess I like using it but have to look up the docs every time to remember how to input the arguments. But it'll decrease the complexity of the current logic.

I remember I tried this before, but failed to implement `switch()` for this function. Would prefer to leave it as it is now.

> In https://github.com/hanecakr/fellingdateR/blob/05c449fb13b678ac1700177d73a20fad6c55da8c/R/sw_interval_plot.R#L137 what warnings are suppressed? In the same post mentioned previously https://www.mm218.dev/posts/2023-11-07-classed-errors/, it's shown that one can selectively suppress warnings of a given class.
I used `suppressWarnings()` to avoid messages from ggplot, for `sw_model()` and `sw_combine()`. So I've removed the calls to `suppressWarnings()` for these functions. The warnings in question were:

Warning: Removed 52 rows containing non-finite values (`stat_align()`).
Warning: Removed 43 rows containing missing values (`geom_line()`).
I'm sure there are more elegant and effective ways to improve the scripts, functions and their documentation, but I hope that these initial revisions have elevated the quality of the package sufficiently, making it amenable to the review process. Of course, I'm open to additional insights and recommendations to further improve the quality and performance of the package.
-- Kristof
Thanks a ton @hanecakr!
A small note: in GitHub Markdown, if you type `- [x]` the checkbox appears as a ticked checkbox, whereas `- [x ]` does not.
In the README when mentioning the reference index, you could rephrase the link so as not to use "here" (https://webaccess.berkeley.edu/ask-pecan/click-here), for instance "You can find the [list of functions]".
When I would make the made-up date also external, that would mix real-world data and made-up data. And would also complicate the code for sw_data_overview(). Therefore I would like to keep the made-up data for examples and tests as internal data.
Ok, but in that case would it work to not load sysdata (which might look confusing) but instead use `data(dataname, package = "fellingdateR")`? Also to avoid using `:::` when using the data in examples.
Should have done that before submission…
There's already a lot of stuff to do before submission!
Don't hesitate to ask if you need help with snapshot testing, but it's maybe not useful here anyway.
A small non-compulsory thing: I see your default branch is named master; you could change it to the new community standard "main", using the usethis function for instance. See also https://www.tidyverse.org/blog/2021/10/renaming-default-branch/
Another Git thing, your repository contains the .Rproj.user folder, should it be removed and added to .gitignore? See https://usethis.r-lib.org/reference/git_vaccinate.html for a handy way to gitignore it globally.
I remember I tried this before, but failed to implement switch() for this function. Would prefer to leave it as it is now.
Fair enough! Happy to try and make a PR if you change your mind (maybe this time I'd remember how to use it straight away :joy: )
I'll now look for reviewers! Thanks again for all your work!
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/618_status.svg)](https://github.com/ropensci/software-review/issues/618)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
@maelle Thanks a lot!
Meanwhile, I will change the branch name to `main`, use the `data(...)` function to load internal data, and have a look at the .Rproj.user folder to find a way to add it to .gitignore. The badge will be added to the README soon.
@ropensci-review-bot add @njtierney to reviewers
@njtierney added to the reviewers list. Review due date is 2023-12-22. Thanks @njtierney for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@njtierney: If you haven't done so, please fill this form for us to update our reviewers records.
Thank you @njtierney for accepting to review this package!
@njtierney: with a little delay, I decided to follow the advice of @maelle and moved the example datasets from internal to external data, so they become easily available for testing and examples, both for end-users and during code review. The latest commit implements the necessary (small) changes. Looking forward to your comments and suggestions. -- Kristof.
Thanks, team! I've got a few deadlines this week but will take a look next week and aim to have this done by the 22nd.
Thank you @njtierney and good luck with the deadlines!
@ropensci-review-bot add @ajpelu to reviewers
@ajpelu added to the reviewers list. Review due date is 2023-12-30. Thanks @ajpelu for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@ajpelu: If you haven't done so, please fill this form for us to update our reviewers records.
Thank you @ajpelu for accepting to review this package!
:calendar: @njtierney you have 2 days left before the due date for your review (2023-12-22).
Hi team!
I'm still working through this review. I've done a first pass, but I am just finishing up collecting together my points and polishing up the review so it can be easy to follow. Should be done later this afternoon! :)
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
The opening paragraphs of the README are good, and I think that this R package solves a challenging problem, so firstly, well done! I think it could be made a little clearer in terms of the problem it solves and the input it takes. While I find the photos useful, they initially made me think that this software takes images as input. I would suggest starting with something more like what is in the vignette:
> fellingdateR offers a set of functions that assist in inferring felling date estimates from dated tree-ring series.
Then, describe the problem you want to solve, which I think is estimating when the timber was cut down. Then show the data, explain what the columns mean, and how this might be a typical example of dated tree-ring series data.
Then show a short example of the output, clearly demonstrating the problem the package solves.
The rest of the first paragraph:
The presence of (partially) preserved sapwood or waney edge allows to estimate a range for the actual felling date, for individual series as well as for a group of timbers. Furthermore, an additional function provides a tool to sum sapwood probability distributions, comparable to ‘summed probability densities’ commonly applied to sets of radiocarbon (14C) dates.
Is important, but I think could go into more of a methods/general introduction part of the README, perhaps further down.
I’m not sure what the images show me, and so to communicate this effectively I think they should contain a caption.
I think the target audience could be more clearly stated in the README. Perhaps at the end of the first paragraph.
All installed well for me!
It did run successfully locally! `T` and `F` should be specified as `TRUE` and `FALSE`.
The examples ran without error, using:
devtools::run_examples()
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`). There are no community guidelines in the README; I see them in the file `.github/CONTRIBUTING.md`, but these are not linked to in the README. They could be linked, e.g., by writing something like:
## Code of Conduct
Please note that the fellingdateR project is released with a [Contributor Code of Conduct](https://github.com/hanecakr/fellingdateR/blob/main/.github/CONTRIBUTING.md). By contributing to this project, you agree to abide by its terms.
All tests pass; unit tests seem to have quite good coverage, evaluated using `devtools::test_coverage()`.
☐ Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
Package name passes checks on `available::available("fellingdateR")`. I think, if possible, the author should consider renaming the package to all lowercase, `fellingdater` or `fellingdatr`.
There are other considerations that I think mean it does not currently conform to the rOpenSci packaging guidelines. Rather than discuss them in too much depth here, I will put them in the review section below.
Estimated hours spent reviewing: 5
I wanted to open by saying that while I have a lot of feedback, I think that this is a great piece of software that helps solve a tough problem, so well done on the author for writing this! I hope that the feedback is useful 😄 . Please let me know if something is not clear or if you need help implementing these, or further information. Thank you for submitting this software, I enjoyed reviewing it.
There are a fair few recommendations from the rOpenSci packaging guide which I don't think are followed; I have gone through the guide and written some examples here. After the author makes these changes, I would recommend they double-check the guide.
Recommend making sure all functions and objects use snake_case.
The argument name `x` is used for most data frame inputs. I would recommend considering naming data things `data` or `.data` or similar, to help distinguish them from a vector `x`. Not required but worth considering, I think.
There is some use of `cat` in the package; I recommend using `cli` as described in the tidyverse style guide on writing error messages. I expand on this below.
Code style is not consistent; there is mixed use of indentation, between 0 and 8 spaces. I would recommend applying the tidyverse style guide to the package with `styler::style_pkg()`.
Indenting code is important but this 8 space indentation is not consistent with other indentation used in your package, and when reading the code gives the impression that the code is happening inside some/several if/else/for control statements. I would recommend applying a style guide such as the tidyverse style guide or similar, to the code, so that indentation is consistent.
Due to indentation, a lot of lines of code go over 80 characters. I think it is worth the time to re-indent, or rewrite some code by using explaining variables, so the code doesn’t go over 80 characters
`=` is sometimes used over `<-`; I recommend using `<-` consistently.
There is no top-level documentation for `?fellingdateR`; this could be achieved using `usethis::use_package_doc()`.
The package should have a website; see the rOpenSci guide on building a website.
Internal functions, like `d.dens` and `d.count`, should have a `#' @noRd` tag to mark them as internal functions.
examples in code should use all argument parameters
Recommend the author reads through the CRAN gotchas
What does the `sw` stand for in things like `sw_combine` and co?
Some of the documentation uses reversed backticks, which I haven’t seen before, e.g.: ´n_sapwood´ and ´count´
There are still a few lines of code that don't pass the `goodpractice::gp()` checks. In particular, I think these comments are important:
✖ write short and simple
functions. These functions
have high cyclomatic
complexity (>50): read_fh
(150). You can make them
easier to reason about by
encapsulating distinct steps
of your function into
subfunctions.
✖ use '<-' for
assignment instead of '='.
'<-' is the standard, and R
users and developers are used
it and it is easier to read
your code for them if you use
'<-'.
✖ avoid long code lines,
it is bad for readability.
Also, many people prefer
editor windows that are about
80 characters wide. Try make
your lines shorter than 80
characters
✖ avoid sapply(), it is
not type safe. It might return
a vector, or a list, depending
on the input data. Consider
using vapply() instead.
✖ avoid 1:length(...),
1:nrow(...), 1:ncol(...),
1:NROW(...) and 1:NCOL(...)
expressions. They are error
prone and result 1:0 if the
expression on the right hand
side is zero. Use seq_len() or
seq_along() instead.
✖ avoid 'T' and 'F', as
they are just variables which
are set to the logicals 'TRUE'
and 'FALSE' by default, but
are not reserved words and
hence can be overwritten by
the user. Hence, one should
always use 'TRUE' and 'FALSE'
for the logicals.
There is mixed use of `=` and `<-` in your code. I would recommend using `<-` only. See for example in cor_table.R:
if (is.null(y)) {
y = x
noRef = TRUE
}
else {
noRef = FALSE
y_ori <- y
}
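Rewritten with `<-` throughout, the same block would read as follows. Placeholder values for `x` and `y` are added so the snippet runs standalone; they are not from the package.

```r
x <- data.frame(series = "aaa")  # placeholder input
y <- NULL                        # placeholder input

if (is.null(y)) {
  y <- x
  noRef <- TRUE
} else {
  noRef <- FALSE
  y_ori <- y
}
```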
Error messages. I would recommend building input-checking functions to assist in how you write up error messages. There are a few key benefits: using `cli` to build the error messages allows you to use glue strings, so you don't have to quote or inject other information into the message string, and it becomes easier to add the details you care about. Error messages are hard to write well, and it's great that you've included some good input checking! I think you could make the error functions a bit better for the user by following the tidyverse style guide on error messages.
Explaining variables. I've mentioned this a few times for other functions in your package; I think it would be worthwhile searching through your uses of `if`, and if there is a long conditional in there, e.g.,
any(pdf_matrix[, 2:length(keycodes) + 1] == 1, na.rm = TRUE))
Then I think it would be worthwhile either writing a small wrapper function to identify this, or wrapping that up in an explaining variable.
Regarding `plot = TRUE` as a function option: the plots are built with `ggplot`, and so you can specify an `autoplot` method or a separate `plot_<function>` command instead.

For `sw_interval`, the following information should be given from the function: `n`, `hdi`, and the number of sapwood rings.

Consistent file names. Some of the files have camelCase names (`movAv.R`), others are snake_case. I would recommend sticking to a consistent naming scheme, snake_case.
I would try and avoid having `else` statements contain errors/stops/warnings/messages. This is because in order to understand the message at the end, you need to walk back up through the chain of logic beforehand. The way to avoid this is to clearly state the error condition at the top.
I would recommend writing small helpers for input checking, and considering using `cli` to help write error messages, as it means you could transform this:
if (!inherits(x, "rwl")) {
warning("'x' is not class rwl")
}
if (!inherits(y, "rwl")) {
warning("'y' is not class rwl")
}
Into:
warn_if_not_rwl(x)
warn_if_not_rwl(y)
And that code could look like this:
warn_if_not_rwl <- function(x,
                            arg = rlang::caller_arg(x),
                            call = rlang::caller_env()) {
  if (!inherits(x, "rwl")) {
    cli::cli_warn(
      c("{.arg {arg}} is not of class {.cls rwl}")
    )
  }
}
Similarly,
increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)
if (!increasing_consecutive_years) {
stop(
"The tree-ring series 'x' have/has no consecutive years in increasing order as rownames."
)
}
Could be written as a function:
check_if_increasing_consecutive_years(x)
check_if_increasing_consecutive_years(y)
Admittedly, I do have a strong preference for writing these types of functions, having written about it recently, but I do think that at least using explaining variables, which you’ve already done in places like:
increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)
Are a great idea, and there are a few notable places where that would help make the code a bit easier to read, e.g.,
any(
length(min_overlap) != 1 |
!is.numeric(min_overlap) |
min_overlap %% 1 != 0 |
min_overlap < 3
)
Refactoring the values argument of cor_table. There is a lot of input checking for the values argument. I think that things such as:
if ("glk" %in% values) {
And so on indicate to me that these could be written up as separate functions, which could return a list of their inputs, perhaps. These could then be delivered using switch
, which I often forget how to use, but it would be something like:
values_output <- switch(values,
"glk" = values_glk(inputs),
"pearson" = values_pearson(inputs))
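A runnable sketch of that dispatch, with `values_glk()`/`values_pearson()` as hypothetical stub helpers; since `values` can hold several names, each one is routed through `switch()` inside `lapply()`:

```r
# Hypothetical stubs standing in for the real per-value computations
values_glk     <- function(inputs) paste("glk for", inputs)
values_pearson <- function(inputs) paste("pearson for", inputs)

compute_values <- function(values, inputs) {
  lapply(values, function(v) {
    switch(v,
      glk     = values_glk(inputs),
      pearson = values_pearson(inputs),
      stop("unknown value: ", v)  # fail loudly on a typo
    )
  })
}

compute_values(c("glk", "pearson"), "x")
```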
Examples should demonstrate all types of the inputs for the function arguments.
I would recommend standardising the dataset names to be all lowercase, so that they are easier to remember. E.g., Sohar_2012_FWE_c
becomes: sohar_2012_fwe_c
I think that fd_report could be renamed felling_report or felling_date_report or similar. While fd is concise, I think it doesn't help facilitate discoverability of the functions.
Similar to cor_table.R
, I think that:
if (!series %in% names(df)) {
stop("--> 'series' does not exist")
}
if (!last %in% names(df)) {
stop("--> 'last' does not exist")
}
if (!n_sapwood %in% names(df)) {
stop("--> 'n_sapwood' does not exist")
}
if (!waneyedge %in% names(df)) {
stop("--> 'waneyedge' does not exist")
}
Could be rewritten as check_if_variable_exists()
. Something like:
check_if_variable_exists <- function(x,
df,
arg = rlang::caller_arg(x),
call = rlang::caller_env()){
arg_in_data <- x %in% names(df)
if (!arg_in_data) {
cli::cli_abort(
c("{.arg {arg}} does not exist")
)
}
}
example_checker <- function(x,
series = "series",
last = "last"){
check_if_variable_exists(series, x)
check_if_variable_exists(last, x)
}
example_checker(mtcars,
series = "wrong")
## Error in `check_if_variable_exists()`:
## ! `series` does not exist
This function should move the cat message to the top - and should not use cat; instead use one of the cli functions, like cli_abort.
I think you could use structure
instead of setting attributes to NULL:
attr(rwl, "row.names") <- NULL
attr(rwl, "po") <- NULL
attr(rwl, "class") <- NULL
attr(rwl, "names") <- NULL
## becomes
rwl <- structure(
rwl,
row.names = NULL,
po = NULL,
class = NULL,
names = NULL
)
Although I think that they are functionally the same, so feel free to ignore!
This function uses = and <-; suggest sticking to just <-.
I think this starting chunk would be clearer if only if and not else is used.
The stop error can move to the top of this, so we clearly capture if align
is not “center” or “right” or “left”. This makes it easier to understand the conditions of error.
if (align == "center") {
before <- floor((w - 1) / 2)
after <- ceiling((w - 1) / 2)
} else if (align == "right") {
before <- w - 1
after <- 0
} else if (align == "left") {
before <- 0
after <- w - 1
} else {
stop("'align' should be 'center', 'left' or 'right'")
}
I suggest using another explaining variable inside mean:
mean(x[max(0, (i - before)):(i + after)], na.rm = TRUE)
## to something like:
earliest_to_latest <- x[max(0, (i - before)):(i + after)]
mean(earliest_to_latest, na.rm = TRUE)
## or given that this is repeated later
## potentially write this up as a function for reuse?
mean_earliest_latest(x, i, before, after)
As that mean statement is a bit involved to unfurl.
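A hedged sketch of the suggested helper, mirroring the original indexing:

```r
mean_earliest_latest <- function(x, i, before, after) {
  # Explaining variable for the window of values around position i
  earliest_to_latest <- x[max(0, i - before):(i + after)]
  mean(earliest_to_latest, na.rm = TRUE)
}

mean_earliest_latest(c(1, 2, 3, 4, 5), i = 3, before = 1, after = 1)
```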
Similarly, the pattern if (edges == "fill") { and } else if (edges == "nofill") { should be bundled up into a function and applied with switch.
This code is taken from dplR directly, and as such there are small style changes. I think it is worthwhile updating the code style to fit within your package. header.taken should be header_taken, etc.
# NEW: verbose = TRUE, header = FALSE
inp <- readLines(fname, ok = TRUE, warn = FALSE)
# NEW: removes empty lines in .fh file
inp <- inp[nchar(inp) != 0]
## Get start and end positions of headers and data blocks
header.begin <- grep("^HEADER:$", inp)
# NEW: Quadro => chrono
# NEW: Double => half chrono
lengths <- numeric(n) # commit Ronald Visser
I have found that moving comments either into documentation or into issues to help track them is helpful, but I appreciate that sometimes it is best to leave them in the code, but just something that might be worth thinking about :)
Tidying up the error messages in this function would make some of these nested if/else clauses easier to understand.
This is a pretty massive function, a bit over 1200 lines of code. I would recommend breaking down the steps inside this into smaller functions, as this will make the code easier to reason with and maintain in the future.
This is the first time I’ve seen ############
comment blocks - I’m all for stylistic choices but I am not sure this is needed, especially if this isn’t used in other functions.
I’ve not seen this pattern to avoid R CMD Check notes before
# to avoid notes in CMD check
year <-
p <-
lower <-
upper <- COMB <- last <- n_sapwood <- A_i <- agreement <- NULL
My tactic has always been to have a separate definition of these, as answered by Carson Sievert on the Posit Community page. I don't think there's anything inherently wrong with that, but I could imagine that in some cases this could accidentally erase inputs. Something to be aware of, perhaps?
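For reference, the "separate definition" alternative would look something like this, placed in a package file such as R/globals.R (the file name is a convention, not a requirement); note it only works inside a package, not at the top level:

```r
# In a package file, e.g. R/globals.R: declare non-standard-evaluation
# names once, instead of assigning them NULL inside each function
utils::globalVariables(c(
  "year", "p", "lower", "upper", "COMB", "last",
  "n_sapwood", "A_i", "agreement"
))
```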
I am all for using the new base R pipe |>
- however you need to update your Depends in your DESCRIPTION like so in order to use it, since it only came out in R 4.1.0:
Depends:
R (>= 4.1.0)
This comment should probably live in a github issue or just be removed:
# NEXT LINE TRIGGERS WARNING
# Warning message:
# Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
# ℹ Please use one dimensional logical vectors instead.
# ℹ The deprecated feature was likely used in the fellingdateR package.
# Please report the issue to the authors.
# { if (nrow(summary |> dplyr::filter(agreement == "poor")) != 0)
# replaced by:
This error should check each of the conditions separately - either it has missing values, or it is not numeric.
if (any(is.na(endDate)) | !is.numeric(endDate)) {
stop(
"--> Please check the column with 'end dates'.
Some values are possibly missing or the values are not numeric"
)
}
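A hedged sketch of that split, wrapped in a hypothetical helper so each failure mode gets its own message:

```r
# check_end_dates() is a hypothetical helper name, not part of the package
check_end_dates <- function(endDate) {
  if (!is.numeric(endDate)) {
    stop("--> Please check the column with 'end dates': values are not numeric")
  }
  if (any(is.na(endDate))) {
    stop("--> Please check the column with 'end dates': some values are missing")
  }
  invisible(endDate)
}
```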
I think these error messages would benefit from using cli, as discussed above.
This is a nice function to include to facilitate data discovery
This code
if (all(
!(attributes(x)$names) %in% c(
"year",
"n_sapwood",
"p")
))
stop("Input differs from output sw_interval()")
Could be rewritten as an error function, or the condition in the if could be expressed as a function.
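For instance, a hedged sketch (the helper name is hypothetical, and this variant checks that every expected name is present, which is slightly stricter than the original condition):

```r
# Hypothetical helper; stricter than the original: all three names required
check_sw_interval_input <- function(x) {
  expected <- c("year", "n_sapwood", "p")
  if (!all(expected %in% names(x))) {
    stop("Input differs from the output of sw_interval()")
  }
  invisible(x)
}
```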
In the final line of documentation for this function there is a hanging sentence:
#' @return Depends on the value of `hdi`.
#'
#' * If `hdi = TRUE`, a `numeric vector` reporting the upper and lower limit
#' of the hdi (attributes provide more detail on `credMass` and the applied
#' sapwood model (`sw_data`)).
#' * If `hdi = FALSE`, a `matrix` with scaled p values for each number of
#' observed sapwood rings. This matrix
Great to see input checking at the top of the function - I do think these should be rewritten as check input functions.
Helper function d.count should be put into a separate R file called utils.R or helpers.R. d.count should use the switch pattern and pass functions rather than using if controls. d.count should be renamed d_count.
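A hedged sketch of what that could look like; the set of distributions shown is an assumption for illustration, not d.count's actual interface:

```r
d_count <- function(x, densfun = c("lognormal", "normal", "weibull"), ...) {
  densfun <- match.arg(densfun)
  # switch() selects a density *function*, rather than branching with if/else
  f <- switch(densfun,
    lognormal = stats::dlnorm,
    normal    = stats::dnorm,
    weibull   = stats::dweibull
  )
  f(x, ...)
}
```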
Indentation in this code is not consistent - recommend applying a style guide.
Examples should show different variations possible for function arguments. E.g., bar_col, spline_col, dot_col, and dot_size should all be specified in the examples so the user can see what the input should/could be.
See note above on including plots.
Consider vdiffr with testthat for testing ggplot plots. See visdat for examples.
Thanks @njtierney for your very thorough review! :pray:
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/618#issuecomment-1867354502 time 5
Logged review for njtierney (hours: 5)
Thanks a lot for all your time and valuable feedback. I will start processing your comments soon. But will probably need some further advice from you or the team to tackle all issues raised. But first ... a Christmas break 🎄
:calendar: @ajpelu you have 2 days left before the due date for your review (2023-12-30).
Sorry for the slight delay in the review. Here it is.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
I think the target audience of the pkg should be indicated more explicitly.
The images are not very illustrative of what the package does, they simply seem to indicate that they are wooden tokens. Perhaps it would be interesting to think of a schematic or similar, but the photos do not seem to provide much.
In the installation instructions, it might be more effective to provide a link to the GitHub repository of the package rather than displaying the raw GitHub link itself.
Why does the author prefer to use the package pak rather than devtools to install the development version of the pkg? I see that in the vignette both ways are indicated (pak and devtools).
The vignette outlines the primary functions but lacks a straightforward workflow for beginners.
There are missing citations in the vignette. Please add a reference list at the end.
In the vignette on the package website it appears:
sw_sum(fellingdateR:::trs_example7, plot = TRUE)
but when I run it locally I get:
sw_sum(fellingdateR:::trs_example7, plot = TRUE)
Error in as.data.frame(x) : object 'trs_example7' not found
Please consider changing the three ::: to ::
URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
No community guidelines found; only in DESCRIPTION is there a BugReports link.
All tests pass. The coverage is 77.44% (using devtools::test_coverage()).
[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
Please consider using a lowercase name, as indicated in point 1.1.1 of the packaging guidelines.
Not all the functions follow the object_verb() naming scheme, for instance the functions read_fh and get_header. I would suggest the author rename them to fh_read and fh_header.
The movAv function's name deviates from the consistent use of snake_case observed in the rest of the functions. To maintain uniformity, it would be preferable to rename it in accordance with snake_case, as indicated in the packaging guidelines.
In the get_header, movAv, and read_fh functions, please consider changing cat() or print() to message() or warning(), as the packaging guidelines indicate.
Estimated hours spent reviewing: 13
Thank you for your contribution with the fellingdateR package. It provides a valuable suite of functions for estimating, reporting, and combining felling dates of historical tree-ring series. The comprehensive set of tools makes it easier to infer felling date estimates, considering factors like preserved sapwood or waney edge.
Given my expertise and interest in dendrochronology, I thoroughly enjoyed reviewing this package. The functions enable users to estimate felling date ranges for both individual series and groups of timbers, offering a valuable feature for summing sapwood probability distributions.
This being my first software review for rOpenSci, I apologize for any potential errors or unclear comments. While I've provided various comments (a lot of them), feel free to focus on those that are most relevant or useful. I commend you for effectively addressing a challenging problem with this package. If you need clarification or assistance implementing the suggestions, please reach out. Thank you for submitting this software; the review process was a pleasure!
The data description lacks a consistent pattern. The author occasionally employs country abbreviations while at other times refrains from doing so. Let's strive for coherence in this regard.
Some code and documentation exceed 80 characters (particularly data.R). Please consider adjusting to this recommended length.
The README file does not contain any information about how to cite. I found it in the CITATION file and also on the website of the package. According to the rOpenSci packaging guidelines, the README needs to include information about how to cite.
Please consider generating a theme for all plots of your package in a utils.R function.
I found the /paper/paper.md very useful, and I think much of this information would also improve the documentation and the website of the package. For instance, the fellingdateR_workflow is very useful to understand how the package works. Would the author consider including it in the README or in the DESCRIPTION of the package instead of the current pictures? I think the former is more informative than the latter.
Why is there no auxiliary function to sw_model (e.g. sw_model_plot) like the rest of the functions in the package (e.g. sw_interval and sw_interval_plot)? This could lighten the function size and would also be coherent with the rest of the package.
The code style isn't consistent. Please consider using the styler package.
I didn't find top-level documentation:
?fellingdateR
# No documentation for 'fellingdateR' in specified packages and libraries:
# you could try '??fellingdateR'
Please consider generating ?fellingdateR or ?fellingdateR-package. See https://devguide.ropensci.org/building.html#general
The package has a well-structured website; however, I think much of the information included in paper.md is not on the website. For instance, the background and Statement of need within paper.md are very useful and I think they would help potential users of the package. Please consider including them in the README or in a specific vignette.
The vignette "getting started" includes some citations (e.g. Hollstein 1980) but no full references for these citations. Please include them at the end of the vignette.
Please be consistent with the use of = or <-, and avoid mixing them. For instance in hdi.R.
spell check: DESCRIPTION does not contain a 'Language' field. Defaulting to 'en-US'. Please consider specifying it.
Is there code duplication in the package that should be reduced? Yes; for instance the ggthemes within the plot functions.
Please consider reviewing the test coverage of the movAv.R function (~57.14%).
After running goodpractices::gp(), several code lines do not pass the checks:
✖ write short and simple functions. These functions have high cyclomatic complexity (>50): read_fh (150). You can make them easier to reason about by encapsulating distinct steps of your function into subfunctions. ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used it and it is easier to read your code for them if you use '<-'.
R/cor_table.R:108:18
R/cor_table.R:109:22
R/cor_table.R:112:22
R/hdi.R:48:12
R/hdi.R:49:14
... and 2 more lines
✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters
data-raw/DATASETS.Rmd:17:81
data-raw/DATASETS.Rmd:19:81
data-raw/DATASETS.Rmd:35:81
data-raw/DATASETS.Rmd:51:81
data-raw/DATASETS.Rmd:68:81
... and 153 more lines
✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.
R/cor_table.R:192:20
R/cor_table.R:195:20
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on the right hand side is zero. Use seq_len() or seq_along() instead.
R/fd_report.R:162:19
R/hdi.R:54:16
R/sw_combine.R:123:27
R/sw_combine.R:180:27
R/sw_combine.R:361:35
... and 1 more lines
✖ fix this R CMD check NOTE: Note: found 11 marked UTF-8 strings
✖ avoid 'T' and 'F', as they are just variables which are set to the logicals 'TRUE' and 'FALSE' by default, but are not reserved words and hence can be overwritten by the user. Hence, one should always use 'TRUE' and 'FALSE' for the logicals.
R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
R/sw_combine_plot.R:NA:NA
... and 12 more lines
read_fh.R
# NEW: verbose = TRUE, header = FALSE
Please revise these comments.
get_header.R
if (!is.data.frame(rwl) || !inherits(rwl, "rwl")) {
stop("Input should be a data.frame of class 'rwl'")
}
Please avoid cat(), as indicated in the packaging guidelines.
cor_table.R
Very interesting and useful function. Has the author contemplated breaking down this function into more modular components? For instance, each of the correlation values could potentially be extracted as auxiliary functions, providing users with the flexibility to call them individually as needed.
The nomenclature of arguments exhibits a mix of snake_case and dot.case. Let's ensure consistency in the naming convention.
Remove code comments in the argument section, eg.
...
output = "table",
#c("matrix", "table")
...
In the documentation for the argument sort_by
, it would be useful to add the choices to avoid errors. For instance,
#' @param sort_by The correlation value by which the output is sorted for each
#' series in `x`. One of "r_pearson", "t_St", "glk", "glk_p", "t_BP", "t_Ho". Default to "t_Ho"
Ensure uniformity in assignment by using the same operator consistently; avoid mixing <- and =.
Consider avoiding the use of sapply. See https://adv-r.hadley.nz/functionals.html. For instance, consider
y <- y[, apply(!is.na(y), 2, sum) > 3, drop = FALSE]
instead of
y <-
y[, sapply(y, function(col)
sum(!is.na(col))) > 3, drop = FALSE]
sw_model.R
Ensure uniformity in assignment by using the same operator consistently; avoid mixing <- and =, e.g. n_obs = sum(observed$count).
I'm not sure incorporating the reading of a CSV into a function is optimal. What if a user wants to input a dataframe that isn't from a CSV file? I propose modifying the function to accept a dataframe or tibble with mandatory n_sapwood and count columns. Additionally, allowing users to import from a CSV could be achieved by either creating a small auxiliary function or simply utilizing utils::read.csv(). This also reduces the size of the sw_model function. Please consider this for all functions with the same issue.
I think it is better to have a separate function for plotting, which could be called after the computation is completed by sw_model, for instance sw_model_plot. This would also allow the user to modify the generated plot, for instance if the user doesn't like the grid within the plot. In the current form, to do that the user needs to type:
s <- sw_model("Hollstein_1980", plot = TRUE)
s + theme(panel.grid = element_blank())
So if there were a specific function to plot the data, it would include ... in its arguments, to allow other ggplot2 arguments.
ggplot2::geom_segment(
data = hdi_model,
ggplot2::aes(
x = hdi_model[[1]],
xend = hdi_model[[2]],
y = 0,
yend = 0
),
colour = "grey30",
linewidth = .8,
alpha = 0.8
)
I suggest using .data[[1]]:
ggplot2::geom_segment(
data = hdi_model,
ggplot2::aes(
x = .data[[1]],
xend = .data[[2]],
y = 0,
yend = 0
),
colour = "grey30",
linewidth = .8,
alpha = 0.8
)
sw_interval.R
Please remove commented code:
# Add calendar years to output when y is provided
# if (last == 0) {
# attr(hdi_int, "credMass") <- credMass
# attr(hdi_int, "sapwood_data") <- sw_data
# attr(hdi_int, "model") <- densfun
# }
# if (last != 0) {
In this function, the separation of the plotting part from the computation is appreciated. A further step would be to remove the plotting option from the function completely, and specify in the documentation (and also in the vignette) to use the sw_interval_plot function to obtain the plot.
sw_interval_plot.R
I would suggest adding the ellipsis argument (i.e. ...) to allow other ggplot2 arguments. See also the ellipsis pkg for more info. Consider applying this throughout the plot functions of the package.
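A minimal sketch of forwarding ... to a ggplot2 layer; the function name and data below are illustrative only, not the package's actual signature:

```r
library(ggplot2)

# Illustrative plot helper: extra arguments in ... are passed on to geom_area()
plot_interval <- function(df, ...) {
  ggplot(df, aes(x = year, y = p)) +
    geom_area(...) +
    theme_minimal()
}

p <- plot_interval(
  data.frame(year = 2000:2004, p = c(0.1, 0.3, 0.4, 0.15, 0.05)),
  fill = "tomato3", alpha = 0.3
)
```

The caller can then tweak any aesthetic of that layer without the package hard-coding it.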
Which notes is the author trying to avoid with:
# to avoid notes in CMD check
p.x <- upper <- year <- NULL
Is there any alternative to solve them?
Please consider changing this code
if (all(
!(attributes(x)$names) %in% c(
"year",
"n_sapwood",
"p")
))
stop("Input differs from output sw_interval()")
by:
if (!all(c("year", "n_sapwood", "p") %in% names(x))) {
stop("Input structure differs from the expected output of sw_interval()")
}
You mixed snake_case with dot.case in the naming of objects. I would suggest to choose one and be consistent.
What does this code snippet do? If I understand correctly, sw_interval_plot requires the output of sw_interval; is a .csv file an output of sw_interval?
if (grepl("\\.csv$", sw_data.p))
sw_data.p <- basename(sw_data.p)
Please specify the meaning of the 'p' on the y-axis.
sw_combine.R
This function requires a dataframe, as indicated in the documentation. However, there is no check for that; instead, the first line forces x to be a data.frame. I would suggest adding a defensive programming check such as:
if (!is.data.frame(x)) {
stop("Input 'x' must be a dataframe.")
}
Camel case and snake_case styles are mixed in the function. Please be consistent.
There are some duplications in the function, for instance pdf_matrix[, 1] <- timeAxis
Please revise the comments within the function. Are they all necessary? Example:
# when multiple exact felling dates are listed that do no correspond
# --> COMB = 0 and after scaling NaN (division by 0)
# check rowwise if there is any p-value == 1,
# and replace COMB at that position with 1
Perhaps this function could be modularized into several simpler functions. Initially, a function could evaluate whether the dataset contains all series with preserved sapwood or if any series has an exact felling date, among other conditions (as indicated in the examples of the function). Subsequently, based on the initial evaluation, the function could call auxiliary functions.
In any case, the function should provide detailed information about the approaches used for each case.
Please, see my comment about the data input in csv
format for the sw_model.R
The elements of the output list generated by this function are not described. Please add to the documentation.
sw_combine_plot.R
Please consider taking the rescale function out of the sw_combine_plot function.
A lot of mixing T, F with TRUE and FALSE. Please be consistent.
Camel case and snake_case styles are mixed in the function. Please be consistent.
I would suggest to allow the user customize the color used. See my comment in the below section (Plot functions).
Remove the commented codes inside the function:
# NEXT LINE TRIGGERS WARNING
# Warning message:
# Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
# ℹ Please use one dimensional logical vectors instead.
# ℹ The deprecated feature was likely used in the fellingdateR package.
# Please report the issue to the authors.
# { if (nrow(summary |> dplyr::filter(agreement == "poor")) != 0)
# replaced by:
The function lacks an axis legend. I presume that the x-axis corresponds to calendar time, and the y-axis corresponds to 'p'. Is my understanding correct?
Any way to indicate "A_i" as a real subscript? Maybe parse = TRUE within geom_text?
Please add information in the documentation about the aggregation index.
In the second example:
combo <- fellingdateR::sw_combine(trs_example2)
fellingdateR::sw_combine_plot(combo)
What do the arrow and the dot refer to? Please include this in the documentation.
sw_sum.R
In the documentation there is a mistake: in the description of the last argument, please add a blank space in xcontaining.
In the documentation the values of the densfun parameter are not in italics and are enclosed in double quotation marks. It doesn't matter, but please maintain consistency with the rest of the documentation. For example, in sw_interval.R, they are in italics.
Regarding defensive programming, the same comment as for sw_combine applies. This function requires a dataframe, as indicated in the documentation; however, there is no check for that, and instead the first line forces x to be a data.frame. I would suggest adding a defensive programming check such as:
if (!is.data.frame(x)) {
stop("Input 'x' must be a dataframe.")
}
sw_sum_plot.R
Good point here using customizable colors. Let's implement this feature consistently across other plot functions for a cohesive and customizable user experience.
Can a user change the width of the moving average window used within sw_sum_plot()? It seems that sw_sum_plot() always uses the same width.
The same applies to the splines used.
What do the blue dots, blue bars and red line mean? Please indicate this in the documentation.
cambium <- df[[waneyedge]]
and
cambium <- x[, waneyedge]
Please be consistent along the package.
Please review the naming conventions, including CamelCase, snake_case, and dot.case, used for objects throughout the package. It is recommended to choose a consistent style, and I would suggest using snake_case. Ensure uniformity across the package for improved clarity and maintainability.
In the different plot functions, the author uses a common theme. It might be useful to create an auxiliary function with a ggplot2::theme() specification. Then, each plot function could use this auxiliary function by default, or also allow the user to specify a custom theme.
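A hedged sketch of such a helper (the name theme_fellingdater() and the specific theme settings are invented for illustration):

```r
# Hypothetical shared theme helper, e.g. placed in R/utils.R
theme_fellingdater <- function() {
  ggplot2::theme_minimal() +
    ggplot2::theme(
      legend.position  = "none",
      panel.grid.minor = ggplot2::element_blank()
    )
}
```

Each plot function could then end with + theme_fellingdater(), or accept a theme argument that defaults to it.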
I would suggest changing the hard-coded colors in each plot function to custom arguments, allowing users to choose their preferred colors. For instance, in sw_interval_plot, the colors of the area specified by
ggplot2::geom_area(
ggplot2::aes(x = ifelse(year >= lower, year, NA),
y = p.x),
fill = "tomato3",
color = "tomato3",
alpha = 0.3,
linewidth = 1
could be indicated as arguments, e.g. area_fill and area_color.
Thank you for your time investment and extensive review report. I hope to find some time next week to start working on all comments and suggestions. Might need some further assistance to tackle all issues raised. Happy new year! 🎇🎇🎇
@ajpelu thanks a ton for your review!
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/618#issuecomment-1872633358 time 13
Logged review for ajpelu (hours: 13)
@hanecakr: please post your response with @ropensci-review-bot submit response <url to issue comment>
if you haven't done so already (this is an automatic reminder).
Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html
:wave: @hanecakr, any update? :smile_cat:
Dear Maëlle,
I am very sorry for the horrible delay in my response to the review reports, especially given the timely reports of the reviewers. Every year, Jan-Feb are months at work where most of my time is consumed by strict deadlines. But in a week or two, skies look bright again 😉 and I will start working on the review!
have a nice day, Kristof.
Kristof Haneca, heritage researcher | dendrochronologist
Thanks for the update @hanecakr, and good luck with the deadlines!
Again a big thank you @njtierney for your review report and time. I was not able to address all issues raised earlier, but have now found some time to work on the package and to provide an answer to your comments and suggestions. I've copy-pasted the review report and inserted my replies below.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
I have added you as a reviewer in the DESCRIPTION
The package includes all the following forms of documentation:
The opening paragraphs of the README are good, and I think that this R package solves a challenging problem, so firstly, well done! I think it could be made a little bit clearer in terms of the problem it solves and the input it takes. While I find the photos useful, they initially made me think that this software takes images as input. I would suggest something more like what is in the vignette to start:
fellingdateR
offers a set of functions that assist in inferring felling date estimates from dated tree-ring series.
Then, describe the problem you want to solve, which I think is estimating when the timber was cut down. Then show the data, explain what the columns mean, and how this might be a typical example of dated tree-ring series data.
Then show a short example of the output, clearly demonstrating the problem the package solves.
The rest of the first paragraph:
The presence of (partially) preserved sapwood or waney edge allows to estimate a range for the actual felling date, for individual series as well as for a group of timbers. Furthermore, an additional function provides a tool to sum sapwood probability distributions, comparable to 'summed probability densities' commonly applied to sets of radiocarbon (14C) dates.
Is important, but I think could go into more of a methods/general introduction part of the README, perhaps further down.
I'm not sure what the images show me, and so to communicate this effectively I think they should contain a caption.
I think the target audience could be more clearly stated in the README. Perhaps at the end of the first paragraph.
README has been rewritten according to comments of both reviews.
The 'Get started' vignette provides more detail and examples.
All installed well for me!
It did run successfully locally! T and F should be specified as TRUE and FALSE.
Now TRUE and FALSE are used consistently
The examples ran without error, using:
devtools::run_examples()
URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
There are no community guidelines in the README; I see them in the file .github/CONTRIBUTING.md, but these are not linked to in the README.
Once these are linked, e.g., by writing something like:
## Code of Conduct
Please note that the visdat project is released with a [Contributor Code of Conduct](https://github.com/hanecakr/fellingdateR/blob/main/.github/CONTRIBUTING.md). By contributing to this project, you agree to abide by its terms.
Community guidelines and code of conduct have been added
All tests pass - unit tests seem to have quite good coverage, evaluated using devtools::test_coverage().
[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
package name passes checks on available::available("fellingdateR")
I think if possible the author should consider renaming the package to all lowercase, fellingdater or fellingdatr.
Not sure what is the best way to do this. Any practical guidelines? UPDATE: I was a bit afraid to initiate a Git inferno, but apparently it's not that difficult. So the package name has been changed to fellingdater.
Estimated hours spent reviewing: 5
[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
You have been added as a reviewer in the DESCRIPTION
I wanted to open by saying that while I have a lot of feedback, I think that this is a great piece of software that helps solve a tough problem, so well done on the author for writing this! I hope that the feedback is useful 😄 . Please let me know if something is not clear or if you need help implementing these, or further information. Thank you for submitting this software, I enjoyed reviewing it.
There are a fair few recommendations from the rOpenSci packaging guide that I don't think are followed; I have gone through the guide and written some examples here. After the author makes these changes, I would recommend they double-check the guide.
All code is in snake_case now
Except for the `read_fh()` function. In the fellingdateR package I build upon the original code of the read.fh function from the dplR package. I would prefer to stay as close as possible to the original code in the dplR package in order to facilitate future cooperation and possible integration of both functions.
The argument name `x` is used for most data frame inputs. I would recommend considering naming data things `data` or `.data` or similar, to help distinguish them from a vector, `x`. Not required but worth considering, I think.
There is some use of `cat` in the package; I recommend using `cli` as described in the tidyverse style guide on writing error messages. I expand on this below.
cat() no longer used (except for read_fh - see comment above)
styler::style_pkg()
Code has been restyled using the styler package
Indenting code is important but this 8 space indentation is not consistent with other indentation used in your package, and when reading the code gives the impression that the code is happening inside some/several if/else/for control statements. I would recommend applying a style guide such as the tidyverse style guide or similar, to the code, so that indentation is consistent.
Due to indentation, a lot of lines of code go over 80 characters. I think it is worth the time to re-indent, or rewrite some code by using explaining variables, so the code doesn't go over 80 characters
Code has been restyled using the styler package
`=` is sometimes used over `<-` - I recommend using `<-` consistently.
`<-` is now used consistently as the assignment operator
`?fellingdateR` - this could be achieved using `usethis::use_package_doc()`.
This has been added
- package should use a website. See the ropensci guide on building a website
`d.dens` and `d.count` should have a `#' @noRd` tag to mark them as internal functions.
These functions now reside in `helper-functions.R` with `#' @noRd`
Most examples now include all arguments
Recommend the author reads through the CRAN gotchas
What does the `sw` stand for in things like `sw_combine` and co?
sw = **s**ap**w**ood, fd = **f**elling **d**ate. I've made this more obvious in the README introduction.
- Some of the documentation uses reversed backticks, which I haven't seen before, e.g.:
´n_sapwood´ and ´count´
corrected
`goodpractice::gp()` checks. In particular, I think these comments are important:
✖ write short and simple functions. These functions have high cyclomatic complexity (>50): read_fh (150). You can make them easier to reason about by encapsulating distinct steps of your function into subfunctions.
✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used it and it is easier to read your code for them if you use '<-'.
✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters
✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on the right hand side is zero. Use seq_len() or seq_along() instead.
✖ avoid 'T' and 'F', as they are just variables which are set to the logicals 'TRUE' and 'FALSE' by default, but are not reserved words and hence can be overwritten by the user. Hence, one should always use 'TRUE' and 'FALSE' for the logicals.
All code has been styled with the styler package; `<-`, `TRUE` and `FALSE` are now used consistently, and the length of some functions has been reduced by implementing helper functions, e.g. for checking input. Use of `sapply` and `1:length()` has been avoided.
Both `=` and `<-` appear in your code. I would recommend using `<-` only. See for example in `cor_table.R`:
`=` no longer used as assignment operator
if (is.null(y)) {
y = x
noRef = TRUE
}
else {
noRef = FALSE
y_ori <- y
}
Error messages. I would recommend building input-checking functions to assist in how you write up error messages. There are a few key benefits to this:
Using `cli` to build the error messages allows you to use glue strings, so you don't have to try and quote or inject other information into the message string; it should be easier to add the details you care about.
`check_input()` is now one of the helper functions in `helper-functions.R`.
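A minimal sketch of such a glue-based `cli` check (the helper name, argument, and message wording here are illustrative, not taken from the package):

```r
# Sketch: an input check using cli's glue-string interpolation.
# `check_min_overlap()` is a hypothetical helper, not a fellingdateR function.
check_min_overlap <- function(min_overlap,
                              arg = rlang::caller_arg(min_overlap),
                              call = rlang::caller_env()) {
  if (!is.numeric(min_overlap) || length(min_overlap) != 1 || min_overlap < 3) {
    cli::cli_abort(
      c("{.arg {arg}} must be a single number >= 3.",
        "x" = "You supplied {.val {min_overlap}}."),
      call = call
    )
  }
}
```

Because the message is a glue string, values such as the argument name and the supplied input are interpolated directly rather than pasted together by hand.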
Error messages are hard to write well, and it's great that you've included some good input checking! I think you could make the error functions a bit better for the user by following the tidyverse style guide on error messages.
explaining variables. I've mentioned this a few times for the other functions in your package. I think it would be worthwhile searching through your uses of `if`, and if there is a long conditional in there, e.g.,
any(pdf_matrix[, 2:length(keycodes) + 1] == 1, na.rm = TRUE))
Then I think it would be worthwhile either writing a small wrapper function to identify this, or wrap that up in an explaining variable.
`plot = TRUE` as a function option. The plots use `ggplot`, and so you can specify an `autoplot` method or a separate `plot_<function>` command. For `sw_interval`, the following information should be given from the function: `n`, `hdi`, and the number of sapwood rings.
All numerical information needed to build the plots can be found in the output of the sw_model(), sw_interval(), sw_combine() and sw_sum() functions. Their plot argument defaults to plot = FALSE, so the output of e.g. sw_combine(trs_example1) can be used as the input for sw_combine_plot():
```{r}
tmp <- sw_combine(trs_example1)
tmp
sw_combine_plot(tmp)
```
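If an `autoplot()` route is wanted later, it could be as small as the following sketch (this assumes the `sw_combine()` output carries an S3 class such as `"sw_combine"`; that class name is an assumption):

```r
# Sketch: an autoplot() method dispatching on a hypothetical "sw_combine" class.
# Requires ggplot2; the existing sw_combine_plot() does the actual drawing.
autoplot.sw_combine <- function(object, ...) {
  sw_combine_plot(object, ...)
}

# usage (illustrative):
# ggplot2::autoplot(sw_combine(trs_example1))
```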
Some file names are camelCase (e.g. `movAv.R`), others are snake_case. I would recommend sticking to a consistent naming scheme, snake_case.
All snake_case now
Many `else` statements contain errors/stops/warnings/messages. This is a problem because, in order to understand the message at the end, you need to walk back up through the logic of the condition beforehand. The way to avoid this is to clearly state the error condition at the top.
Defensive programming has been implemented now, avoiding the use of `else` statements followed by a stop/error message.
I would recommend writing small helpers for input checking, and considering using `cli` to help write error messages, as it means you could transform this:
if (!inherits(x, "rwl")) {
warning("'x' is not class rwl")
}
if (!inherits(y, "rwl")) {
warning("'y' is not class rwl")
}
Into:
warn_if_not_rwl(x)
warn_if_not_rwl(y)
And that code could look like this:
warn_if_not_rwl <- function(x,
arg = rlang::caller_arg(x),
call = rlang::caller_env()){
cli::cli_warn(
c("'{arg}' is not of class {.cls rwl}")
)
}
Similarly,
increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)
if (!increasing_consecutive_years) {
stop(
"The tree-ring series 'x' have/has no consecutive years in increasing order as rownames."
)
}
Could be written as a function:
check_if_increasing_consecutive_years(x)
check_if_increasing_consecutive_years(y)
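That checker could be as small as the following sketch (the error wording and the use of cli/rlang here are illustrative):

```r
# Sketch: bundle the explaining variable and the stop into one reusable check.
check_if_increasing_consecutive_years <- function(x,
                                                  arg = rlang::caller_arg(x),
                                                  call = rlang::caller_env()) {
  increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)
  if (!increasing_consecutive_years) {
    cli::cli_abort(
      "{.arg {arg}} must have consecutive years in increasing order as rownames.",
      call = call
    )
  }
}
```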
Admittedly, I do have a strong preference for writing these types of functions, having written about it recently, but I do think that at least using explaining variables, which you've already done in places like:
increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)
Are a great idea, and there are a few notable places where that would help make the code a bit easier to read, e.g.,
any(
length(min_overlap) != 1 |
!is.numeric(min_overlap) |
min_overlap %% 1 != 0 |
min_overlap < 3
)
`check_input()` is now one of the helper functions in `helper-functions.R`; smaller checks for input values are now available as helper functions.
Most of the examples you give above are from the read_fh() function; see my earlier motivation for why I would like to stay close to the original dplR::read.fh() code.
Refactoring the `values` argument of `cor_table`. There is a lot of input checking for the `values` argument. I think that things such as:
if ("glk" %in% values) {
And so on indicate to me that these could be written up as separate functions, which could return a list of their inputs, perhaps. These could then be delivered using `switch`, which I often forget how to use, but it would be something like:
values_output <- switch(values,
"glk" = values_glk(inputs),
"pearson" = values_pearson(inputs))
Examples should demonstrate all types of the inputs for the function arguments.
The parameter `values` was removed from the function. Looking back, this is not an option that would be used frequently, and it is certainly not required. Removing it from the function shortens the code a bit and avoids a lot of the checks.
I would recommend standardising the dataset names to be all lowercase, so that they are easier to remember. E.g., `Sohar_2012_FWE_c` becomes `sohar_2012_fwe_c`.
The dataset names include the names of authors. The names of the datasets can be easily copied from sw_data_overview()
I think that `fd_report` could be renamed `felling_report` or `felling_date_report` or similar. While `fd` is concise, I think it doesn't help facilitate discoverability of the functions.
Similar to `cor_table.R`, I think that:
if (!series %in% names(df)) {
stop("--> 'series' does not exist")
}
if (!last %in% names(df)) {
stop("--> 'last' does not exist")
}
if (!n_sapwood %in% names(df)) {
stop("--> 'n_sapwood' does not exist")
}
if (!waneyedge %in% names(df)) {
stop("--> 'waneyedge' does not exist")
}
Could be rewritten as `check_if_variable_exists()`. Something like:
check_if_variable_exists <- function(x,
df,
arg = rlang::caller_arg(x),
call = rlang::caller_env()){
arg_in_data <- x %in% names(df)
if (!arg_in_data) {
cli::cli_abort(
c("{.arg {arg}} does not exist")
)
}
}
example_checker <- function(x,
series = "series",
last = "last"){
check_if_variable_exists(series, x)
check_if_variable_exists(last, x)
}
example_checker(mtcars,
series = "wrong")
## Error in `check_if_variable_exists()`:
## ! `series` does not exist
The `check_input` function is now part of `helper-functions.R`
This function should move the `cat` message up to the top - and should not use `cat`, instead using one of the `cli` functions, like `cli_abort`.
I think you could use structure
instead of setting attributes to NULL:
attr(rwl, "row.names") <- NULL
attr(rwl, "po") <- NULL
attr(rwl, "class") <- NULL
attr(rwl, "names") <- NULL
## becomes
rwl <- structure(
rwl,
row.names = NULL,
po = NULL,
class = NULL,
names = NULL
)
Although I think that they are functionally the same, so feel free to ignore!
cat() no longer used
This function uses `=` and `<-` - suggest sticking to just `<-`.
`=` no longer used, in favour of `<-`
I think this starting chunk would be clearer if only `if`, and not `else`, were used. The stop error can move to the top of this, so we clearly capture if `align` is not "center", "right" or "left". This makes it easier to understand the conditions of error.
if (align == "center") {
before <- floor((w - 1) / 2)
after <- ceiling((w - 1) / 2)
} else if (align == "right") {
before <- w - 1
after <- 0
} else if (align == "left") {
before <- 0
after <- w - 1
} else {
stop("'align' should be 'center', 'left' or 'right'")
}
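One way to restructure this with the stop first and no `else` branches (a sketch of the suggestion, not the package's actual code):

```r
# Fail fast: reject invalid `align` before any branching
if (!align %in% c("center", "left", "right")) {
  stop("'align' should be 'center', 'left' or 'right'")
}
# Each branch is now independent; no chain to walk back through
if (align == "center") {
  before <- floor((w - 1) / 2)
  after  <- ceiling((w - 1) / 2)
}
if (align == "right") {
  before <- w - 1
  after  <- 0
}
if (align == "left") {
  before <- 0
  after  <- w - 1
}
```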
I suggest using another explaining variable inside `mean`:
mean(x[max(0, (i - before)):(i + after)], na.rm = TRUE)
## to something like:
earliest_to_latest <- x[max(0, (i - before)):(i + after)]
mean(earliest_to_latest, na.rm = TRUE)
## or given that this is repeated later
## potentially write this up as a function for reuse?
mean_earliest_latest(x, i, before, after)
As that `mean` statement is a bit involved to unfurl.
Similarly, the pattern `if (edges == "fill") {` and `} else if (edges == "nofill") {` should be bundled up into a function and applied with `switch`.
Checks for edges and fill are now at the top of the script. `else` statements have been avoided.
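The `switch` dispatch suggested above could be sketched like this (`fill_edges()` and `nofill_edges()` are hypothetical helper names, not functions in the package):

```r
# Sketch: dispatch on `edges` via switch(); the unnamed final argument
# acts as the default branch, so invalid values fail immediately.
x_out <- switch(edges,
  fill   = fill_edges(x, w, before, after),     # hypothetical helper
  nofill = nofill_edges(x, w, before, after),   # hypothetical helper
  stop("'edges' should be 'fill' or 'nofill'")
)
```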
This code is taken from `dplR` directly, and as such there are small style changes. I think it is worthwhile updating the code style to fit within your package: `header.taken` should be `header_taken`, etc.
# NEW: verbose = TRUE, header = FALSE
inp <- readLines(fname, ok = TRUE, warn = FALSE)
# NEW: removes empty lines in .fh file
inp <- inp[nchar(inp) != 0]
## Get start and end positions of headers and data blocks
header.begin <- grep("^HEADER:$", inp)
# NEW: Quadro => chrono
# NEW: Double => half chrono
lengths <- numeric(n) # commit Ronald Visser
I have found that moving comments either into documentation or into issues (to help track them) is helpful, but I appreciate that sometimes it is best to leave them in the code. Just something that might be worth thinking about :)
Tidying up the error messages in this function would make some of these nested if/else clauses easier to understand.
This is a pretty massive function, a bit over 1200 lines of code. I would recommend breaking down the steps inside this into smaller functions, as this will make the code easier to reason with and maintain in the future.
In the fellingdateR package I build upon the original code of the read.fh function from the dplR package. I would prefer to stay as close as possible to the original code in the dplR package in order to facilitate future cooperation and possible integration of both functions.
I removed all unnecessary comments as they were highlighting sections where I've made changes to the original code.
dplR::read.fh() concentrates on extracting the measurement data. The fellingdateR::read_fh() function also extracts the descriptive (meta)data from the HEADER fields in a .fh file. This is not possible with the dplR::read.fh function.
Furthermore, the fellingdateR::read_fh function allows reading data in CHRON or HALF-CHRONO format.
read.fh() also throws errors when header fields include capital letters (this depends on the software used to produce the .fh files: TSAP, PAST, ...). read_fh() is case-insensitive.
This is the first time I've seen `############` comment blocks - I'm all for stylistic choices but I am not sure this is needed, especially if this isn't used in other functions.
comment blocks with #### removed
I've not seen this pattern to avoid R CMD Check notes before
# to avoid notes in CMD check
year <-
p <-
lower <-
upper <- COMB <- last <- n_sapwood <- A_i <- agreement <- NULL
My tactic has always been to have a separate definition of these, as answered by Carson Sievert on the Posit Community page. I don't think there's anything inherently wrong with that, but I could imagine that in some cases this could accidentally erase inputs. Something to be aware of, perhaps?
When I run devtools::check() I get
❯ checking R code for possible problems ... NOTE
x: no visible binding for global variable ‘p’
Assigning NULL to these variables avoids the notes, as described in R Packages (2e): https://r-pkgs.org/package-within.html#delta-a-failed-attempt-at-making-a-package
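For reference, the separate-definition tactic mentioned in the review usually takes this shape (a common base-R pattern, not code from the package; the file name `R/globals.R` is a convention, not a requirement):

```r
# Typically placed in its own file, e.g. R/globals.R, run at package load.
# Declares the non-standard-evaluation column names so R CMD check
# does not flag "no visible binding for global variable".
utils::globalVariables(c(
  "year", "p", "lower", "upper", "COMB",
  "last", "n_sapwood", "A_i", "agreement"
))
```

Unlike `year <- p <- ... <- NULL`, this declares the names without creating package-level objects, so there is nothing to accidentally overwrite.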
I am all for using the new base R pipe `|>` - however, you need to update the Depends in your DESCRIPTION like so in order to use it, since it only came out in R 4.1.0:
Depends:
R (>= 4.1.0)
Thanks! Corrected.
This comment should probably live in a github issue or just be removed:
# NEXT LINE TRIGGERS WARNING
# Warning message:
# Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
# ℹ Please use one dimensional logical vectors instead.
# ℹ The deprecated feature was likely used in the fellingdateR package.
# Please report the issue to the authors.
# { if (nrow(summary |> dplyr::filter(agreement == "poor")) != 0)
# replaced by:
these comments are removed
This error should check each of the conditions separately - either it has missing values, or it is not numeric.
if (any(is.na(endDate)) | !is.numeric(endDate)) {
stop(
"--> Please check the column with 'end dates'.
Some values are possibly missing or the values are not numeric"
)
}
A check_input() function (in helper-functions.R) now takes care of the input
I think these error messages would benefit from using `cli`, as discussed above.
This is a nice function to include to facilitate data discovery
This code
if (all(
!(attributes(x)$names) %in% c(
"year",
"n_sapwood",
"p")
))
stop("Input differs from output sw_interval()")
Could be rewritten as an error function, or the condition in `if` could be expressed as a function.
In the final line of documentation for this function there is a hanging sentence:
#' @return Depends on the value of `hdi`.
#'
#' * If `hdi = TRUE`, a `numeric vector` reporting the upper and lower limit
#' of the hdi (attributes provide more detail on `credMass` and the applied
#' sapwood model (`sw_data`)).
#' * If `hdi = FALSE`, a `matrix` with scaled p values for each number of
#' observed sapwood rings. This matrix
Well spotted! Corrected.
Great to see input checking at the top of the function - I do think these should be rewritten as check input functions.
Helper function `d.count` I think should be put into a separate R file called `utils.R` or `helpers.R`. `d.count` should use a `switch` pattern and pass functions rather than using `if` controls. `d.count` should be `d_count`.
check_input() and d_dens() (instead of d.count) are now part of helper-functions.R
The indentation in this code is not consistent - recommend applying a style guide.
Examples should show the different variations possible for function arguments. E.g., `bar_col`, `spline_col`, `dot_col`, and `dot_size` should all be specified in the examples so the user can see what the input should/could be.
Examples have been updated with more visibility for the different parameters.
See note above on including plots.
testthat: consider `vdiffr` for testing ggplot plots. See `visdat` for examples.
_Originally posted by @njtierney in https://github.com/ropensci/software-review/issues/618#issuecomment-1867354502_
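A vdiffr snapshot test for one of the plotting functions might be sketched as follows (the test file name, test description, and exact call are illustrative):

```r
# In tests/testthat/test-sw_combine_plot.R (illustrative location)
test_that("sw_combine plot output is stable", {
  p <- sw_combine_plot(sw_combine(trs_example1))
  # Compares the rendered plot against a stored SVG snapshot;
  # the first run records the snapshot, later runs flag visual changes.
  vdiffr::expect_doppelganger("sw-combine-plot", p)
})
```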
Dear @ajpelu. Your comments and suggestions in your review report have been most helpful in improving the package. Thank you for your time investment! I've copy-pasted the review report and inserted my replies below.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
The package includes all the following forms of documentation:
[ ] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
I think the target audience of the pkg should be indicated more explicitly.
The images are not very illustrative of what the package does; they simply seem to show that these are wooden tokens. Perhaps it would be interesting to think of a schematic or similar, as the photos do not seem to provide much.
[ ] Installation instructions: for the development version of package and any non-standard dependencies in README
In the installation instructions, it might be more effective to provide a link to the GitHub repository of the package rather than displaying the raw GitHub link itself.
Why does the author prefer to use the package pak rather than devtools to install the development version of the pkg? I see that in the vignette both ways (pak and devtools) are indicated.
I serve at a government agency and mostly work from behind a firewall. Installing packages with devtools often fails (timeout) but miraculously works fine with pak::pak(). I have no idea why, and it is probably related to the settings of the firewall, but therefore I now prefer pak over devtools.
- [ ] Vignette(s): demonstrating major functionality that runs successfully locally
The vignette outlines the primary functions but lacks a straightforward workflow for beginners.
There are missing citations in the vignette. Please add a reference list at the end.
References added as URLs to the publications
In the vignette of the website it appears:
sw_sum(fellingdateR:::trs_example7, plot = TRUE)
but when I run it locally I get:
sw_sum(fellingdateR:::trs_example7, plot = TRUE)
Error in as.data.frame(x) : object 'trs_example7' not found
Please consider changing the three `:::` to `::`
Redundant use of `:::` has been removed from the examples in the vignette
- [x] Function Documentation: for all exported functions
- [x] Examples: (that run successfully locally) for all exported functions
- [ ] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
No community guidelines found; only in DESCRIPTION is there a BugReports link.
Guidelines have been added to the README
Functionality
[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software have been confirmed.
[x] Performance: Any performance claims of the software have been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
All tests pass. The coverage is 77.44% (using devtools::test_coverage())
[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
snake_case and lowercase naming have been implemented consistently
Not all the functions follow the object_verb() naming scheme, for instance the functions read_fh and get_header. I would suggest the author rename them to fh_read and fh_header.
Changed get_header to fh_header, but to highlight the close similarity to the original read.fh() function in the dplR package, I would like to keep read_fh as the function name.
The movAv function's name deviates from the consistent use of snake_case observed in the rest of the functions. To maintain uniformity, it would be preferable to rename it in accordance with snake_case, as indicated in the packaging guidelines.
Changed to mov_av()
In the get_header, movAv and read_fh functions, please consider changing cat() or print() to message() or warning(), as the packaging guidelines indicate.
cat() no longer used in any of the functions
Estimated hours spent reviewing: 13
I've included you as a reviewer in the DESCRIPTION
Review Comments
Thank you for your contribution with the fellingdateR package. It provides a valuable suite of functions for estimating, reporting, and combining felling dates of historical tree-ring series. The comprehensive set of tools makes it easier to infer felling date estimates, considering factors like preserved sapwood or waney edge.
Given my expertise and interest in dendrochronology, I thoroughly enjoyed reviewing this package. The functions enable users to estimate felling date ranges for both individual series and groups of timbers, offering a valuable feature for summing sapwood probability distributions.
This being my first software review for rOpenSci, I apologize for any potential errors or unclear comments. While I've provided various comments (a lot of them), feel free to focus on those that are most relevant or useful. I commend you for effectively addressing a challenging problem with this package. If you need clarification or assistance implementing the suggestions, please reach out. Thank you for submitting this software; the review process was a pleasure!
This is because some of the provided sapwood datasets do not cover a specific country, but rather a region, or are composed of observations of timber from e.g. Norway while the timbers themselves have been found in the Netherlands (van Daalen_Norway). So I named the sapwood datasets according to their authors (and publication date if applicable), plus a few letters to describe the region/country they cover or the method used. The names of the datasets can be easily copied from the output of `sw_data_overview()`
Some code and documentation exceed 80 characters per line (particularly data.R). Please consider adjusting to this recommended length.
All code has been styled using the styler package, and should now adhere to the Tidyverse style guide.
The README file does not contain any information about how to cite the package. I found that in the CITATION file and also on the website of the package. According to the rOpenSci packaging guidelines the README needs to include information about how to cite.
Citation guidelines have been added to the README
Please consider generating a theme for all plots of your package in a utils.R function
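Such a shared theme could be sketched as follows (the name `theme_fdr()` matches the reply below, but the specific styling choices here are illustrative, not the package's actual settings):

```r
# Sketch of a package-wide ggplot2 theme; the styling choices are assumptions.
theme_fdr <- function(base_size = 11) {
  ggplot2::theme_minimal(base_size = base_size) +
    ggplot2::theme(
      panel.grid.minor = ggplot2::element_blank(),
      plot.title = ggplot2::element_text(face = "bold")
    )
}

# Each plotting function can then append `+ theme_fdr()` to its ggplot object,
# so styling is defined once instead of being duplicated per function.
```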
theme_fdr() is now part of helper-functions.R and implemented in all plotting functions.
I found the /paper/paper.md very useful, and I think much of this information would also improve the documentation and the website of the package. For instance, the fellingdateR_workflow is very useful for understanding how the package works. Would the author consider including it in the README or in the DESCRIPTION of the package instead of the current pictures? I think the former is more informative than the latter.
I've inserted the workflow in the README. The paper.md file is a first draft of a paper to be submitted to JOSS after the software review.
Why is there no auxiliary function for sw_model (e.g. sw_model_plot), like for the rest of the functions in the package (e.g. sw_interval and sw_interval_plot)? This could lighten the function size and would also be coherent with the rest of the package.
There is now a new function named sw_model_plot() as suggested.
The code style isn't consistent. Please consider using the styler package.
I've used the styler package to reformat all code.
I didn't find top-level documentation:
?fellingdateR
#> No documentation for 'fellingdateR' in specified packages and libraries:
#> you could try '??fellingdateR'
Please consider generating `?fellingdateR` or `?fellingdateR-package`. See https://devguide.ropensci.org/building.html#general
I've added a `fellingdateR-package.R` file with top-level documentation
The README has been rewritten, including some parts of the paper.md.
The vignette "getting started" includes some citations (e.g. Hollstein 1980) but no full references for these citations. Please include them at the end of the vignette.
Thanks! Bibliographic references are now included as URLs/DOIs.
Please be consistent with the use of `=` or `<-`, and avoid mixing them. For instance in hdi.R
There is now a consistent use of `<-` as an assignment operator.
Spell check: DESCRIPTION does not contain a 'Language' field. Defaulting to 'en-US'. Please consider specifying it.
en-GB is now set as 'Language'
Is there code duplication in the package that should be reduced? Yes, for instance the ggplot themes within the plot functions.
theme_fdr() is now part of helper-functions.R and implemented in all plotting functions.
Coverage
Please consider reviewing the test coverage of the movAv.R function (~57.14%)
test coverage increased for mov_av()
Goodpractices
After running goodpractice::gp(), several code lines do not pass the checks:
✖ write short and simple functions. These functions have high cyclomatic complexity (>50): read_fh (150). You can make them easier to reason about by encapsulating distinct steps of your function into subfunctions.
This is particularly true for the read_fh function. Since the .fh file structure is quite flexible and combines both metadata and measurement data in one format, the code is quite lengthy. Furthermore, in the fellingdateR package I build upon the original code of the read.fh function from the dplR package. I would prefer to stay as close as possible to the original code in the dplR package in order to facilitate future cooperation and possible integration of both functions.
✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used it and it is easier to read your code for them if you use '<-'.
Now there is a consistent use of <- as an assignment operator
R/cor_table.R:108:18 R/cor_table.R:109:22 R/cor_table.R:112:22 R/hdi.R:48:12 R/hdi.R:49:14 ... and 2 more lines
✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters
styler package was used to reformat all code
data-raw/DATASETS.Rmd:17:81 data-raw/DATASETS.Rmd:19:81 data-raw/DATASETS.Rmd:35:81 data-raw/DATASETS.Rmd:51:81 data-raw/DATASETS.Rmd:68:81 ... and 153 more lines
✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.
R/cor_table.R:192:20
R/cor_table.R:195:20
sapply() no longer used
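The type-safety difference the check describes is easy to demonstrate in base R (illustrative, using the built-in `mtcars` data):

```r
# sapply() guesses the output shape from the data; vapply() enforces it.
sapply(mtcars[1:2], mean)               # happens to simplify to a numeric vector
vapply(mtcars[1:2], mean, numeric(1))   # guaranteed: one numeric per element
vapply(mtcars[1:2], range, numeric(2))  # contract: each result must be numeric(2)
```

With `vapply()`, a result that does not match the declared template is an immediate error instead of a silently different return type.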
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on the right hand side is zero. Use seq_len() or seq_along() instead.
R/fd_report.R:162:19
R/hdi.R:54:16
R/sw_combine.R:123:27
R/sw_combine.R:180:27
R/sw_combine.R:361:35
... and 1 more lines
seq_len() is now used instead of 1:length(...)
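The zero-length edge case behind this check is worth seeing once (base R):

```r
x <- integer(0)
1:length(x)        # 1 0  -- a loop over this runs twice on empty input
seq_len(length(x)) # integer(0) -- the loop body never runs
seq_along(x)       # integer(0)
```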
✖ fix this R CMD check NOTE: Note: found 11 marked UTF-8 strings
✖ avoid 'T' and 'F', as they are just variables which are set to the logicals 'TRUE' and 'FALSE' by default, but are not reserved words and hence can be overwritten by the user. Hence, one should always use 'TRUE' and 'FALSE' for the logicals.
R/sw_combine_plot.R:NA:NA R/sw_combine_plot.R:NA:NA R/sw_combine_plot.R:NA:NA R/sw_combine_plot.R:NA:NA R/sw_combine_plot.R:NA:NA ... and 12 more lines
TRUE/FALSE now used consistently
read_fh.R
dplR::read.fh() concentrates on extracting the measurement data. The fellingdateR::read_fh() function also extracts the descriptive (meta)data from the HEADER fields in a .fh file. This is not possible with the dplR::read.fh function. Furthermore, the fellingdateR::read_fh function allows reading data in CHRON or HALF-CHRONO format. read.fh() also throws errors when header fields include capital letters (this depends on the software used to produce the .fh files: TSAP, PAST, ...). read_fh() is case-insensitive.
There is some commented-out code that could be removed, e.g.
# NEW: verbose = TRUE, header = FALSE
Please revise them.
Indeed. The comments highlighted what sections had been added compared to the original dplR::read.fh function. I've removed the `NEW:` comments in the current version of read_fh()
The nomenclature of the objects exhibits a mix of snake_case and dot.case. Let's ensure consistency in the naming convention.
In order to stay as close as possible to the original dplR::read.fh function (and to facilitate possible future integration of both functions) I haven't changed any of the variable names in the code. For all other functions in the package snake_case was implemented.
The function appears to be extensive. Has the author explored the possibility of breaking it down into several auxiliary functions to enhance readability and maintainability?
See reply on bullet point 3
In the function documentation, the author mentions the Heidelberg format file. It would be beneficial to include a link to the specifications of this format file for reference, eg. https://www.treeringsociety.org/resources/SOM/Brewer_Murphy_SupplementaryMaterial.pdf
Perfect! I've added the link in the description
get_header.R
Please consider including some defensive programming, such as:
if (!is.data.frame(rwl) || !inherits(rwl, "rwl")) {
stop("Input should be a data.frame of class 'rwl'")
}
Avoid `cat()`, as indicated in the packaging guidelines.
The function now starts with a check of the input and no longer requires cat (or equivalent). The get_header function was renamed to `fh_header`.
cor_table.R
Certainly considered, but this would require a full rewrite of the current function. It would then also have to support using the separate correlation values for crossdating a series at all positions along a reference series. For this package, however, the goal is to check the date assigned to the tree-ring series and to find the best-matching chronology in order to decide which sapwood model to use (depending on timber provenance). So I would like to keep the function as it is, and put the breakdown into modular components for each correlation value on the to-do list for a future project/package.
The nomenclature of arguments exhibits a mix of snake_case and dot.case. Let's ensure consistency in the naming convention.
now all snake_case
Remove code comments in the argument section, e.g.
... output = "table", #c("matrix", "table") ...
Removed
In the documentation for the argument sort_by, it would be useful to add the choices to avoid errors. For instance,
#' @param sort_by The correlation value by which the output is sorted for each
#' series in `x`. One of "r_pearson", "t_St", "glk", "glk_p", "t_BP", "t_Ho". Defaults to "t_Ho"
Added to the documentation
Ensure uniformity in assignment by using the same operator consistently; avoid mixing `<-` and `=`.
`=` is no longer used as an assignment operator.
Consider avoiding the use of sapply(). See https://adv-r.hadley.nz/functionals.html. For instance, consider
y <- y[, apply(!is.na(y), 2, sum) > 3, drop = FALSE]
instead of
y <- y[, sapply(y, function(col) sum(!is.na(col))) > 3, drop = FALSE]
sapply() is no longer used in this or other functions.
sw_model.R
Avoid mixing `<-` and `=` for assignment, e.g.
n_obs = sum(observed$count)
`=` no longer used as assignment operator
I'm not sure if incorporating the reading of a csv is optimal in a function. What if a user wants to input a dataframe that isn't from a CSV file? I propose modifying the function to accept a dataframe or tibble with mandatory n_sapwood and count columns. Additionally, allowing users to import from a CSV could be achieved by either creating a small auxiliary function or simply utilizing utils::read.csv(). This would also reduce the size of the sw_model function. Please consider this for all those functions with the same issue.
You are right! I might have overcomplicated the sw_ functions with this, so I removed this option from this and all other sw_ functions. Now a user-defined custom data set - a data.frame with columns n_sapwood and count - can be supplied to the function.
I think it is better to have a separate function for plotting, which could be called after the computation is completed by sw_model, for instance sw_model_plot. This would also allow the user to modify the generated plot, for instance if the user doesn't like the grid within the plot. In the current form, to do that the user needs to type:
s <- sw_model("Hollstein_1980", plot = TRUE)
s + ggplot2::theme(panel.grid = ggplot2::element_blank())
sw_model_plot() was added to the package
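With a separate plotting function, computation and visualization are now decoupled; a minimal sketch of the resulting workflow (the exact sw_model_plot() signature is an assumption based on the discussion above):

```r
## Sketch: compute first, plot afterwards, then customize the ggplot object
s <- sw_model("Hollstein_1980", plot = FALSE)
sw_model_plot(s) +
  ggplot2::theme(panel.grid = ggplot2::element_blank())  # e.g. drop the grid
```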
So if there is a specific function to plot the data, this function should include `...` in its arguments, to allow other ggplot2 arguments to be passed.
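Forwarding the ellipsis to a layer could look roughly like this (a sketch; the function name, column names, and which layer receives `...` are assumptions):

```r
# Hypothetical sketch: pass ... through to a ggplot2 layer so users can
# supply extra layer arguments (linetype, colour, etc.)
my_plot <- function(x, ...) {
  ggplot2::ggplot(x, ggplot2::aes(x = year, y = p)) +
    ggplot2::geom_line(...)
}
# my_plot(df, linetype = "dashed", colour = "grey40")
```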
In any case, in the plot section it would be better to use .data, for instance, in
ggplot2::geom_segment(
data = hdi_model,
ggplot2::aes(
x = hdi_model[[1]],
xend = hdi_model[[2]],
y = 0,
yend = 0
),
colour = "grey30",
linewidth = .8,
alpha = 0.8
)
I suggest using .data[[1]]:
ggplot2::geom_segment(
data = hdi_model,
ggplot2::aes(
x = .data[[1]],
xend = .data[[2]],
y = 0,
yend = 0
),
colour = "grey30",
linewidth = .8,
alpha = 0.8
)
no longer needed in the new function sw_model_plot()
In the documentation of sw_model, the values of the densfun parameter are not in italics and are enclosed in double quotation marks. It doesn't matter, but please maintain consistency with the rest of the documentation. For example, in sw_interval.R, they are in italics.
densfun documentation in italic now
sw_interval.R
Please remove code commented
# Add calendar years to output when y is provided
# if (last == 0) {
# attr(hdi_int, "credMass") <- credMass
# attr(hdi_int, "sapwood_data") <- sw_data
# attr(hdi_int, "model") <- densfun
# }
# if (last != 0) {
comments have been removed
In this function, the separation of the plot part from the computation one is appreciated. A further step would be to remove the plotting option completely from the function, and specified in the documentation (and also in the vignette) to use the sw_interval_plot function to obtain the plot.
The plot argument now defaults to plot = FALSE. The numeric output of the function can then be supplied to the plotting functions to create the visualization.
sw_interval_plot.R
I would suggest adding the ellipsis argument (i.e. `...`) to allow other ggplot2 arguments to be passed. See also the ellipsis pkg for more info. Consider applying this throughout the plot functions of the package.
Which notes is the author trying to avoid with
# to avoid notes in CMD check p.x <- upper <- year <- NULL
Any alternative to solve them?
When I run devtools::check() I get
❯ checking R code for possible problems ... NOTE
x: no visible binding for global variable ‘p’
Assigning NULL to these variables avoids the notes, as described in R Packages (2e) <https://r-pkgs.org/package-within.html#delta-a-failed-attempt-at-making-a-package>
Please consider changing this code
if (all(
!(attributes(x)$names) %in% c(
"year",
"n_sapwood",
"p")
))
stop("Input differs from output sw_interval()")
by:
if (!all(c("year", "n_sapwood", "p") %in% names(attributes(x)))) {
stop("Input structure differs from the expected output of sw_interval()")
}
OK. I've replaced this section in the code
You mixed snake_case with dot.case in the naming of objects. I would suggest to choose one and be consistent.
should_be_consistent_now
What does this code snippet do? If I understand correctly, sw_interval_plot requires the output of sw_interval, so is a .csv file an output of sw_interval?
The following was deleted because the option to use a .csv file as input was removed from all functions.
if (grepl("\\.csv$", sw_data.p)) sw_data.p <- basename(sw_data.p)
p = probability. Now documented in \@return
sw_combine.R
This function requires a dataframe, as indicated in the documentation. However, there is no check for that; instead, the first line forces x to be a data.frame. I would suggest to add a defensive programming line such as:
if (!is.data.frame(x)) {
stop("Input 'x' must be a dataframe.")
}
More defensive programming is now implemented.
check_input()
is now a helper function (helper-functions.R).
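Such a helper might look roughly like this (a sketch; the actual check_input() in helper-functions.R likely checks more conditions):

```r
# Hypothetical sketch of an input-checking helper
check_input <- function(x) {
  if (!is.data.frame(x)) {
    stop("Input 'x' must be a data.frame.")
  }
  invisible(x)
}
```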
Camel case and snake_case styles are mixed in the function. Please be consistent.
all snake_case now
There are some duplications in the function, for instance
pdf_matrix[, 1] <- timeAxis
now only mentioned once
Please revise the comments within the function. Are they all necessary? Example:
# when multiple exact felling dates are listed that do no correspond
# --> COMB = 0 and after scaling NaN (division by 0)
# check rowwise if there is any p-value == 1,
# and replace COMB at that position with 1
Most comments have been removed
Perhaps this function could be modularized into several simpler functions. Initially, a function could evaluate whether the dataset contains all series with preserved sapwood or if any series has an exact felling date, among other conditions (as indicated in the examples of the function). Subsequently, based on the initial evaluation, the function could call auxiliary functions.
A check_input() function (in helper-functions.R) now takes care of the input
In any case, the function should provide detailed information about the approaches used for each case.
This is explained in the Get started vignette.
Please see my comment about the data input in csv format for sw_model.R.
Input from a .csv file is no longer supported.
The elements of the output list generated by this function are not described. Please add to the documentation.
Added to the documentation (\@return)
sw_combine_plot.R
`rescale()` is now part of helper-functions.R
A lot of mixing T, F with TRUE and FALSE. Please be consistent.
only TRUE and FALSE are used now
Camel case and snake_case styles are mixed in the function. Please be consistent.
all snake_case
I would suggest to allow the user customize the color used. See my comment in the below section (Plot functions).
now possible to choose colors for fill and line.
Remove the commented codes inside the function:
comments removed
# NEXT LINE TRIGGERS WARNING
# Warning message:
# Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
# ℹ Please use one dimensional logical vectors instead.
# ℹ The deprecated feature was likely used in the fellingdateR package.
# Please report the issue to the authors.
# { if (nrow(summary |> dplyr::filter(agreement == "poor")) != 0)
# replaced by:
Correct. Label added to X-axis.
Any way to indicate the "A_i" as a real subscript? Maybe parse = TRUE within geom_text?
Finally found out how to do this ;-) Done.
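For reference, plotmath labels with parse = TRUE render a real subscript; a minimal, self-contained sketch:

```r
library(ggplot2)

df <- data.frame(x = 1, y = 1)
ggplot(df, aes(x, y)) +
  # "A[i]" is plotmath: parse = TRUE renders it as A with subscript i
  geom_text(aes(label = "A[i]"), parse = TRUE)
```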
Please add information in the documentation about the aggregation index
Added to the documentation
In the second example:
combo <- fellingdateR::sw_combine(trs_example2)
fellingdateR::sw_combine_plot(combo)
I removed
::
from the example.
- What do the arrow and the dot refer to? Please include this in the documentation.
sw_sum.R
In the documentation of the `last` argument, please add a blank space in "xcontaining".
Thanks! Corrected
In the documentation the values of the densfun parameter are not in italics and are enclosed in double quotation marks. It doesn't matter, but please maintain consistency with the rest of the documentation. For example, in sw_interval.R, they are in italics.
italic and consistent now
Regarding defensive programming, the same comment as for sw_combine applies. This function requires a dataframe, as indicated in the documentation. However, there is no check for that; instead, the first line forces x to be a data.frame. I would suggest adding a defensive programming line such as:
defensive programming with helper function check_input() implemented
if (!is.data.frame(x)) {
stop("Input 'x' must be a dataframe.")
}
sw_sum_plot.R
Done for sw_model_plot, sw_interval_plot and sw_sum
Can a user change the width of the moving average window used within the sw_sum_plot()? It seems that the sw_sum_plot() uses always the same.
Extra parameter added to tweak the window width of the smoother
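A window-width argument could be wired up roughly like this (a sketch; the function name is hypothetical and the smoother actually used by sw_sum_plot() may differ):

```r
# Hypothetical sketch: user-adjustable moving-average window
moving_average <- function(x, window = 11) {
  # centred moving average; `window` controls the width of the smoother
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 2))
}
```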
Idem with the splines used.
See previous bullet
What do the blue dots, blue bars and red line mean? Please indicate in the documentation
The dots represent exact felling dates (with waney edge). Probability of 1 assigned to 1 specific year.
Added to documentation
Other comments
I am surprised that different functions (for instance sw_combine and sw_sum), that accessing similar subdata within a data set, use different approaches:
cambium <- df[[waneyedge]]
and
cambium <- x[, waneyedge]
Please be consistent along the package.
Now same approach in all functions.
Please review the naming conventions, including CamelCase, snake_case, and dot.case, used for objects throughout the package. It is recommended to choose a consistent style, and I would suggest using snake_case. Ensure uniformity across the package for improved clarity and maintainability.
Done. All snake_case now.
I'm not sure if incorporating the reading of a csv is optimal in a function. What if a user wants to input a dataframe that isn't from a CSV file? I propose modifying the function to accept a dataframe or tibble with mandatory n_sapwood and count columns. Additionally, allowing users to import from a CSV could be achieved by either creating a small auxiliary function or simply utilizing utils::read.csv(). This would also reduce the size of the sw_model function.
This option has been removed.
In the different plot functions, the author uses a common theme. It might be useful to create an auxiliary function with a ggplot2::theme() specification. Each plot function could then use this auxiliary function by default, or allow the user to specify a custom theme.
theme_fdr() is now part of helper-functions.R and implemented in all plotting functions.
I would suggest changing the specified colors in each plot function to custom arguments, allowing users to choose their preferred colors. For instance, in sw_interval_plot the colors of the area specified by
ggplot2::geom_area(
ggplot2::aes(x = ifelse(year >= lower, year, NA),
y = p.x),
fill = "tomato3",
color = "tomato3",
alpha = 0.3,
linewidth = 1
)
could be indicated as arguments, e.g. area_fill and area_color.
Done for sw_sum_plot, sw_interval_plot and sw_model_plot
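Combined, the two suggestions above might look roughly like this (a sketch; theme_fdr() and the argument names come from the discussion, the function body is an assumption):

```r
# Hypothetical sketch: color arguments plus a shared theme helper
sw_interval_plot <- function(x, area_fill = "tomato3", area_color = "tomato3") {
  ggplot2::ggplot(x, ggplot2::aes(x = year, y = p)) +
    ggplot2::geom_area(fill = area_fill, color = area_color,
                       alpha = 0.3, linewidth = 1) +
    theme_fdr()  # shared theme from helper-functions.R
}
```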
Code style
All code/functions/files have been styled according to the tidyverse style guide using
styler::style_file(…)
to avoid inconsistencies in indentation and to avoid code going over 80 characters.
Now `<-` is used consistently throughout the package (some code in the previous version included `=` for assignment).
TRUE instead of T.
README
The package description in the README.md was revised according to the comments of both reviewers.
The original illustration was removed, and replaced by an image of a piece of historical timber with sapwood. The question mark highlights what this package basically does: estimate the missing number of sapwood rings.
Now README.md has a final paragraph with a link to the Contributor Code of Conduct
## Comments and contributions
Please report any issues or bugs here: https://github.com/hanecakr/fellingdateR/issues.
Get citation information for `fellingdater` in R typing `citation(package = 'fellingdater')`.
Please note that the `fellingdateR` package is released with a [Contributor Code of Conduct](https://github.com/hanecakr/fellingdateR/blob/main/.github/CONTRIBUTING.md). By contributing to this project, you agree to abide by its terms.
cor_table
parameter `values` was removed from the function. Looking back, this is not an option that would be used frequently, and it is certainly not required. Removing it from the function shortens the code a bit.
It was suggested to rename the package to fellingdater instead of fellingdateR. Any tips on how to do this without breaking links and existing git issues? @maelle @ajpelu @njtierney
I have a first draft of a paper ["paper/paper.md"] that I want to submit to JOSS if/when I manage to address all issues raised in the review process. Can this paper remain a .md file under the paper folder, should I remove it from the package until after review by JOSS, or should I already reformat it as a vignette?
@hanecakr thank you! In your answer, it is hard for me to see what is quoted from the reviews and what is your answer. Could you please consistently use quote formatting (`>`) for the parts that come from reviewers? Thank you!
Date accepted: 2024-04-08
Submitting Author Name: Kristof Haneca
Submitting Author Github Handle: @hanecakr
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/hanecakr/fellingdateR
Version submitted:
Submission type: Standard
Editor: @maelle
Reviewers: @njtierney, @ajpelu
Archive: TBD Version accepted: TBD Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The package aims to facilitate and standardize the computation of felling dates from dated tree-ring series. It covers all steps of processing measured ring-width series from raw data to the reporting of felling date estimates, for single series and for groups of related tree-ring series.
Mainly professional tree-ring scientists (dendrochronologists) who work on historical timbers and wooden objects (archaeology, architectural history, sculptures, panel paintings, etc.)
No, no other R-packages do the same job.
yes
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.
All checks pass.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct