## Reply to package review by @njtierney

Again a big thank you for your review report and time. I was not able to address all issues raised earlier, but have now found some time to work on the package and to provide an answer to your comments and suggestions. I've copy-pasted the review report and inserted my replies below.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

I have added you as a reviewer in the DESCRIPTION.

Documentation

The package includes all the following forms of documentation:

[ ] A statement of need: clearly stating problems the software is designed to solve and its target audience in README

The opening paragraphs of the README are good, and I think that this R package solves a challenging problem, so firstly, well done! I think it could be made a little bit clearer in terms of the problem it solves, and the input it takes. While I find the photos useful, they initially made me think that this software takes images as input. I would suggest something more like what is in the vignette to start:

fellingdateR offers a set of functions that assist in inferring felling date estimates from dated tree-ring series.

Then, describe the problem you want to solve, which I think is estimating when the timber was cut down. Then show the data, explain what the columns mean, and how this might be a typical example of dated tree-ring series data.

Then show a short example of the output, clearly demonstrating the problem the package solves.

The rest of the first paragraph:

The presence of (partially) preserved sapwood or waney edge allows to estimate a range for the actual felling date, for individual series as well as for a group of timbers. Furthermore, an additional function provides a tool to sum sapwood probability distributions, comparable to 'summed probability densities' commonly applied to sets of radiocarbon (14C) dates.

Is important, but I think could go into more of a methods/general introduction part of the README, perhaps further down.

I'm not sure what the images show me, and so to communicate this effectively I think they should contain a caption.

I think the target audience could be more clearly stated in the README. Perhaps at the end of the first paragraph.

README has been rewritten according to comments of both reviews.

The 'Get started' vignette provides more detail and examples.

[x] Installation instructions: for the development version of package and any non-standard dependencies in README

All installed well for me!

[x] Vignette(s): demonstrating major functionality that runs successfully locally

It did run successfully locally! T and F should be specified as TRUE and FALSE.

Now TRUE and FALSE are used consistently

[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions

The examples ran without error, using:

devtools::run_examples()

[ ] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

There are no community guidelines in the README, I see them in the file: .github/CONTRIBUTING.md, but these are not linked to in the README. Once these are linked, e.g., by writing something like:

## Code of Conduct

Please note that the visdat project is released with a [Contributor Code of Conduct](https://github.com/hanecakr/fellingdateR/blob/main/.github/CONTRIBUTING.md). By contributing to this project, you agree to abide by its terms.

Community guidelines and code of conduct have been added

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.

All tests pass - unit tests seem quite good coverage, evaluated using devtools::test_coverage().

[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
package name passes checks on available::available("fellingdateR")
I think if possible the author should consider renaming the package to all lowercase, fellingdater or fellingdatr.

Not sure what is the best way to do this. Any practical guidelines?
There are other considerations that I think mean it does not currently conform to the rOpenSci packaging guidelines. Rather than discuss them in too much depth here, I will put them in the review section below.

Estimated hours spent reviewing: 5

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

You have been added as reviewer in the DESCRIPTION

Review Comments

I wanted to open by saying that while I have a lot of feedback, I think that this is a great piece of software that helps solve a tough problem, so well done on the author for writing this! I hope that the feedback is useful 😄 . Please let me know if something is not clear or if you need help implementing these, or further information. Thank you for submitting this software, I enjoyed reviewing it.

General comments

There are a fair few examples from the rOpenSci packaging guide, which I don't think are followed, I have gone through the guide and written some examples here. After the author makes these changes, I would recommend they double check the guide.

Recommend making sure all functions and objects use snake_case.

All code is in snake_case now

Except for the `read_fh()` function. In the fellingdateR package I build upon the original code of the read.fh function from the dplR package. I would prefer to stay as close as possible to the original code in the dplR package in order to facilitate future cooperation and possible integration of both functions.
argument name uses x for most data frame inputs. I would recommend considering naming data things data or .data or similar to help distinguish them from a vector, x. Not required but worth considering, I think.
There is some use of cat in the package, recommend using cli as described in the tidyverse style guide on writing error messages. I expand on this below.

cat() no longer used (except for read_fh - see comment above)
Code style is not consistent, there is mixed use of the number of indentations : between 0 and 8 spaces. I would recommend applying the tidyverse style guide to the package with styler::style_pkg()

Code has been restyled using the styler-package
Indenting code is important but this 8 space indentation is not consistent with other indentation used in your package, and when reading the code gives the impression that the code is happening inside some/several if/else/for control statements. I would recommend applying a style guide such as the tidyverse style guide or similar, to the code, so that indentation is consistent.
Due to indentation, a lot of lines of code go over 80 characters. I think it is worth the time to re-indent, or rewrite some code by using explaining variables, so the code doesn't go over 80 characters

Code has been restyled using the styler-package
= is sometimes used over <- - I recommend using <- consistently.

<- is now used consistently as assignment operator
There is no top level documentation for ?fellingdateR - this could be achieved using usethis::use_package_doc().

This has been added
package should use a website. See the ropensci guide on building a website

see https://hanecakr.github.io/fellingdateR
internal functions, like d.dens and d.count should have a #' @noRd tag to mark is as an internal function

These functions now reside in helper-functions.R with #' @noRd
examples in code should use all argument parameters

Most examples now include all arguments
Recommend the author reads through the CRAN gotchas
What does the sw stand for in things like sw_combine and co?

sw = sapwood, fd = felling date. I've made this more obvious in the README introduction.
Some of the documentation uses reversed backticks, which I haven't seen before, e.g.: ´n_sapwood´ and ´count´

corrected
There are still a few lines of code that don't pass the goodpractice::gp() checks. In particular, I think these comments are important:

  ✖ write short and simple
    functions. These functions
    have high cyclomatic
    complexity (>50): read_fh
    (150). You can make them
    easier to reason about by
    encapsulating distinct steps
    of your function into
    subfunctions.
  ✖ use '<-' for
    assignment instead of '='.
    '<-' is the standard, and R
    users and developers are used
    it and it is easier to read
    your code for them if you use
    '<-'.
  ✖ avoid long code lines,
    it is bad for readability.
    Also, many people prefer
    editor windows that are about
    80 characters wide. Try make
    your lines shorter than 80
    characters
  ✖ avoid sapply(), it is
    not type safe. It might return
    a vector, or a list, depending
    on the input data. Consider
    using vapply() instead.
  ✖ avoid 1:length(...),
    1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...)
    expressions. They are error
    prone and result 1:0 if the
    expression on the right hand
    side is zero. Use seq_len() or
    seq_along() instead.
  ✖ avoid 'T' and 'F', as
    they are just variables which
    are set to the logicals 'TRUE'
    and 'FALSE' by default, but
    are not reserved words and
    hence can be overwritten by
    the user.  Hence, one should
    always use 'TRUE' and 'FALSE'
    for the logicals.

All code has been styled with the styler package; <-, TRUE and FALSE are now used consistently, and some functions have been shortened by moving steps into helper functions, e.g. for checking input. The use of sapply() and 1:length() has been avoided.
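
As an illustration of the last two points, both pitfalls flagged by goodpractice can be reproduced in a few lines of base R. This is a minimal sketch with made-up input (`series_lengths` is a hypothetical name), not code taken from the package:

```r
# An empty input, e.g. a data set without any series (hypothetical)
series_lengths <- list()

# 1:length(x) counts down from 1 to 0 when x is empty ...
bad_index <- 1:length(series_lengths)
# ... whereas seq_along(x) is empty, so a for loop over it would not run
good_index <- seq_along(series_lengths)

# sapply() falls back to an empty list here,
# vapply() always returns the type declared in FUN.VALUE
sapply_result <- sapply(series_lengths, length)
vapply_result <- vapply(series_lengths, length, FUN.VALUE = integer(1))
```

With an empty input, `bad_index` has two elements while `good_index` has none, and only `vapply_result` is an integer vector.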

You interchange between using = and <- in your code. I would recommend using <- only. See for example in cor_table.R:

= no longer used as assignment operator

if (is.null(y)) {
     y = x
     noRef = TRUE
}
else {
     noRef = FALSE
     y_ori <- y
}

Error messages. I would recommend building input checking functions to assist in how your write up error messages. There are a few key benefits to this:
- The input checking function then does not get in the way of understanding the intent of your function code.
- You can reuse the input checking functions, so you don't need to write them again.
- using cli to build the error messages allows you to use glue strings, so you don't have to try and quote or inject other information into the message string, it should be easier to add details you care about.
check_input() is now one of the helper functions in helper-functions.R.
Error messages are hard to write well, and it's great that you've included some good input checking! I think you could make the error functions a bit better for the user by following the tidyverse style guide on error messages.
explaining variables. I've mentioned this a few times in the other functions in your package, I think it would be worthwhile searching through your cases of using if and if there is a long conditional in there, e.g.,

any(pdf_matrix[, 2:length(keycodes) + 1] == 1, na.rm = TRUE))

Then I think it would be worthwhile either writing a small wrapper function to identify this, or wrap that up in an explaining variable.
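
A minimal sketch of what such explaining variables could look like. The objects `keycodes` and `pdf_matrix` below are hypothetical stand-ins, and the names `series_columns` and `any_probability_of_one` are only suggestions; the expression itself is copied unchanged from the condition above:

```r
# Hypothetical stand-ins for the objects used in the condition
keycodes <- c("trs_01", "trs_02", "trs_03")
pdf_matrix <- cbind(
  year   = 1000:1002,
  trs_01 = c(0.2, 0.5, 0.3),
  trs_02 = c(0.0, 1.0, 0.0),
  trs_03 = c(NA, 0.4, 0.6),
  COMB   = c(0.0, 1.0, 0.0)
)

# Same expression as in the review comment; note that `:` binds tighter
# than `+`, so this selects columns 3 to length(keycodes) + 1
series_columns <- 2:length(keycodes) + 1
any_probability_of_one <- any(pdf_matrix[, series_columns] == 1, na.rm = TRUE)

if (any_probability_of_one) {
  message("At least one selected column contains a probability of 1.")
}
```

Naming the column selection also makes the operator precedence visible, which is easy to overlook inside a long `if ()` condition.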
plot = TRUE as a function option.
- I believe plotting functions should be separate to statistical transformation/operations. You have written these in ggplot, and so you can specify an autoplot method or a separate plot_<function> command.
- The user should be able to reconstruct the plot from the data that you give them in these functions. E.g., they should be able to get the key information out, such as for sw_interval, the following information should be given from the function: n, hdi, and the number of sapwood rings.
- It is not clear to me how to get this information, and I think that this is really important that the user doesn't end up locked into a plot to get their vital statistics. If they want to be able to generate tables or other statistics, then they cannot do this programmatically, they would have to physically eyeball a plot and record down the numbers, like "hdi (95.4%) = between 8 and 26 sapwood rings", which is prone to errors.
All numerical information needed to build the plots can be found in the output of the sw_model(), sw_interval(), sw_combine() and sw_sum() functions. Their plot argument defaults to plot = FALSE. So the output of e.g. sw_combine(trs_example1) can be used as the input for sw_combine_plot():
```
tmp <- sw_combine(trs_example1) 
tmp
sw_combine_plot(tmp)
```
consistent file names. Some of the files have camelCase names (movAv.R), others are snake_case. I would recommend sticking to a consistent naming scheme, snake_case.

all snake_case now
I would try and avoid having else statements contain errors/stops/warnings/messages. This is because in order to understand the message at the end, you need to then walk back up through the condition of logic beforehand. The way to avoid this is to clearly state the error condition at the top.

Defensive programming has been implemented now, avoiding the use of `else` statements followed by a stop/error-message.

Input checking

I would recommend writing small helpers for input checking, and considering using cli to help write error messages, as it means you could transform this:

if (!inherits(x, "rwl"))  {
     warning("'x' is not class rwl")
}
if (!inherits(y, "rwl"))  {
     warning("'y' is not class rwl")
}

Into:

warn_if_not_rwl(x)
warn_if_not_rwl(y)

And that code could look like this:

warn_if_not_rwl <- function(x,
                            arg = rlang::caller_arg(x),
                            call = rlang::caller_env()){
     # only warn when the class is actually missing
     if (!inherits(x, "rwl")) {
          cli::cli_warn(
               c("{.arg {arg}} is not of class {.cls rwl}"),
               call = call
          )
     }
}

Similarly,

increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)
if (!increasing_consecutive_years) {
     stop(
          "The tree-ring series 'x' have/has no consecutive years in increasing order as rownames."
     )
}

Could be written as a function:

check_if_increasing_consecutive_years(x)
check_if_increasing_consecutive_years(y)
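
A possible body for the helper named above, as a sketch only. It uses base R `stop()` rather than cli so that it runs without extra packages, and the example data frames `good` and `gap` are hypothetical:

```r
# Sketch of the helper named in the review; base R only
check_if_increasing_consecutive_years <- function(x,
                                                  arg = deparse(substitute(x))) {
  # Row names that are not numbers become NA and fail the check
  years <- suppressWarnings(as.numeric(row.names(x)))
  increasing_consecutive_years <- !anyNA(years) && all(diff(years) == 1)
  if (!increasing_consecutive_years) {
    stop(
      "The tree-ring series '", arg,
      "' must have consecutive years in increasing order as row names.",
      call. = FALSE
    )
  }
  invisible(x)
}

# Hypothetical example data: row names are calendar years
good <- data.frame(series_a = c(120, 95, 101),
                   row.names = c("1500", "1501", "1502"))
gap  <- data.frame(series_a = c(120, 95, 101),
                   row.names = c("1500", "1502", "1503"))

check_if_increasing_consecutive_years(good) # passes silently
result <- tryCatch(
  check_if_increasing_consecutive_years(gap),
  error = function(e) conditionMessage(e)
)
```

Because the argument name is captured inside the helper, the same function reports the offending input for both `x` and `y` without repeating the message text.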

Admittedly, I do have a strong preference for writing these types of functions, having written about it recently, but I do think that at least using explaining variables, which you've already done in places like:

increasing_consecutive_years <- all(diff(as.numeric(row.names(x))) == 1)

Are a great idea, and there are a few notable places where that would help make the code a bit easier to read, e.g.,

any(
length(min_overlap) != 1 |
!is.numeric(min_overlap) |
min_overlap %% 1 != 0 |
min_overlap < 3
)
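
One way to unpack the condition above into explaining variables, sketched as a small predicate. The function name `is_valid_min_overlap()` is hypothetical; the rules (a single whole number of at least 3) are taken from the condition itself:

```r
is_valid_min_overlap <- function(min_overlap) {
  is_single_number <- is.numeric(min_overlap) && length(min_overlap) == 1
  # Stop early so the comparisons below never see NA, text or vectors
  if (!is_single_number || !is.finite(min_overlap)) {
    return(FALSE)
  }
  is_whole_number <- min_overlap %% 1 == 0
  is_at_least_three <- min_overlap >= 3
  is_whole_number && is_at_least_three
}

min_overlap <- 50
if (!is_valid_min_overlap(min_overlap)) {
  stop("'min_overlap' should be a single whole number of at least 3")
}
```

Using `&&` and `||` on single values, instead of `|` inside `any()`, also keeps a missing or non-numeric input from turning the whole condition into `NA`.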

check_input() is now one of the helper functions in helper-functions.R
smaller checks for input values are now available as a helper-function.
Most of the examples you give above are from the read_fh() function. See my earlier explanation of why I would like to stay close to the original dplR::read.fh() code.

cor_table.R

Refactoring values argument of cor_table. There is a lot of input checking for the values argument. I think that things such as :

if ("glk" %in%  values) {

And so on indicate to me that these could be written up as separate functions, which could return a list of their inputs, perhaps. These could then be delivered using switch, which I often forget how to use, but it would be something like:

values_output <- switch(values,
       "glk" = values_glk(inputs),
       "pearson" = values_pearson(inputs))
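
For reference, a runnable sketch of the `switch()` pattern. The helper `compute_value()` and the choice of statistics are hypothetical (Pearson and Spearman correlations from base R stand in for the package's own measures). Note that `switch()` expects a single string, so a vector of requested values has to be looped over:

```r
# Hypothetical stand-in: one statistic per requested value
compute_value <- function(value, x, y) {
  switch(value,
    pearson  = stats::cor(x, y, method = "pearson"),
    spearman = stats::cor(x, y, method = "spearman"),
    # An unnamed last argument is the fallback for unmatched values
    stop("Unknown value: '", value, "'", call. = FALSE)
  )
}

x <- c(1, 2, 3, 4, 5)
y <- x^2

# switch() needs a single string, so loop over the requested values
values <- c("pearson", "spearman")
values_output <- lapply(
  stats::setNames(values, values),
  function(value) compute_value(value, x, y)
)
```

The fallback branch doubles as input checking: an unsupported value fails with a clear message instead of silently returning `NULL`.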

Examples should demonstrate all types of the inputs for the function arguments.

The `values` parameter was removed from the function. Looking back, this is not an option that would be used frequently, and it is certainly not required. Removing it shortens the code a bit and avoids many of the checks that were needed.

data.R

I would recommend standardising the dataset names to be all lowercase, so that they are easier to remember. E.g., Sohar_2012_FWE_c becomes: sohar_2012_fwe_c

The datasets include names of authors. The names of the datasets can be easily copied from sw_data_overview()

fd_report.R

I think that fd_report could be renamed felling_report or felling_date_report or similar. While fd is concise, I think it doesn't help facilitate discoverability of the functions.

Similar to cor_table.R, I think that:

if (!series %in% names(df)) {
      stop("--> 'series' does not exist")
}
if (!last %in% names(df)) {
      stop("--> 'last' does not exist")
}
if (!n_sapwood %in% names(df)) {
      stop("--> 'n_sapwood' does not exist")
}
if (!waneyedge %in% names(df)) {
      stop("--> 'waneyedge' does not exist")
}

Could be rewritten as check_if_variable_exists(). Something like:

check_if_variable_exists <- function(x,
                                     df,
                                     arg = rlang::caller_arg(x),
                                     call = rlang::caller_env()){
     arg_in_data <- x %in% names(df)
     if (!arg_in_data) {
          cli::cli_abort(
               c("{.arg {arg}} does not exist")
          )
     }
}

example_checker <- function(x, 
                            series = "series", 
                            last = "last"){
     check_if_variable_exists(series, x)
     check_if_variable_exists(last, x)
}

example_checker(mtcars, 
                series = "wrong")

## Error in `check_if_variable_exists()`:
## ! `series` does not exist

The check_input function is now part of helper-functions.R

get_header.R

This function should move the cat message up the top - and should not use cat, instead using one of the cli functions, like cli_abort.

I think you could use structure instead of setting attributes to NULL:

attr(rwl, "row.names") <- NULL
attr(rwl, "po") <- NULL
attr(rwl, "class") <- NULL
attr(rwl, "names") <- NULL

## becomes

rwl <- structure(
          rwl,
          row.names = NULL,
          po = NULL,
          class = NULL,
          names = NULL
     )

Although I think that they are functionally the same, so feel free to ignore!

cat() no longer used

hdi

This function uses = and <- - suggest sticking to just <-

= is no longer used, in favour of <-

movAv

I think this starting chunk would be clearer if only if and not else is used.

The stop error can move to the top of this, so we clearly capture if align is not "center" or "right" or "left". This makes it easier to understand the conditions of error.

if (align == "center") {
     before <- floor((w - 1) / 2)
     after  <- ceiling((w - 1) / 2)
} else if (align == "right") {
     before <- w - 1
     after  <- 0
} else if (align == "left") {
     before <- 0
     after  <- w - 1
} else {
     stop("'align' should be 'center', 'left' or 'right'")
}

I suggest using another explaining variable inside mean:

mean(x[max(0, (i - before)):(i + after)], na.rm = TRUE)

## to something like:

earliest_to_latest <- x[max(0, (i - before)):(i + after)]
mean(earliest_to_latest, na.rm = TRUE)

## or given that this is repeated later
## potentially write this up as a function for reuse?
mean_earliest_latest(x, i, before, after)

As that mean statement is a bit involved to unfurl.

Similarly, the pattern, if (edges == "fill") { and } else if (edges == "nofill") { should be bundled up into a function and applied with switch

Checks for edges and fill are now on top of the script. Else statements have been avoided.
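
The pattern can be sketched as follows. `window_offsets()` is a hypothetical helper, `mean_earliest_latest()` is the name proposed in the review, and clamping the window to the bounds of the vector is an assumption about the intended edge behaviour:

```r
# Validate `align` first, then look up the window offsets without if/else
window_offsets <- function(w, align = c("center", "left", "right")) {
  align <- match.arg(align) # errors up front for any other value
  switch(align,
    center = c(before = floor((w - 1) / 2), after = ceiling((w - 1) / 2)),
    right  = c(before = w - 1, after = 0),
    left   = c(before = 0, after = w - 1)
  )
}

# Mean over the window around position i, clamped to the vector bounds
mean_earliest_latest <- function(x, i, before, after) {
  earliest <- max(1, i - before)
  latest <- min(length(x), i + after)
  mean(x[earliest:latest], na.rm = TRUE)
}

offsets <- window_offsets(w = 5, align = "center")
mean_earliest_latest(
  c(10, 20, 30, 40, 50),
  i = 3,
  before = offsets[["before"]],
  after = offsets[["after"]]
)
```

`match.arg()` states the error condition once, at the top, so the remaining code only has to deal with valid values.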

read_fh.R

Nice work in the attribution to the other previous work this extends. It looks like this is borrowed from dplR directly, and as such there are small style changes. I think it is worthwhile updating the code style to fit within your package.
Be consistent with naming variables, header.taken should be header_taken etc.
There are a few random comments that I'm not sure need to be there:

        # NEW: verbose = TRUE, header = FALSE
        inp <- readLines(fname, ok = TRUE, warn = FALSE)
        # NEW: removes empty lines in .fh file
        inp <- inp[nchar(inp) != 0]
        ## Get start and end positions of headers and data blocks
        header.begin <- grep("^HEADER:$", inp)
        # NEW: Quadro => chrono
        # NEW: Double => half chrono
lengths <- numeric(n) # commit Ronald Visser

I have found that moving comments either into documentation or into issues to help track them is helpful, but I appreciate that sometimes it is best to leave them in the code, but just something that might be worth thinking about :)

Tidying up the error messages in this function would make some of these nested if/else clauses easier to understand.

This is a pretty massive function, a bit over 1200 lines of code. I would recommend breaking down the steps inside this into smaller functions, as this will make the code easier to reason with and maintain in the future.

In the fellingdateR package I build upon the original code of the read.fh function from the dplR package. I would prefer to stay as close as possible to the original code in the dplR package in order to facilitate future cooperation and possible integration of both functions.

I removed all unnecessary comments as they were highlighting sections where I've made changes to the original code.

dplR::read.fh() concentrates on extracting the measurement data. The fellingdateR::read_fh() function extracts also the descriptive (meta-)data from the HEADER fields in a .fh file. This is not possible with the dplR::read.fh function.

Furthermore, the fellingdateR::read_fh() function can read data in CHRONO or HALF-CHRONO format.

read.fh() also throws errors when header fields include capital letters (this depends on the software used to produce the .fh files: TSAP, PAST, ...). read_fh() is case-insensitive.

sw_combine_plot.R

This is the first time I've seen ############ comment blocks - I'm all for stylistic choices but I am not sure this is needed, especially if this isn't used in other functions.

comment blocks with #### removed

I've not seen this pattern to avoid R CMD Check notes before

   # to avoid notes in CMD check
   year <-
      p <-
      lower <-
      upper <- COMB <- last <- n_sapwood <- A_i <- agreement <- NULL

My tactic has always been to have a separate definition of these, as answered by Carson Sievert on the posit community page. I don't think there's anything inherently wrong with that, but I could imagine that in some cases this could accidentally erase inputs. Something to be aware of, perhaps?

When I run devtools::check() I get

❯ checking R code for possible problems ... NOTE  
x: no visible binding for global variable ‘p’

Assigning NULL to these variables avoids the notes, as described in R Packages (2e): https://r-pkgs.org/package-within.html#delta-a-failed-attempt-at-making-a-package
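
For comparison, one commonly used alternative is to declare the names once with `utils::globalVariables()`. This may or may not be the exact approach the reviewer has in mind, and the file location is an assumption. A minimal sketch with the variable names from the snippet above:

```r
# Typically placed in its own file, e.g. R/globals.R (file name is an assumption)
utils::globalVariables(
  c(
    "year", "p", "lower", "upper", "COMB",
    "last", "n_sapwood", "A_i", "agreement"
  )
)
```

Unlike the `NULL` assignments, this declaration lives outside the function body, so it cannot overwrite a function argument that happens to share one of these names.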

I am all for using the new base R pipe |> - however you need to update your Depends in your DESCRIPTION like so in order to use it, since it only came out in R 4.1.0:

Depends: 
    R (>= 4.1.0)

This comment should probably live in a github issue or just be removed:

      # NEXT LINE TRIGGERS WARNING
      # Warning message:
      # Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
      # ℹ Please use one dimensional logical vectors instead.
      # ℹ The deprecated feature was likely used in the fellingdateR package.
      # Please report the issue to the authors.
      # { if (nrow(summary |> dplyr::filter(agreement == "poor")) != 0)
      # replaced by:

these comments are removed

sw_combine.R

This error should check each of the conditions separately - either it has missing values, or it is not numeric.

if (any(is.na(endDate)) | !is.numeric(endDate)) {
     stop(
          "--> Please check the column with 'end dates'.
Some values are possibly missing or the values are not numeric"
     )
}

A check_input() function (in helper-functions.R) now takes care of the input
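
For illustration, checking the two conditions separately could look like the sketch below. `check_end_dates()` is a hypothetical name and not necessarily how check_input() is implemented:

```r
check_end_dates <- function(end_date) {
  # Condition 1: wrong type
  if (!is.numeric(end_date)) {
    stop(
      "The column with end dates must be numeric, not ",
      class(end_date)[1], ".",
      call. = FALSE
    )
  }
  # Condition 2: missing values
  n_missing <- sum(is.na(end_date))
  if (n_missing > 0) {
    stop(
      "The column with end dates contains ", n_missing, " missing value(s).",
      call. = FALSE
    )
  }
  invisible(end_date)
}

missing_msg <- tryCatch(
  check_end_dates(c(1500, NA, 1510)),
  error = function(e) conditionMessage(e)
)
type_msg <- tryCatch(
  check_end_dates(c("1500", "1510")),
  error = function(e) conditionMessage(e)
)
```

Each message now names exactly one problem, so the user no longer has to work out which of the two applies.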

sw_data_info.R

I think these error messages would benefit from using cli, as discussed above.

sw_data_overview.R

This is a nice function to include to facilitate data discovery

sw_interval_plot.R

This code

if (all(
     !(attributes(x)$names) %in% c(
          "year",
          "n_sapwood",
          "p")
))
     stop("Input differs from output sw_interval()")

Could be rewritten as an error function or the condition in if could be expressed as a function.
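
A sketch of such an error function, under the assumption (not confirmed here) that all three columns are required. `check_is_sw_interval_output()` is a hypothetical name. The last lines show that the original condition is only TRUE when none of the expected names is present:

```r
check_is_sw_interval_output <- function(x) {
  required_names <- c("year", "n_sapwood", "p")
  missing_names <- setdiff(required_names, names(x))
  if (length(missing_names) > 0) {
    stop(
      "Input differs from output sw_interval(): missing ",
      paste(missing_names, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(x)
}

# Hypothetical input with only one of the three expected columns
partial <- data.frame(year = 1500:1502)

# The original condition is FALSE here, so no error would be raised
original_condition <- all(!names(partial) %in% c("year", "n_sapwood", "p"))

new_check <- tryCatch(
  check_is_sw_interval_output(partial),
  error = function(e) conditionMessage(e)
)
```

Listing the missing names in the message also tells the user what to fix, rather than only that something is wrong.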

sw_interval.R

In the final line of documentation for this function there is a hanging sentence:

#' @return Depends on the value of `hdi`.
#'
#'  * If `hdi = TRUE`, a `numeric vector` reporting the upper and lower limit
#'   of the hdi (attributes provide more detail on `credMass` and the applied
#'   sapwood model (`sw_data`)).
#'  * If `hdi = FALSE`, a `matrix` with scaled p values for each number of
#'   observed sapwood rings. This matrix

Well spotted! Corrected.

sw_model.R

Great to see input checking at the top of the function - I do think these should be rewritten as check input functions.

Helper function d.count I think should be put into a separate R file called utils.R or helpers.R

d.count should use switch pattern and pass functions rather than using if controls.

d.count should be d_count

check_input() and d_dens() (instead of d.count) are now part of helper-functions.R

sw_sum_plot.R

indentation in this code is not consistent - recommend applying a style guide.

Examples should show different variations possible for function arguments. E.g., bar_col, spline_col, dot_col, and dot_size should all be specified in the examples so the user can see what the input should/could be.

examples have been updated with more visibility for the different parameters.

sw_sum.R

See note above on including plots.

tests

Do not need to namespace testthat calls, e.g., remove testthat::
consider using snapshot testing, to capture exact values and shape of data that should be stable - rather than always testing for data shape and type columns, this should be able to capture those outputs
consider snapshot error testing to capture exact error messages
consider using vdiffr for testing ggplot plots. See visdat for examples
