maRce10 commented 1 year ago

Date accepted: 2023-09-13

Submitting Author Name: Marcelo Araya-Salas Submitting Author Github Handle: !--author1-->@maRce10@jhollist<!--end-editor-- Reviewers: @sammlapp, @robitalec

Archive: TBD Version accepted: TBD Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: ohun
Type: Package
Title: Optimizing Acoustic Signal Detection
Version: 0.1.0
Maintainer: Marcelo Araya-Salas <marcelo.araya@ucr.ac.cr>
Description: Facilitates the automatic detection of acoustic signals, 
  providing functions to diagnose and optimize the performance of detection 
  routines. Detections from other software can also be explored and optimized. 
  Reference: Hossin & Sulaiman (2015) <doi:10.5121/ijdkp.2015.5201>.
License: GPL (>= 2)
Encoding: UTF-8
URL: https://marce10.github.io/ohun/
BugReports: https://github.com/maRce10/ohun/issues/
VignetteBuilder: knitr, rmarkdown
RoxygenNote: 7.2.1
Repository: CRAN
Language: en-US
Authors@R: c(person("Marcelo", "Araya-Salas", role = c("aut", "cre"), email = "marcelo.araya@ucr.ac.cr", comment = c(ORCID = "0000-0003-3594-619X")))
Date/Publication: 2016-04-19 08:12:11
Imports: pbapply, viridis, crayon, methods, stats, utils, seewave (>= 2.0.1), RCurl, fftw, knitr, rjson, rlang, sp, igraph, Sim.DiffProc
Depends: R (>= 3.2.1), tuneR, warbleR (>= 1.1.28)
Suggests: rmarkdown, testthat (>= 3.0.0), formatR
Config/testthat/edition: 3

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [ ] data retrieval
- [x] data extraction
- [ ] data munging
- [ ] data deposition
- [ ] data validation and testing
- [x] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The packages allows to get data from animal acoustic signal recordings in an automated manner

Who is the target audience and what are scientific applications of this package? Scientific community working with bioacoustic data. Useful to automate analysis of animal acoustic signals for a variety of research questions (e.g. behavior, ecology, evolution, conservation)
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
the warbleR package has similar functions for detecting but the implementation is not as friendly (I am also the maintainer) and has no tools for detection diagnose/optimization. monitoR also implements one of the detection approaches in ohun but no diagnosing/optimization tools are supplied.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the rOpenSci packaging guide.
[x] I have read the author guide and I expect to maintain this package for at least 2 years or to find a replacement.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[x] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [x] The package is novel and will be of interest to the broad readership of the journal. - [x] The manuscript describing the package is no longer than 3000 words. - [x] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

ropensci-review-bot commented 1 year ago

:calendar: @robitalec you have 2 days left before the due date for your review (2023-05-11).

jhollist commented 1 year ago

@ropensci-review-bot set due date for @robitalec to 2023-06-01

ropensci-review-bot commented 1 year ago

Review due date for @robitalec is now 01-June-2023

robitalec commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 15

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Documentation

A statement of need

Please consider adding a section to the README that 1) clarifies how ohun fits into a broader workflow of working with sound event and 2) if any other packages or approaches are available that share similar functionality to ohun.

For 1), this could be adding few sentences describing what typical steps (and related packages) would come before and after using ohun. You could also add the workflow from the vignette to the README in this section. Consider a new user looking at the Reference page on the website or the list of available functions in ohun... how do they fit together?

Click to expand

```r library(ohun) #> Loading required package: tuneR #> Loading required package: warbleR #> Loading required package: seewave #> Loading required package: NatureSounds #> Loading required package: knitr lsf.str('package:ohun') #> consensus_detection : function (detection, by = "overlap", filter = "max", cores = 1, pb = TRUE) #> diagnose_detection : function (reference, detection, by.sound.file = FALSE, time.diagnostics = FALSE, #> cores = 1, pb = TRUE, path = NULL, by = NULL, macro.average = FALSE, #> min.overlap = 0.5) #> energy_detector : function (files = NULL, envelopes = NULL, path = ".", hop.size = 11.6, #> wl = NULL, thinning = 1, bp = NULL, smooth = 5, threshold = 5, peak.amplitude = 0, #> hold.time = 0, min.duration = 0, max.duration = Inf, cores = 1, pb = TRUE) #> feature_acoustic_data : function (path = ".", digits = 2) #> feature_reference : function (reference, path = NULL, by.sound.file = FALSE, units = c("ms", #> "kHz"), digits = 2) #> filter_detection : function (detection, by = "overlap", filter = "max", cores = 1, pb = TRUE) #> get_envelopes : function (path = ".", files = NULL, bp = NULL, hop.size = 11.6, wl = NULL, #> cores = 1, thinning = 1, pb = TRUE, smooth = 5, normalize = TRUE) #> get_templates : function (reference, acoustic.space = NULL, path = ".", n.sub.spaces = 1, #> plot = TRUE, color = "#21908C4D", ...) #> label_detection : function (reference, detection, cores = 1, pb = TRUE, min.overlap = 0.5) #> label_spectro : function (wave, reference = NULL, detection = NULL, envelope = FALSE, threshold = NULL, #> smooth = 5, collevels = seq(-100, 0, 5), palette = viridis::viridis, #> template.correlation = NULL, line.x.position = 2, hop.size = NULL, #> ...) #> merge_overlaps : function (X, pb = TRUE, cores = 1) #> optimize_energy_detector : function (reference, files = NULL, threshold = 5, peak.amplitude = 0, hop.size = 11.6, #> wl = NULL, smooth = 5, hold.time = 0, min.duration = NULL, max.duration = NULL, #> thinning = 1, cores = 1, pb = TRUE, by.sound.file = FALSE, bp = NULL, #> path = ".", previous.output = NULL, envelopes = NULL, macro.average = FALSE, #> min.overlap = 0.5) #> optimize_template_detector : function (template.correlations, reference, threshold, cores = 1, pb = TRUE, #> by.sound.file = FALSE, previous.output = NULL, macro.average = FALSE, #> min.overlap = 0.5) #> split_acoustic_data : function (path = ".", sgmt.dur = 10, sgmts = NULL, files = NULL, cores = 1, #> pb = TRUE, only.sels = FALSE, X = NULL) #> summarize_diagnostic : function (diagnostic, time.diagnostics = FALSE, macro.average = FALSE) #> template_correlator : function (templates, files = NULL, hop.size = 11.6, wl = NULL, ovlp = 0, #> wn = "hanning", cor.method = "pearson", cores = 1, path = ".", pb = TRUE, #> type = "fourier", fbtype = "mel", ...) #> template_detector : function (template.correlations, cores = 1, threshold, pb = TRUE, verbose = TRUE) ```

For 2), as @sammlapp mentions, it's really helpful for a user to be aware of what the options are and how they differ. The README for the gt package has a good example of this here: https://github.com/rstudio/gt#how-gt-fits-in-with-other-packages-that-generate-display-tables

Installation instructions

The dependency seewave brings system dependencies that might not be immediately obvious to a new user of ohun. It might be helpful to include a link to seewave's installing and troubleshooting system requirements page, for example: https://rug.mnhn.fr/seewave/inst.html

Vignette(s)

It might be worth splitting this vignette into multiple vignettes. At its current length, I find it a bit hard to digest all the moving pieces and options. I personally find it easier to learn and use R packages if there are many, shorter vignettes that describe the different major functions or steps offered. You could have maybe four vignettes: introduction, energy based detection, template based detection, and diagnosing/improving. The introductory could expand on how ohun integrates with other R packages in this field, and provide the background given in sections "Automatic sound event detection" and "Signal detection theory applied to bioacoustics". Then the following vignettes using the sections already written from the current vignette. This might just be my opinion - so if you feel like it is better for ohun users to have everything in one place, let me know.

Throughout the vignette, the warnings and messages were not shown using the chunk options warnings=FALSE and messages=FALSE. I would recommend leaving all messages and warnings in the rendered vignette, because a user may be surprised and confused to see them appear locally, when they are not shown in the vignette.

Community guidelines

Please consider also including the link to the repository (https://github.com/maRce10/ohun) in the URL field in the DESCRIPTION as it should have the most consistently up to date information for the package, for example in the case that the website is not updated.

Functionality

Packaging guidelines

Looks good: package name, snake_case, object_verb function naming, citation, and code metadata. Consider using this following options for keeping the code metadata up to date and for keeping the CITATION file up to date.

Code style improved with a recent commit using the styler package. Before that, logical statements with if else were not easy to read.

It is not clear whether or not functions in ohun are pipeable (see the second point here). If you think that is a relevant option, consider including an example in the vignette that shows this.

As mentioned above, the README could use more information about how ohun fits into the broader ecosystem, comparing ohun to other packages that provide related functionality and details on the system requirements for seewave. The README is also missing a brief demonstration of functionality (see the sixth point here). The long form examples in the vignettes are useful for a user that is already intending to use ohun but shorter examples in the README help new users understand what ohun is offering.

Documentation looks good with all functions having accompanying examples, detailed return values and references included in the vignette. There is good use of @seealso to link users to related functions in ohun. Only one URL needed editing (according to urlchecker), see below under Additional comments/README.

You could consider using vdiffr for testing plots. Currently the tests for functions that produce plots are just testing that they return NULL. This isn't a very useful test, and while testing plots is not straightforward or intuitive, you could see if vdiffr helps! More detailed comments on testing below under Additional comments/Tests.

Detailed comments on package dependencies below.

Additional comments

The following sections have further details from above, and sometimes an accompanying pull request. I found it easier to work through the review by looking at the source code locally, and making changes as I noticed things. I also felt like it would be easier to highlight specific lines of code by just editing them, instead of referring to eg. Lines 83-85 in some file. No pressure to accept any of these pull requests directly, and I hope that you find it a clear and useful approach.

Repository

The repository size is ~ 120 mb.

I recommend that you delete, at least, the following files:

.Rproj.user/
.Rhistory
..Rcheck/
..pdf
.Rdata
testing/
docs/ (in this case since you are using the pkgdown workflow with the gh-pages branch)

Some of these files were presumably added before they were listed in the .gitignore file. I have deleted them in the following pull request:

https://github.com/maRce10/ohun/pull/12

README

Consider stating explicitly which package provides the parallel processing for ohun in the sentence starting "All functions allow the parallelization of tasks..."

The badge for R-CMD-check seems to point to an old workflow. The current workflow (tic) shows failing tests, whereas the R-CMD-check shows passing test. I would recommend removing this old badge if it is no longer relevant.

A couple minor edits to the README are in the following pull request:

https://github.com/maRce10/ohun/pull/13

NEWS

It might be helpful to users if the items in NEWS were accompanied with references to the relevant pull requests and issues (eg. #5). For example: https://github.com/tidyverse/ggplot2/blob/main/NEWS.md

Vignette

As mentioned above, consider splitting the single vignette into multiple vignettes:

Get started vignette
- Background and theory
- How does ohun fit in
- What other packages does ohun interface with
- Overview of functions provided by ohun
Energy based detection
Template based detection
FAQ / best practices / advice / tips

Minor proposed edits to the vignette are in the following pull request:

https://github.com/maRce10/ohun/pull/14

Contributing

The CONTRIBUTING.md document looks like it's based on the template from https://contributing.md/. Because of this, there were some places that links to sections didn't work, and it often still referred to the CONTRIBUTING.md version.

There isn't a code of conduct in the repository to link to, so I removed that dead link. But please consider including one (see the rOpenSci pages for more details: Contributing Guide, Code of Conduct).

Relevant pull request:

https://github.com/maRce10/ohun/pull/15

Documentation

I propose removing the "last modification date" from the R/ files since this information is redundant and more error prone than the information provided by Git. This change is included in the pull request under R.

DESCRIPTION

I suggest moving packages under "Depends" to "Imports". The case of adding packages to the Depends field does not seem like a universal yes/no clear cut decision but from my understanding and reading around this topic, I feel like for the ohun package it is not necessary to include packages in the Depends field.

From https://devguide.ropensci.org/building.html#pkgdependencies

Use Imports instead of Depends for packages providing functions from other packages.

For an example where it might be relevant to list a package in Depends:

From https://r-pkgs.org/dependencies-mindset-background.html#sec-dependencies-imports-vs-depends

A more classic example of Depends is how the censored package depends on the parsnip and survival packages. Parsnip provides a unified interface for fitting models, and censored is an extension package for survival analysis. Censored is not useful without parsnip and survival, so it makes sense to list them in Depends.

In the case of ohun, some of the packages listed under Depends also list some packages under Depends:

From warbleR
- tuneR, seewave (>= 2.0.1), NatureSounds
From tuneR
- None
From seewave
- None
From NatureSounds
- knitr

The full dependency tree is here:

Click to expand

``` R: pak::pkg_deps_tree("ohun") ohun 0.1.0 [new][bld][dl] (2.64 MB) ├─tuneR 1.4.3 [new][bld][cmp][dl] (427.79 kB) │ └─signal 0.7-7 [new][bld][cmp][dl] (1.96 MB) │ └─MASS 7.3-58.2 -> 7.3-59 [upd][bld][cmp][dl] (515.84 kB) ├─warbleR 1.1.28 [new][bld][cmp][dl] (4.49 MB) │ ├─tuneR │ ├─seewave 2.2.0 [new][bld][dl] (2.69 MB) │ │ └─tuneR │ ├─NatureSounds 1.0.4 [new][bld][dl] (4.40 MB) │ │ ├─knitr 1.42 [new][bld][dl] (893.59 kB) │ │ │ ├─evaluate 0.20 [new][bld][dl] (26.66 kB) │ │ │ ├─highr 0.10 [new][bld][dl] (15.08 kB) │ │ │ │ └─xfun 0.39 [new][bld][cmp][dl] (135.48 kB) │ │ │ ├─yaml 2.3.7 [new][bld][cmp][dl] (94.33 kB) │ │ │ └─xfun │ │ └─tuneR │ ├─dtw 1.23-1 [new][bld][cmp][dl] (746.72 kB) │ │ └─proxy 0.4-27 [new][bld][cmp][dl] (74.62 kB) │ ├─fftw 1.0-7 [new][bld][cmp][dl] (42.45 kB) │ ├─monitoR 1.0.7 [new][bld][dl] (2.90 MB) │ │ └─tuneR │ ├─pbapply 1.7-0 [new][bld][dl] (23.81 kB) │ ├─RCurl 1.98-1.12 [new][bld][cmp][dl] (731.48 kB) │ │ └─bitops 1.0-7 [new][bld][cmp][dl] (10.81 kB) │ ├─rjson 0.2.21 [new][bld][cmp][dl] (116.51 kB) │ ├─Rcpp 1.0.10 [new][bld][cmp][dl] (2.94 MB) │ ├─knitr │ ├─crayon 1.5.2 [new][bld][dl] (40.57 kB) │ └─bioacoustics 0.2.8 [new][bld][cmp][dl] (829.33 kB) │ ├─htmltools 0.5.5 [new][bld][cmp][dl] (131.88 kB) │ │ ├─digest 0.6.31 [new][bld][cmp][dl] (163.50 kB) │ │ ├─base64enc 0.1-3 [new][bld][cmp][dl] (7.83 kB) │ │ ├─rlang 1.1.1 [new][bld][cmp][dl] (762.53 kB) │ │ ├─fastmap 1.1.1 [new][bld][cmp][dl] (46.41 kB) │ │ └─ellipsis 0.3.2 [new][bld][cmp][dl] (8.07 kB) │ │ └─rlang │ ├─moments 0.14.1 [new][bld][dl] (7.64 kB) │ ├─Rcpp │ ├─stringr 1.5.0 [new][bld][dl] (175.70 kB) │ │ ├─cli 3.6.1 [new][bld][cmp][dl] (567.29 kB) │ │ ├─glue 1.6.2 [new][bld][cmp][dl] (106.51 kB) │ │ ├─lifecycle 1.0.3 [new][bld][dl] (106.85 kB) │ │ │ ├─cli │ │ │ ├─glue │ │ │ └─rlang │ │ ├─magrittr 2.0.3 [new][bld][cmp][dl] (267.07 kB) │ │ ├─rlang │ │ ├─stringi 1.7.12 [new][bld][cmp][dl] (7.60 MB) │ │ └─vctrs 0.6.2 [new][bld][cmp][dl] (964.63 kB) │ │ ├─cli │ │ ├─glue │ │ ├─lifecycle │ │ └─rlang │ └─tuneR ├─pbapply ├─viridis 0.6.3 [new][bld][dl] (3.05 MB) │ ├─viridisLite 0.4.2 [new][bld][dl] (1.27 MB) │ ├─ggplot2 3.4.2 [new][bld][dl] (3.15 MB) │ │ ├─cli │ │ ├─glue │ │ ├─gtable 0.3.3 [new][bld][dl] (130.06 kB) │ │ │ ├─cli │ │ │ ├─glue │ │ │ ├─lifecycle │ │ │ └─rlang │ │ ├─isoband 0.2.7 [new][bld][cmp][dl] (1.59 MB) │ │ ├─lifecycle │ │ ├─MASS │ │ ├─mgcv 1.8-42 │ │ │ ├─nlme 3.1-162 │ │ │ │ └─lattice 0.20-45 -> 0.21-8 [upd][bld][cmp][dl] (589.33 kB) │ │ │ └─Matrix 1.5-3 -> 1.5-4 [upd][bld][cmp][dl] (2.18 MB) │ │ │ └─lattice │ │ ├─rlang │ │ ├─scales 1.2.1 [new][bld][dl] (270.61 kB) │ │ │ ├─farver 2.1.1 [new][bld][cmp][dl] (1.27 MB) │ │ │ ├─labeling 0.4.2 [new][bld][dl] (10.16 kB) │ │ │ ├─lifecycle │ │ │ ├─munsell 0.5.0 [new][bld][dl] (182.65 kB) │ │ │ │ └─colorspace 2.1-0 [new][bld][cmp][dl] (2.12 MB) │ │ │ ├─R6 2.5.1 [new][bld][dl] (63.42 kB) │ │ │ ├─RColorBrewer 1.1-3 [new][bld][dl] (11.64 kB) │ │ │ ├─rlang │ │ │ └─viridisLite │ │ ├─tibble 3.2.1 [new][bld][cmp][dl] (565.98 kB) │ │ │ ├─fansi 1.0.4 [new][bld][cmp][dl] (482.06 kB) │ │ │ ├─lifecycle │ │ │ ├─magrittr │ │ │ ├─pillar 1.9.0 [new][bld][dl] (444.53 kB) │ │ │ │ ├─cli │ │ │ │ ├─fansi │ │ │ │ ├─glue │ │ │ │ ├─lifecycle │ │ │ │ ├─rlang │ │ │ │ ├─utf8 1.2.3 [new][bld][cmp][dl] (241.41 kB) │ │ │ │ └─vctrs │ │ │ ├─pkgconfig 2.0.3 [new][bld][dl] (6.08 kB) │ │ │ ├─rlang │ │ │ └─vctrs │ │ ├─vctrs │ │ └─withr 2.5.0 [new][bld][dl] (102.09 kB) │ └─gridExtra 2.3 [new][bld][dl] (1.06 MB) │ └─gtable ├─crayon ├─seewave ├─RCurl ├─fftw ├─knitr ├─rjson ├─rlang ├─sp 1.6-0 [new][bld][cmp][dl] (1.03 MB) │ └─lattice ├─igraph 1.4.2 [new][bld][cmp][dl] (3.31 MB) │ ├─magrittr │ ├─Matrix │ ├─pkgconfig │ ├─rlang │ └─cpp11 0.4.3 [new][bld][dl] (304.53 kB) └─Sim.DiffProc 4.8 [new][bld][cmp][dl] (1.12 MB) ├─Deriv 4.1.3 [new][bld][dl] (37.21 kB) └─MASS ```

If the recommendation to move some packages to move Imports instead of Depends, the following issues in the examples can be resolved, for example in feature_acoustic_data.R:

could not find function "writeWave" fixed by adding library(tuneR) at the top of the example

Minor suggested edits to the Description include:

Removing non-standard fields including "Date/Publication", "Repository", "Config/testthat/edition"
Adding the link to the repository under URL
Using line breaks to make the Description file easier

Relevant pull request:

https://github.com/maRce10/ohun/pull/16/

Tests

I suggested some edits to the test scripts to better take advantage of the variety and precision of available functions from testthat. Often, the tests were expect_true where you set up a logical statement, eg. nrow(DT) == 3. This works but when it fails, there is nothing informative about the error. Alternatively, if you set this test using expect_length then you would get a numeric difference between expectation and result. This might make it easier to identify why or how the test is failing.

I also suggest using a skip_on_windows function.

Writing tests can be really hard (and rewarding!) - I found these resources helpful:

Specifically in the first link "avoid testing simple code [...] test what you're not sure about, what's fragile, ...". This is a great opportunity for you to put your deep understanding about this kind of analysis to work, because you know when results don't look right, beyond the basic "is it a data.frame", etc. To this point, a lot of tests are just length checks - do you have any other expectations from the functions other than that they will return the expected length?

I've adjusted some of the tests where I noticed an alternative function that might be more informative. Here's the pull request:

https://github.com/maRce10/ohun/pull/17

Note: if packages are moved from Depends, it may require them to be explicitly loaded with library() in the tests where their functions are used.

R

I don't really understand why custom functions stop2 and warning2 are being used. They return the error or warning message, without the call being printed. Would this possibly make it harder for a new user to troubleshoot, if where the message originates from? stop2 is used 105 times across 14 files and warning2 is used 3 times across 3 files.

For example with stop2:

R: label_detection(reference = matrix(lbh_reference[-1, ]), detection = lbh_reference)
Error: 'reference' is not of a class 'data.frame' or 'selection_table'

and with stop:

R: label_detection(reference = matrix(lbh_reference[-1, ]), detection = lbh_reference)
Error in label_detection(reference = matrix(lbh_reference[-1, ]), detection = lbh_reference) : 
  'reference' is not of a class 'data.frame' or 'selection_table'

Related, warnings are suppressed in R/feature_reference.R 3 times and in R/optimize_energy_detector.R 2 times.

Consider using seq_len instead of 1:nrow, 1:length, 1:ncol, ...

For example:

x <- data.frame(LETTERS[1:5])

1:nrow(x)
#> [1] 1 2 3 4 5

seq_len(nrow(x))
#> [1] 1 2 3 4 5

x <- data.frame()

1:nrow(x)
#> [1] 1 0

seq_len(nrow(x))
#> integer(0)

Related to dependencies described above under Description:

internal functions from another package are used in 12 cases, though given all 12 are from warbleR and the maintainer of ohun is also the maintainer of warbleR this seems likely ok

There is mixed used of importFrom and ::. If packages listed under Depends are moved to Imports, I would suggest using the :: syntax because it helps a reader understand if functions are from in ohun and, if not, which package they come from.

There is commented code in

optimize_template_detector L226 - 231
split_acoustic_data L265-267

Related to @sammlapp's review, the ohun could be improved by refactoring some functions. For example, there are nested functions in the following scripts:

detect_FUN within energy_detector
internal_feature_reference within feature_reference
env_ohun_int within get_envelopes
XC_FUN and spc_FUN within template_correlator

I will defer to @sammlapp's review regarding the "flat" package design to avoid repeating those comments again.

Please let me know what your questions are or where you need clarification in my review or pull requests. Looking forward to the continued progress on ohun!

All the best.

(Edits: typos/clarity)

robitalec commented 1 year ago

Thank you @maRce10 for merging those pull requests. I responded to one of your comments on the edit/vignette PR with a suggestion for a possible CSS alternative to explicit line breaks throughout. Let me know what other questions you have or clarifications you need from the review.

jhollist commented 1 year ago

Thank you @sammlapp and @robitalec for the reviews. @maRce10 glad to see you have started on the revisions. If you have any questions for me, just holler. I will update the status of the review shortly. Ping me here when you have the revisions done and we will keep this one moving forward!

maRce10 commented 1 year ago

Thank you @robitalec for all those great suggestions!

jhollist commented 1 year ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/568#issuecomment-1436327280 time 10

ropensci-review-bot commented 1 year ago

Logged review for sammlapp (hours: 10)

jhollist commented 1 year ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/568#issuecomment-1572923447 time 15

ropensci-review-bot commented 1 year ago

Logged review for robitalec (hours: 15)

ropensci-review-bot commented 1 year ago

@maRce10: please post your response with @ropensci-review-bot submit response <url to issue comment> if you haven't done so already (this is an automatic reminder).

Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html

maRce10 commented 1 year ago

Apologies for the delay. Many thanks to both reviewers. I am really grateful for all your time and effort. I have made changes and responded to your comments/suggestions below, quoting the commit the change was made in.

Response to @sammlapp

@ropensci-review-bot submit response https://github.com/ropensci/software-review/issues/568#issuecomment-1436327280

comment: Vignette(s): demonstrating major functionality that runs successfully locally - I wasn't able to see or access the vignette in R studio with vignette()

response: I have added a link to each of the 3 vignettes to README maRce10/ohun@f857949224ace5f75b1747b913ce98886676745e

comment: I don't typically use R, so some things are not obvious to me. Where can I find the full API documentation? I opened the /docs/index.html to view the docs by rendering the HTML locally, but presumably they are online somewhere. I would recommend linking to the documentation near the top of the README.

response: I added a link to the package documentation to the top of README maRce10/ohun@1ffc441207235c1663e75a02f17a89c426e11290

comment: The use of the word "reference" to mean "audio annotations" or "labels" is unfamiliar to me, I'm not sure if this word is typically used in the related warbleR package.

response: The use of the term reference to refer to the ground truth annotations seems to be common in the field (check for instance Mesaros, A., Heittola, T., Virtanen, T., & Plumbley, M. D. (2021). Sound event detection: A tutorial. IEEE Signal Processing Magazine, 38(5), 67-83). I prefer "reference" because you can also compare annotations from different software to check agreement. So it is not always manual annotations vs computer generated annotations.

comment: The package should give the user some expectation for under what circumstances the detection methods are likely to be effective. Although it mentions that high SNR is required for amplitude thresholding and stereotyped sounds are necessary for template matching, there are other requirements of the data for these approaches to be effective.

response: I added that time-frequency overlaps with other sounds can affect performance. I am not aware of any other main requirement. maRce10/ohun@d85c0e0d0b2b2310ad83f4c2ad687e3606b5fb95

comment: to improve documentation and function/variable/argument names to better match the action that is actually taken by a function;

_response: The function filter_detection() has been renamed consensusdetection() maRce10/ohun@c1e108d2dd65b1c834ff767222ff5634e8f9742d

comment: to consider a moderate refactor of code to increase modularity, reduce redundancy, and aim to follow the Single Responsibility Principle;

_response: a modular approach was used for checking arguments. Note that all task that are shared by different functions are carried out by a unique function. For instance, diagnosing detection performance in optimize_template_detector() or optimize_energy_detector() is done by calling diagnose_detection() and summarize_diagnostic(). diagnose_detection() also calls label_detection() which can also be called by users. get_templates() is called within template_detector(). Now all functions call checkarguments(). maRce10/ohun@c1e108d2dd65b1c834ff767222ff5634e8f9742d

comment: increase code readability by breaking complex logical lines into separate lines with additional comments, plus comments describing the general purpose of large blocks of code.

_response: I simplify the code in diagnose_detection() and summarizediagnostic() maRce10/ohun@69d81da42cc5b831779be1ebb3aaa2e794464877e. Note that functions that deal with envelopes tend to avoid duplicating envelope objects to avoid using to much memory_

comment: Template-based cross correlation is implemented in monitoR, while energy-based detection is implemented in bioacoustics (the implementation seems more generic and featured than the one in ohun). I wonder if the proposed value of Ohun is mostly in its specific ability to interact with the formats of annotations produced by warbleR. If so, the top-level documentation and README could clarify this purpose.

response: monitoR offers no way to evaluate detection performance. Ohun does, and can even be applied for optimizing detections.

comment: Specifically, when I started using the program without having any experience with warbleR, it was not clear to me how to create annotation tables or generally what the utility of most of the package functions are. Writing a description of the characteristics of data that this package expects as input would be helpful. For instance, "This package provides utilities for comparing detection and annotations of audio events described by frequency and time boxes. Creating annotations containing such frequency-time boxes [is possible in the warbleR package?]."

response: I added a paragraph similar to that proposed by the reviewer to README. maRce10/ohun@d4edd0df7249bffc59a9c7263ad7176a5c2d80cb Note that all argument descriptions clearly state the format of the annotations and they do not need to be created using warbleR. For instance the description of "reference" in all functions:_

Data frame or 'selection.table' (following the warbleR package format) with the reference selections (start and end of the sound events) that will be used to evaluate the performance of the detection, represented by those selections in 'detection'. Must contained at least the following columns: "sound.files", "selec", "start" and "end".

comment: In the language of rOpenSci software recommendations, this is a sign that the functions are not "modular". In other places it might be seen as a violation of the Single Responsibility Principle (SRP). For example, the label_specto function violates SRP because it doesn't just label a spectrogram: it computes a spectrogram, plots it, then overlays labels, and might additionally comput and plot an amplitude signal. As another example, I get_envelopes.R and energy_detector.R seem to both implement the process of getting an amplitude envelope, rather than using shared modular functionality specific to this purpose. Though it would take a significant refactoring effort, I believe creating a more modular code base that aims to achieve the SRP could improve the package and user experience overall.

_response: The function labelspectro() was created for making the graphs in the vignette. This is stated in the description of the function in bold font ("Created for graphs included in the vignette, and probably only useful for that or for very short recordings"). So it is not call by any other function or calls any other ohun function. It just for the vignette. Not sure how to make this more evident. maRce10/ohun@f857949224ace5f75b1747b913ce98886676745e

comment: In general, the code is difficult to read because of the formatting and smushing lots of logic into a single functional "line" (which is broken into many lines anyways due to formatting). Separating out logical steps into sequential lines, with comments where appropriate, would make the code easier to read/understand/maintain. However, it would take a major refactoring effort.

_response: As mentioned above I simplify the code in diagnose_detection() and summarizediagnostic() which were the most cumbersome maRce10/ohun@69d81da42cc5b831779be1ebb3aaa2e794464877e. Note that functions that deal with envelopes tend to avoid duplicating envelope objects to avoid using to much memory_

comment: Consider adding units to variable names eg hold_time_ms

response: I think that is a good suggestion although it is extremely uncommon in R packages so I would rather not include units in names to keep match what R users are expecting.

split_acoustic_data.R:

comment: documentation should list supported sound file types

response: I added supported formats to all function working on sound files maRce10/ohun@63f2be131f0b5e0a3be583892816657ba567cb66

comment: documentation should list required columns of table

_response: The argument description already mentioned that: `X = 'selection_table' object or a data frame with columns

' for sound file name (sound.files), selection number (selec), and start and end time of signal

' (start and end).`_

comment: documentation should say what will happen with incomplete audio clip at end of file

response: Not sure I follow. I think there is only when thing that can happen: create the short clip (not really incomplete)

comment: comments denoting the purpose of larger sections of code are helpful, eg # INPUT VALIDATION or # Create the audio clips

response: I added more comments maRce10/ohun@05daadb5a7bf3ae60bec57c8c8c98d20d85d4884

summarize_diagnostic.R:

comment: The code style makes it difficult to review the logic or infer the operation performed by much of this code. For example, lines 131-141 of summarize_diagnostic.R. In general I'm not reading this code base to check for logical errors, but I think that the code style will make it difficult for others (or the authors) to debug, extend, understand or modify the code. In this example, separating the logical operation into a few separate and sequential lines of code could make it easier to digest, and would provide opportunities to add comments for any individual steps whose purposes are not obvious. (For example in this code, a comment could explain why we would want a weighted mean of true positive durations.)

response: as mention above I simplify the code to make it easier to understand (and debug) maRce10/ohun@69d81da42cc5b831779be1ebb3aaa2e794464877e

*comment: Input validation is redundant across many functions, eg validating the cores argument. Rather than repeating code in several modules, use modular helper functions. That way, if you change something it is consistent across the codebase. _response:** I added the internal function checkarguments() which is now used by all other functions to validate arguments maRce10/ohun@9fdd16338cf bda7cb096e9f3d6f737d6517fe714

comment: Comments should explain not just what's being done but also why (rather than re-stating the code in English). For example, instead of # set bottom and top freq if not supplied you could explain the bottom and top frequency are needed to make a spectrogram. if they're not supplied, we set them to 0 and Ns=sample_rate/2. Another example, in template_detector.R L177 "# if no detections" is not a helpful comment - the comment should explain what will happen and why.

response: More comments were added. maRce10/ohun@05daadb5a7bf3ae60bec57c8c8c98d20d85d4884 and maRce10/ohun@4e3449d7d27eb697ec7e0be83af11c9607a5d45a

comment: Since I'm not an R programmer, I might not understand conventions for writing R modules. However, I don't think it's a good idea to define your helper functions inside another function. For example L218 of template_correlator.R begins a definition of a function to create a spectrogram. Even if this is not part of the API (its a hidden or internal function), I think it should be defined outside of the template_correlator function (1) because it doesn't make sense that it would be re-defined each time template_correlator is called, and (2) because it makes reading the code for template_correlator unnecessarily long.

_response: I moved the internal function out of template_correlator() maRce10/ohun@7c4709280fdb2f68a84d4463a086c4724272040c. Same thing for energydetector() maRce10/ohun@4ca47bd993b95cecccfd09c490d3605137b1605d

comment: I'm not sure how namespaces work in R but it would be helpful for a code reader to know where functions are coming from. For instance, get_templates.R calls find_templates which is not explicitly imported or defined anywhere in the file. It apparently can be accessed because it exists in another .R file in the package, or because all functions from the package have been imported into the default namespace.

_response: Internal functions are common in R. They can be called using namespace:::function. All internal functions in ohun are found in internalfunctions.R

Notes on the Getting Started tutorial page

comment: The "main features" description is jargon-y and could be written in broader language. For example, "The use of signal detection theory indices to evaluate detection performance" -> "Evaluate detection accuracy with precision, recall, and other metrics". As is, I feel like it obscures the functionality of the package. As another example, I wasn't sure what "Optimize detection routines based on reference annotations" meant - after reading further, I think it basically means "Adjust detection parameters to optimize accuracy on a labeled set of recordings"

response: I have included "Adjust detection parameters to optimize accuracy " into the package description (maRce10/ohun@7617753447c4e23d97377a77286818cef84444e8). However, "Evaluate detection accuracy with precision, recall, and other metrics" assumes that users know what those metrics are. Most biologists working on bioacoustics are not familiar with those terms.

comment: "Template-based detection" It's not true that template-based detection "doesn’t depend on the signal-to-noise ratio" - high background noise or low signal level will decrease the accuracy of this approach

response: I reworded this: "As it is less affected by signal-to-noise ratio it's more robust to higher levels of background noise" maRce10/ohun@3fd70a73dc78ab861f8d4cfd0f894386f9ccbf66

comment: outputs (eg, tables) in the Getting Started tutorial are very hard to interpret in the current form, could they be rendered as human-readable tables?

response: I don't think humans cannot read them (I can). I agree they could more printed in a more friendly way but that would require adding more dependencies (e.g. kntr, kableExtra)

comment: An example of how to analyze a large set of files, for instance all files in a folder-subfolder structure, would be helpful. I had to do some googling to figure out how to do "globbing" ie find all paths mathing a pattern.

response: I agree, but the amount of example data that can be included is very limited for R packages.

Notes on API documentation and variable/function names

comment: Throughout the codebase, I recommend more informative variable and argument names: for example, in the function merge_overlaps, to a new user, pb is not interpreted as "progress bar"

response: I try to follow the same variable name use b other packages (including those creating progress bars)

comment: Throughout the documentation: why are arguments that are numbers/strings called "numeric vectors of length 1" and "character vectors of length 1"? This makes me thing I should pass c(0) or [0] instead of 0, for instance, as an argument.

response: c(1) and 1 are equivalent and are both numeric vectors of length 1 (0 is not used as an index in R). Not sure how else I can call them.

get_envelopes :

comment: smooth argument - should state the specific function/implementation of smoothing used

response: I added that is done using the function envelope from warbleR maRce10/ohun@6a2e11e1e8131411ac8c10de71ed090a57abc4cc

comment: paths and files arguments: using the current path as default is fragile and does not match the conventions used elsewhere in the package. I recommend making the argument(s) mandatory. It is also unclear that specifying a folder but not files means that the function should attempt to analyze any/all audio files in the folder (only a specific file extension? how does it know which files to analyze or not?). Is that what happens if files=NULL?_

response: I agree that making path mandatory would be safer. However many R users like to set the working directory (using setwd()) before running analyses. That's why is common for functions to look into the current working directory by default. All 5 functions with the argument files use the same default value (NULL). I added that: "If not supplied the function will work on all sound files (in the supported format) in 'path'." maRce10/ohun@a00e687465eb57285f529544123a76718edc2629

energy_detector:

comment: again, I don't think making files optional and defaulting to all audio in the working directory is a good idea.

response: Agree, but again it's friendly for most R users.

label_spectro:

comment: can you provide an example of how to use this function with an audio file? If I want to run it on anything other than a simulated array of numbers or the two built-in audio files, I need some prior line of code to load the audio file,

response: As mentioned above, I made this function just to get a nice looking visualization of the data in the vignette. I don't think is very helpful for most real data. I have this info in bold in the description: "Created for graphs included in the vignette, and probably only useful for that or for very short recordings". However, people seem to get confused. I thought about just having it inside the vignette without including it as a function in the package. However users always ask how those figures are made. So that's why I included it as a function.

comment: bp argument (in several functions): how is the signal bandpassed? Should state specifically, for example, a 9th order Butterworth bandpass filter

response: I added that: "Bandpass is done using the function code{\link[seewave]{ffilter}}, which applies a short-term Fourier transformation to first create a spectrogram in which the target frequencies are filtered and then is back transform into a wave object using a reverse Fourier transformation." maRce10/ohun@59a984f0f5b14f6d06c219b376ad628577edbb6e

feature_reference:

comment: "Extract quantitative features of references" - from the function name and short description, I expected this to extract acoustic features of the annotated audio events themselves, rather than summary statistics of the annotations per se. Perhaps the description could be "Summarize temporal and frequency dimensions of annotations and gaps" or something similar. Ideally, the function's name would reflect its functionality, something like summarize_annotation_boxes.

_response: I addopted the suggested description and added an alternative name for the function: summarizereference() maRce10/ohun@e0f0c7591a4da491d6b29f386523d8a00ae21151

comment: gap times: I tried using this summary to choose a hold.time argument for energy_detector, but the min, max, and mean are not very informative, because what matters is the typical gap time between close-together annotations. Maybe the median or 25th percentile, or some other statistic, would be more informative.

response: Yeah that is a tricky thing to get and depends a lot on duty cycle and signal duration. But I don't think there is a single metric that can get a good estimate. min and mean still give you a sense of the range of gap values.

feature_acoustic_data

comment: Similarly, feature_acoustic_data sounds like it would measure some acoustic feature. Since the function returns information about the file types, I would call it something like summarize_audio_file_characteristics

_response: I added an alternative name for the function: summarize_acousticdata() maRce10/ohun@6a54b30e4505fa4586ccc5220c3f796530f598b9

filter_detection

comment: I'm confused as to what this function is doing. The documentation says it "removes ambiguous detections (split and merged detections)", but the detailed documentation should explain what is meant by split and merged detections. I'm also confused whether this is meant to be applied to human-created annotations, automatically-generated detections, or can be used for either purpose.

_response: I added an alternative name to the function: consensus_detection(). I also added that: "Useful when several detections match the same reference as in the case of template detection with multiple templates (see \code{\link{templatedetector}})." maRce10/ohun@a6e8e3aa58329fc10dd782f0bb59bf4cf52c2d99

optimize_template_detector

comment: this function compares performance across parameter values, it does not itself optimize the algorithm, so its name is a bit misleading. Perhaps template_parameter_sweep is closer. I'm not sure what paramter values it sweeps over after reading the documentation, since it requires the user to have already run template_correlator. Also, I think the part of the documentation that says "Those parameters can be later used for performing a more efficient detection using optimize_template_detector." must be a typo, perhaps it means "using template_correlator" instead?

_response: Thanks I fixed the typo. Naming functions is always hard. To me optimize_template_detector tells me more about what is going on that template_parametersweep.

label_spectro

comment: this is a plotting function, so it would be better named something like plot_spec_and_labels

response: Again naming functions is a tricky an subjective thing. In addition, as I said above, this function is mostly used to create spectrograms in the vignette.

get_templates

comment: can you explain what features are being used to create this principle component analysis of template similarity? Also, after understanding what this function does, it seems like it would be better named something like select_representative_templates or select_representative_annotations since it chooses representative templates from a set of annotations.

_response: I expanded the explanation in the function documentation: "the function will estimate it by measuring several acoustic features using the function \code{\link[warbleR]{spectro_analysis}} (features related to energy distribution in the frequency and time domain as well as features of the dominant frequency contours, see \code{\link[warbleR]{spectro_analysis}} for more details) and summarizing it with Principal Component Analysis (after z-transforming parameters) using the function \code{\link[stats]{prcomp}}" ( maRce10/ohun@293efe7c7ce38cb92eb478930362e76058152d07). I think gettemplates suggests users what the function does. They should always read the documentation to get a full sense of what a function does.

split_acoustic_data

comment: it should be possible, if not required, to list the audio files to split. It seems like I must split every file within a folder in the current implementation.

response: Explained above. Behavior is due to common habits of R users.

label_detection

comment: function name does not match what it does. It seems this function evaluates detections against a set of annotations/labels/"references" so perhaps it should be called evaluate_detections or compare_annotations_and_detections

_response: It add labels in a new column to each detection. To me the name makes sense. evaluate_detections sounds more like diagnosedetections which is another function.

template_correlator

comment: it's unclear to me how to use the files argument. I don't know what 'X' is as referred to in the documentation, and I would expect to simply be able to pass a vector of file paths.

_response: Explained above. Behavior is due to common habits of R users. "There is no 'X' argument in templatecorrelator()

full_spectrograms

comment: it was surprising to me that this function automatically saved image files to the current working directory, rather than asking for a save dir or displaying them in the console. This doesn't seem like a good practice, because I might not want to save the results and if I do, I'd like to choose where they go.

response: That function is from another package.

comment: Other note: I would be very surprised if template matching is faster than amplitude-based detection, as claimed in the "Improving detection speed" section. Amplitude-based detection is about the lightest task you can perform on a time signal while 2d cross correlation is relatively heavy.

response: That have been my experience in R. It seems that getting and manipulating envelopes in R is particularly slow.

Errors during usage

comment: When I try to run get_templates on my own file set, I get this error:

Error in prcomp.default(spectral_parameters[, 2:27], scale. = TRUE) : 
  cannot rescale a constant/zero column to unit variance

I tried to run get_templates on a set of local files, but it seems somehow one of the features has no variance and this is causing the PCA to fail. I wasn't able to resolve this.

response: I added code to remove zero-variance variables from the PCA input data maRce10/ohun@a16ff199944a45cc8598c76c65db1305ab7acf96

comment: Running template_correlator then template_detector on the results produced a table containing NA values for many rows in the start, end, and scores columns. I did not expect this, and it caused errors when I tried to run other analyses on these detections.

response: That can happen when no detections were found for some sound files. Just remove the NAs.

comment: Once I ran template_correlator and template_detector, I wasn't sure how to inspect my detections although it seemed like I should be able to use full_spectrograms or some other package function. I eventually realized that if I used na.omit(selections) to drop rows with NA values, I could run full_spectrograms to look at the results. I wonder if there is a way to more selectively choose to look at some of the detections alone, rather than having to view the entire audio files (imagine if I had a 1 hour audio file with 3 1-second detections).

_response: full_spectrograms is not a ohun function. I added a new function plotdetection() to visually inspect detections maRce10/ohun@4bca6b2a90ce563fee3df8589d3744da96048dc3

comment: When I first ran label_spectro, I got the message "Error in loadNamespace(x) : there is no package called ‘viridis’". I'm not sure if this suggests a missing dependency. I resolved the issue by running install.packages("viridis")

response: I added a checking step to make sure users have viridis installed maRce10/ohun@d6a4110fe3d2ad59f7130cc2ef9228f58b51bb49

w <- readWave("~/sample_audio/birds_10s.wav")

label_spectro(wave = w@left,
              f=w@samp.rate,
              env = TRUE)

Error in label_spectro(wave = w@left, f = w@samp.rate, env = TRUE, fastdisp = TRUE) : 
  trying to get slot "samp.rate" from an object of a basic class ("integer") with no slots

response: I added the error message maRce10/ohun@9fdd16338cfbda7cb096e9f3d6f737d6517fe714

comment: Add a more helpful error message for accidental use of Hz instead of kHz: If my detection table has Hz instead of kHz and I run get_templates, the error message says "Caution! This is a null spectrum. The spectral entropy is null!"

response: There is no easy way to tell if users mean Hz or kHz unfortunately

comment: When I run label_spectro with env = TRUE, I get an error message:

response: Not sure what is the problem. To me it runs and it is checked by testthat. The full name of the argument is "envelope"

Response to @robitalec

A statement of need

comment: Please consider adding a section to the README that 1) clarifies how ohun fits into a broader workflow of working with sound event and 2) if any other packages or approaches are available that share similar functionality to ohun. For 1), this could be adding few sentences describing what typical steps (and related packages) would come before and after using ohun. You could also add the workflow from the vignette to the README in this section. Consider a new user looking at the Reference page on the website or the list of available functions in ohun... how do they fit together?

response: I added that "The implementation of detection diagnostics that can be applied to both built in detection methods and to those obtained from other software packages makes the package ohun an useful tool for conducting direct comparisons of the performance of different routines. In addition, the compatibility of 'ohun' with data formats already used by other sound analysis R packages (e.g. seewave, warbleR) enables the integration of 'ohun' into more complex acoustic analysis workflows in a popular programming enviroment within the research community." maRce10/ohun@78658c766b13557c522c04fabff62c0b7e62b674

Installation instructions

comment: The dependency seewave brings system dependencies that might not be immediately obvious to a new user of ohun. It might be helpful to include a link to seewave's installing and troubleshooting system requirements page, for example: https://rug.mnhn.fr/seewave/inst.html

response: I added "Further system requirements due to the dependency seewave may be needed. Take a look a this link for instruction on how to install/troubleshoot these external dependencies." maRce10/ohun@b237f5875eb86743451a6674acfd0b7bf9ec97c7

Vignette(s)

comment: It might be worth splitting this vignette into multiple vignettes. At its current length, I find it a bit hard to digest all the moving pieces and options. I personally find it easier to learn and use R packages if there are many, shorter vignettes that describe the different major functions or steps offered. You could have maybe four vignettes: introduction, energy based detection, template based detection, and diagnosing/improving. The introductory could expand on how ohun integrates with other R packages in this field, and provide the background given in sections "Automatic sound event detection" and "Signal detection theory applied to bioacoustics". Then the following vignettes using the sections already written from the current vignette. This might just be my opinion - so if you feel like it is better for ohun users to have everything in one place, let me know.

Throughout the vignette, the warnings and messages were not shown using the chunk options warnings=FALSE and messages=FALSE. I would recommend leaving all messages and warnings in the rendered vignette, because a user may be surprised and confused to see them appear locally, when they are not shown in the vignette.

response: I accepted the commit ti show warnings and messages in the vignette. I also split the vignette in 3: Introduction, energy based detection, template based detection. maRce10/ohun@1b9c95f1ed0e83f466014b8c2ced550a2e3421a3

Community guidelines

comment: Please consider also including the link to the repository (https://github.com/maRce10/ohun) in the URL field in the DESCRIPTION as it should have the most consistently up to date information for the package, for example in the case that the website is not updated.

response: Done (accepted the commit)

Functionality

Packaging guidelines

comment: Looks good: package name, snake_case, object_verb function naming, citation, and code metadata. Consider using this following options for keeping the code metadata up to date and for keeping the CITATION file up to date. Code style improved with a recent commit using the styler package. Before that, logical statements with if else were not easy to read. It is not clear whether or not functions in ohun are pipeable (see the second point here). If you think that is a relevant option, consider including an example in the vignette that shows this.

response: They are not planned to be pipeable.

comment: As mentioned above, the README could use more information about how ohun fits into the broader ecosystem, comparing ohun to other packages that provide related functionality and details on the system requirements for seewave. The README is also missing a brief demonstration of functionality (see the sixth point here). The long form examples in the vignettes are useful for a user that is already intending to use ohun but shorter examples in the README help new users understand what ohun is offering.

response: I added more details about how it fits into the broader ecosystem following another comment above. I agree that some examples in the README might be helpful. However, it's is hard to pick up a short example that can fit in the README as sound event detection tend to require several steps

comment: Documentation looks good with all functions having accompanying examples, detailed return values and references included in the vignette. There is good use of @seealso to link users to related functions in ohun. Only one URL needed editing (according to urlchecker), see below under Additional comments/README.

You could consider using vdiffr for testing plots. Currently the tests for functions that produce plots are just testing that they return NULL. This isn't a very useful test, and while testing plots is not straightforward or intuitive, you could see if vdiffr helps! More detailed comments on testing below under Additional comments/Tests.

response: I added vdiffr tests maRce10/ohun@333817cf7934917d398cc33949fba2f5491b0cf5

Additional comments

comment:The following sections have further details from above, and sometimes an accompanying pull request. I found it easier to work through the review by looking at the source code locally, and making changes as I noticed things. I also felt like it would be easier to highlight specific lines of code by just editing them, instead of referring to eg. Lines 83-85 in some file. No pressure to accept any of these pull requests directly, and I hope that you find it a clear and useful approach.

Repository

_response: I ACCEPTED ALL PULL REQUESTS. THANK YOU!

comment:The repository size is ~ 120 mb.I recommend that you delete, at least, the following files:

.Rproj.user/
.Rhistory
..Rcheck/
..pdf
.Rdata
testing/
docs/ (in this case since you are using the pkgdown workflow with the gh-pages branch)

response: pull request accepted

comment: Some of these files were presumably added before they were listed in the .gitignore file. I have deleted them in the following pull request:

maRce10/ohun#12 README

response: pull request accepted

comment: Consider stating explicitly which package provides the parallel processing for ohun in the sentence starting "All functions allow the parallelization of tasks..."

response: Done: "All functions allow the parallelization of tasks (using the packages parallel and pbapply)..." maRce10/ohun@2c332d42c7ff2b893191d139cd7c2ec2e6b59135

The badge for R-CMD-check seems to point to an old workflow. The current workflow (tic) shows failing tests, whereas the R-CMD-check shows passing test. I would recommend removing this old badge if it is no longer relevant.

response: Done maRce10/ohun@46a1eaf00215ca3b94fab3c3a5f246a88456f03d

comment: A couple minor edits to the README are in the following pull request:

maRce10/ohun#13

response: pull request accepted

comment: It might be helpful to users if the items in NEWS were accompanied with references to the relevant pull requests and issues (eg. #5). For example: https://github.com/tidyverse/ggplot2/blob/main/NEWS.md

_response: OK.

comment: Vignette As mentioned above, consider splitting the single vignette into multiple vignettes:

Get started vignette
    Background and theory
    How does ohun fit in
    What other packages does ohun interface with
    Overview of functions provided by ohun
Energy based detection
Template based detection
FAQ / best practices / advice / tips

response: As mentioned I also split the vignette in 3: introduction, energy based detection, template based detection. maRce10/ohun@1b9c95f1ed0e83f466014b8c2ced550a2e3421a3

comment: Minor proposed edits to the vignette are in the following pull request:

maRce10/ohun#14 Contributing response: pull request accepted

The CONTRIBUTING.md document looks like it's based on the template from https://contributing.md/. Because of this, there were some places that links to sections didn't work, and it often still referred to the CONTRIBUTING.md version.

response: pull request accepted

comment: There isn't a code of conduct in the repository to link to, so I removed that dead link. But please consider including one (see the rOpenSci pages for more details: Contributing Guide, Code of Conduct).

response: pull request accepted

Relevant pull request:

maRce10/ohun#15 Documentation

comment: I propose removing the "last modification date" from the R/ files since this information is redundant and more error prone than the information provided by Git. This change is included in the pull request under R.

response: Done maRce10/ohun@88fdf9140b5c909243b41681527bfadb1bdfff4a

DESCRIPTION

comment: I suggest moving packages under "Depends" to "Imports". The case of adding packages to the Depends field does not seem like a universal yes/no clear cut decision but from my understanding and reading around this topic, I feel like for the ohun package it is not necessary to include packages in the Depends field.

From https://devguide.ropensci.org/building.html#pkgdependencies

Use Imports instead of Depends for packages providing functions from other packages.

For an example where it might be relevant to list a package in Depends:

From https://r-pkgs.org/dependencies-mindset-background.html#sec-dependencies-imports-vs-depends

A more classic example of Depends is how the censored package depends on the parsnip and survival packages. Parsnip provides a unified interface for fitting models, and censored is an extension package for survival analysis. Censored is not useful without parsnip and survival, so it makes sense to list them in Depends.

In the case of ohun, some of the packages listed under Depends also list some packages under Depends:

From warbleR
    tuneR, seewave (>= 2.0.1), NatureSounds
From tuneR
    None
From seewave
    None
From NatureSounds
    knitr

The full dependency tree is here: Click to expand

If the recommendation to move some packages to move Imports instead of Depends, the following issues in the examples can be resolved, for example in feature_acoustic_data.R:

could not find function "writeWave" fixed by adding library(tuneR) at the top of the example

response: pull request accepted

comment: Minor suggested edits to the Description include:

Removing non-standard fields including "Date/Publication",
"Repository", "Config/testthat/edition"
Adding the link to the repository under URL
Using line breaks to make the Description file easier

Relevant pull request:

maRce10/ohun#16 Tests

response: pull request accepted

comment: I suggested some edits to the test scripts to better take advantage of the variety and precision of available functions from testthat. Often, the tests were expect_true where you set up a logical statement, eg. nrow(DT) == 3. This works but when it fails, there is nothing informative about the error. Alternatively, if you set this test using expect_length then you would get a numeric difference between expectation and result. This might make it easier to identify why or how the test is failing.

response: Added a few more tests maRce10/ohun@333817cf7934917d398cc33949fba2f5491b0cf5

comment: I also suggest using a skip_on_windows function.

response: pull request accepted

comment: Writing tests can be really hard (and rewarding!) - I found these resources helpful:

https://r-pkgs.org/testing-design.html#what-to-test
https://devguide.ropensci.org/building.html#testing
https://mtlynch.io/good-developers-bad-tests/

Specifically in the first link "avoid testing simple code [...] test what you're not sure about, what's fragile, ...". This is a great opportunity for you to put your deep understanding about this kind of analysis to work, because you know when results don't look right, beyond the basic "is it a data.frame", etc. To this point, a lot of tests are just length checks - do you have any other expectations from the functions other than that they will return the expected length?

I've adjusted some of the tests where I noticed an alternative function that might be more informative. Here's the pull request:

maRce10/ohun#17

response: pull request accepted

Note: if packages are moved from Depends, it may require them to be explicitly loaded with library() in the tests where their functions are used. R

I don't really understand why custom functions stop2 and warning2 are being used. They return the error or warning message, without the call being printed. Would this possibly make it harder for a new user to troubleshoot, if where the message originates from? stop2 is used 105 times across 14 files and warning2 is used 3 times across 3 files.

For example with stop2:

R: label_detection(reference = matrix(lbh_reference[-1, ]), detection = lbh_reference) Error: 'reference' is not of a class 'data.frame' or 'selection_table'

and with stop:

R: label_detection(reference = matrix(lbh_reference[-1, ]), detection = lbh_reference) Error in label_detection(reference = matrix(lbh_reference[-1, ]), detection = lbh_reference) : 'reference' is not of a class 'data.frame' or 'selection_table'

_response: I copied this from the package brms. I like the message to be less wordy. But most important, having the function name is not useful when the error comes from an internal function, which is often the case. RElated to this, I added more detailed argument evaluations with the new internal function checkarguments() maRce10/ohun@9fdd16338cfbda7cb096e9f3d6f737d6517fe714

comment:Consider using seq_len instead of 1:nrow, 1:length, 1:ncol, ...

For example:

x <- data.frame(LETTERS[1:5])

1:nrow(x)

> [1] 1 2 3 4 5

seq_len(nrow(x))

> [1] 1 2 3 4 5

x <- data.frame()

1:nrow(x)

> [1] 1 0

seq_len(nrow(x))

> integer(0)

response: I did when earlier during the first default ROpenSci revision

Related to dependencies described above under Description:

internal functions from another package are used in 12 cases, though given all 12 are from warbleR and the maintainer of ohun is also the maintainer of warbleR this seems likely ok

response: OK

comment:There is mixed used of importFrom and ::. If packages listed under Depends are moved to Imports, I would suggest using the :: syntax because it helps a reader understand if functions are from in ohun and, if not, which package they come from.

response: Done.

There is commented code in

optimize_template_detector L226 - 231
split_acoustic_data L265-267

response: Removed maRce10/ohun@9fdd16338cfbda7cb096e9f3d6f737d6517fe714

Related to @sammlapp's review, the ohun could be improved by refactoring some functions. For example, there are nested functions in the following scripts:

detect_FUN within energy_detector
internal_feature_reference within feature_reference
env_ohun_int within get_envelopes
XC_FUN and spc_FUN within template_correlator

_response: I move those functions to "internalfunctions.R" maRce10/ohun@7c4709280fdb2f68a84d4463a086c4724272040c. maRce10/ohun@4ca47bd993b95cecccfd09c490d3605137b1605d

Thank you for the very useful comments. Could you please add yourself to the package DESCRIPTION? (do not know your full name :) )

jhollist commented 1 year ago

Thanks for your responses to the reviews, @maRce10!

@sammlapp and @robitalec could you both take a look and let me know what you think? You can use the reviewer approval template to indicate your response: https://devdevguide.netlify.app/approval2template.html

maRce10 commented 1 year ago

@robitalec I split the vignette in 3 but now the tarball is 15 MB. Do you have any suggestion on how to shrink the package a bit?

robitalec commented 1 year ago

Thanks @maRce10 for the detailed response, and for accepting those PRs. I am grateful that approach worked well for you too, I found it easier to look at logical chunks of changes as PRs personally.

Reviewer Response

Following the same headers as above:

A statement of need

It appears that while these changes have been added to the README.Rmd, they are not present in the README.md (and therefore not on the GitHub or pkgdown home page). Consider using a pre-commit hook to make sure these files stay synchronized (see ?usethis::use_readme_rmd())

Installation instructions

same as above under "A statement of need", these changes are not reflected in the README, GitHub page, or website
Packaging guidelines

Nice to see the {vdiffr} use.

Should {vdiffr} be added to the Suggests list instead of Depends like {testthat}? If so, you can remove the @importFrom call for vdffir::expect_doppelganger.

Relatedly though, the files under tests/testthat/_snaps are quite large:

get-templates.svg 36.9 MB
get-templates-env.svg 38.2 MB

Is there any way to reduce the size of these SVGs?

Additional comments

Some files I deleted in the accepted PR have resurfaced/are still present in the repository:

.Rproj.user
.Rhistory,
docs/

stop2, warning2

Ok for removing the call. My initial concern was that it would be harder for a user to follow the traceback, but I see now that the .call argument doesn't impact that. Since a user can still troubleshoot using the traceback, this should be fine.

stop2 <- function(...) {
    stop(..., call. = FALSE)
}
test <- function() {
    stop2('test error')
}
test()
#> Error: test error
traceback()
#> 3: stop(..., call. = FALSE) at #2
#> 2: stop2("test error") at #1
#> 1: test()

New comments

Tests

Two directories are in the Rbuildignore (https://github.com/maRce10/ohun/blob/master/.Rbuildignore#L4-L5):

testing (doesn't seem to be used)
tests (added in this commit)

This means that the tests aren't being run on CRAN

eg. https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/ohun-00check.html

Compare this to eg. {dplyr}

https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/dplyr-00check.html

and we can see the section "checking tests"...

Surprised CRAN didn't notice there were no tests in the submitted package, but maybe it isn't a requirement of CRAN packages I'm not sure. It would be good to have tests running on CRAN.

Vignettes

If we run usethis::use_vignette("test") we get the following changes:

✔ Setting active project to '/home/alecr/Documents/Local-git/ohun'
✔ Adding 'inst/doc' to '.gitignore'
✔ Adding '*.html', '*.R' to 'vignettes/.gitignore'
✔ Writing 'vignettes/test.Rmd'
• Modify 'vignettes/test.Rmd'

Most importantly, a file is written vignettes/.gitignore and to it two lines are added (ignoring rendered HTML and .R from the repo)

*.html
*.R

You could consider deleting files that correspond to this wildcard then adding this new .gitignore file.

Two options I can think of for reducing the size of your vignettes:

Adjust your output options

For example, the original output options for intro_to_ohun.Rmd (8 MB)

output:
  rmarkdown::html_document:
    css: extra.css
    theme: null
    self_contained: yes
    toc: true
    toc_depth: 3
    toc_float:
      collapsed: false
      smooth_scroll: true

8 MB -> 7.1 MB using theme: null and removing toc_float

output:
  rmarkdown::html_document:
    css: extra.css
    self_contained: yes
    toc: true
    toc_depth: 3

7.1 MB -> 3.4 MB dropping css (might be due to fonts)

output:
  rmarkdown::html_document:
    theme: null
    self_contained: yes
    toc: true
    toc_depth: 3

3.4 MB --> 2.5 MB switching from output type html_document to html_vignette

output:
  rmarkdown::html_vignette:
    self_contained: yes
    toc: true
    toc_depth: 3

You could keep these changes from CRAN, then apply the CSS independently to your site?

Or you could consider using articles instead of vignettes: https://r-pkgs.org/vignettes.html#sec-vignettes-article

Here is my author chunk from other packages.

    c(person(given = "Alec L.",
             family = "Robitaille",
             role = ,
             email = "robit.alec@gmail.com",
             comment = c(ORCID = "0000-0002-4706-1762"))

Let me know what you think of these additional comments and responses. Thank you!

jhollist commented 1 year ago

@maRce10 Can you take a look at @robitalec responses and revise/respond to each?

@sammlapp could you take a look at the revisions and let me know what you think? You can use the reviewer approval template to indicate your response: https://devdevguide.netlify.app/approval2template.html

Thanks!

maRce10 commented 1 year ago

Hi @robitalec

Thanks again! I added you as a contributor and move vdiffr to "suggests" (maRce10/ohun@50b7a55b12a2d13ce2865ccb2e0f6f4b8afea8f2), render the readme (maRce10/ohun@7f530964cedea39a091ab476f03c7079fe9716c0) and added a vignette gitignore file (maRce10/ohun@8bc198ac25938829bb1e0e4976a48ed12037e52b).

Will change the vignette size before submitting to CRAN. Any suggestion on how to keep the 2 versions of the vignettes (CRAN and site) without having to change manually the yaml header?

cheers

sammlapp commented 1 year ago

Reviewer Response

Final approval (post-review)

[x] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 10 (second review: 1)

jhollist commented 1 year ago

All,

I think we are very close to finishing this up!

@sammlapp thank you for the reviewer response! @robitalec could you provide a final reviewer response based on the most recent changes? The link to that form is https://devdevguide.netlify.app/approval2template.

@maRce10 Instead of listing the reviewers as contributors, I think it best to have them as reviewers. The role in that case would be "rev". For the purposes of rOpenSci, the vignettes do not need to adhere to the size suggestions for CRAN. Although I agree with @robitalec suggestion that you reduce their size prior to submission to CRAN. I like the suggestion to consider using these only as articles on the pkgdown site. Another place to look for more on that is https://usethis.r-lib.org/reference/use_vignette.html. The decision on that is yours and can take place after we complete the review and prior to submission to CRAN.

Once we get these last few things resolved, I think we are good to go!

robitalec commented 1 year ago

Thanks @maRce10! Looks good, thanks for considering the suggestions.

Thanks for clarifying @jhollist that the vignettes do not need to adhere to CRAN size suggestions.

Reviewer Response

Final approval (post-review)

[X] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 17

maRce10 commented 1 year ago

Awesome! Is there anything else you need from me end at this point?

jhollist commented 1 year ago

Just make the change in the DESCRIPTION on @robitalec role from ctb to rev, and go ahead and add @sammlapp as rev as well. Once that is done, we are good to go!

maRce10 commented 1 year ago

OK I just did it maRce10/ohun@cfa533cd16bb650274de0a7dc45a5b886c91b74d

jhollist commented 1 year ago

@ropensci-review-bot approve ohun

ropensci-review-bot commented 1 year ago

Approved! Thanks @maRce10 for submitting and @sammlapp, @robitalec for your reviews! :grin:

To-dos:

[ ] Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You will need to enable two-factor authentication for your GitHub account. This invitation will expire after one week. If it happens write a comment @ropensci-review-bot invite me to ropensci/<package-name> which will re-send an invitation.
[ ] After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
[ ] Fix all links to the GitHub repo to point to the repo under the ropensci organization.
[ ] Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
[ ] If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
- deactivate the automatic deployment you might have set up
- remove styling tweaks from your pkgdown config but keep that config file
- replace the whole current pkgdown website with a redirecting page
- replace your package docs URL with https://docs.ropensci.org/package_name
- In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar, https://github.com/ropensci/foobar
[ ] Skim the docs of the pkgdown automatic deployment, in particular if your website needs MathJax.
[ ] Fix any links in badges for CI and coverage to point to the new repository URL.
[ ] Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
[ ] We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
[ ] You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions.

We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

jhollist commented 1 year ago

Thank you all for your hard work on this. Great to get it approved! I don't have much to add. The important details are above. @maRce10 if you have any questions, feel free to ping my in this issue or you can reach me via email hollister . jeff epa . gov.

Congratulations!

maRce10 commented 1 year ago

@ropensci-review-bot invite me to ropensci/ohun

ropensci-review-bot commented 1 year ago

Invitation sent!

maRce10 commented 1 year ago

@ropensci-review-bot finalize transfer of ohun

ropensci-review-bot commented 1 year ago

Transfer completed. The ohun team is now owner of the repository and the author has been invited to the team

ropensci / software-review

ohun #568

Archive: TBD Version accepted: TBD Language: en

Scope

Technical checks

Publication options

Code of conduct

Package Review

Documentation

Functionality

Review Comments

Documentation

Functionality

Additional comments

Repository

README

NEWS

Vignette

Contributing

Documentation

DESCRIPTION

Tests

R

Response to @sammlapp

split_acoustic_data.R:

' for sound file name (sound.files), selection number (selec), and start and end time of signal

' (start and end).`_

summarize_diagnostic.R:

Notes on the Getting Started tutorial page

Notes on API documentation and variable/function names

get_envelopes :

energy_detector:

label_spectro:

feature_reference:

feature_acoustic_data

filter_detection

optimize_template_detector

label_spectro

get_templates

split_acoustic_data

label_detection

template_correlator

full_spectrograms

Errors during usage

Response to @robitalec

A statement of need

Installation instructions

Vignette(s)

Community guidelines

Packaging guidelines

Additional comments

DESCRIPTION

> [1] 1 2 3 4 5

> [1] 1 2 3 4 5

> [1] 1 0

> integer(0)

Reviewer Response

A statement of need

Installation instructions

Packaging guidelines

Additional comments

stop2, warning2

New comments

Tests

Vignettes

Reviewer Response

Final approval (post-review)

Reviewer Response

Final approval (post-review)