whedon commented 3 years ago

Submitting author: @tobiasschoch (Tobias Schoch)
Repository: https://github.com/tobiasschoch/wbacon
Version: v0.5
Editor: @fboehm
Reviewer: @msalibian, @aalfons
Archive: 10.5281/zenodo.4895167

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/6218ee213e8b00273645233145e96cfa"><img src="https://joss.theoj.org/papers/6218ee213e8b00273645233145e96cfa/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/6218ee213e8b00273645233145e96cfa/status.svg)](https://joss.theoj.org/papers/6218ee213e8b00273645233145e96cfa)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@msalibian & @aalfons, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @fboehm know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Review checklist for @msalibian

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@tobiasschoch) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @aalfons

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@tobiasschoch) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @msalibian, @aalfons it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 3 years ago

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.23 s (190.0 files/s, 37362.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
TeX                              4            280            173           3276
C                                8            302            696           1307
R                               15            115            122            850
HTML                             1             95              5            739
Markdown                         3             39              0            139
C/C++ Header                     9             23             17            132
Rmd                              1            119            159             39
YAML                             1              1              0             18
DOS Batch                        1              0              2              2
SVG                              1              0              0              1
-------------------------------------------------------------------------------
SUM:                            44            974           1174           6503
-------------------------------------------------------------------------------

Statistical information for the repository 'a9b8cecb1778454a282290cc' was
gathered on 2021/05/04.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Tobias Schoch                   36          6003           3526          100.00

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Tobias Schoch              2477           41.3          2.4               31.09

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/S0167-9473(99)00101-2 is OK
- 10.1145/567806.567807 is OK
- 10.1137/1.9780898719604 is OK
- 10.1016/j.csda.2004.09.009 is OK
- 10.1080/01621459.1990.10474920 is OK
- 10.1198/jcgs.2009.0005 is OK

MISSING DOIs

- None

INVALID DOIs

- None

fboehm commented 3 years ago

@msalibian and @aalfons - we're now starting the review. You can see above your review checklist. Please check the boxes as you proceed through the review. For any boxes that you can't check right now, please leave comments in this thread. You may also choose to open issues in the package repository. If you do open issues, please be sure to reference them in this thread. Please let me know if you have any questions as you work through the review. Thanks again!

fboehm commented 3 years ago

@msalibian and @aalfons - how is the review going? Is there anything that I can assist with? Please feel free to check off the boxes above as you proceed through the review. Thanks!

msalibian commented 3 years ago

This is a nice contribution (both package and manuscript). I have some minor suggestions about the text, and noted that some items from the checklist above are missing.

In the text, when referring to "large datasets", please be specific if this is regarding the number of cases (rows), variables (columns), or both. In the robustness literature, computational effort ("complexity" in an informal way) usually grows exponentially with the number of columns, while a larger number of cases does not generally increase the cost faster than linearly.
There are references to the "BACON algorithms", it may be helpful for the reader if the Summary mentioned that the two algorithms correspond to either a multivariate location/scatter model, or a linear regression one.
line 12: "superior" is a relative term that requires a comparison (superior to what?), I suggest simply mentioning the method's popularity instead.
line 15: I believe the preferred term in R is "package" (rather than "library").
line 18: "etc.", I suggest listing all the implemented diagnostic methods / tools instead.
"Example usage": currently missing. It would be very useful to include an example illustrating the usage of the package (including how to set tuning parameters), and use it also to compare its performance against that of other BACON implementations.
"State of the field" and "Performance": although the author describes how this software compares to other implementations, a detailed example would be very useful (as I mention above). In particular, it is not currently clear if the "large datasets" for which this package is geared are "large" in terms of the number of cases (rows), the number of variables (columns), or both.
"Community guidelines" are missing.

tobiasschoch commented 3 years ago

@msalibian - Thank you for the very helpful remarks. Before I address the points raised, I have some questions for the editor.

@fboehm - I have the following questions:

1. I’m planning to do the following (while working through the issues raised by the reviewers):
- Make changes in the documents/code (in the repository)
- Add a new version tag to the repository
- Call [at]whedon set [new version] as version
- Call [at]whedon generate pdf

Is that right?
2. “Example usage”: The package has a vignette. In the vignette, the methods are applied to example data. Should I include the examples or some of the examples in the paper as well? Or should I just note in the paper that there is a vignette?
3. “Community guidelines”: These guidelines can be found in the README.md on https://github.com/tobiasschoch/wbacon. I put them there so that they show up on the GitHub site (this is the place where most people will “land” while browsing/searching the internet). Should I include the “community guidelines” in the paper as well?
4. "Performance":
- I test my implementation against (the existing code) robustX::BACON; see folder test (focus: does my implementation give the same results?).
- In addition, there are some benchmarks; see test/benchmark (used to “see” the impact of the OpenMP parallelization).
- However, there are not yet any benchmarks in which I compare my implementation against robustX::BACON in terms of performance/speed. I’m planning to add such a benchmark and then note in the paper that it is available. Is that ok?

fboehm commented 3 years ago

@tobiasschoch - Thanks for your questions. Below I try to answer them.

1. Don't worry about making the tagged release or interacting with whedon. I'll guide you through that when it's time. For now, just concentrate on your first bullet point. If you want to see how the pdf looks after your changes are made, you can tell whedon to "generate pdf", but don't create a new release until I ask you to do so.
2. I'm not sure about this one. I think the paper.md might be better off with a mention of the vignette. You probably don't need to include examples in the paper.md file. @msalibian - do you have thoughts on this?
3. I think that including guidelines in the README.md is sufficient. However, you might also want to add a CONTRIBUTING.md file to the repository. For example, see this: https://github.com/tidyverse/dplyr/blob/master/.github/CONTRIBUTING.md. You don't need to include community guidelines in the paper.md. There is a function in the R package usethis that can create such a file from a template: usethis::use_tidy_contributing. You might want to edit the resulting file.
4. Yes, I think that mentioning in paper.md that you have the tests and benchmarks in the package is sufficient. You might also include unit tests, possibly by using the R package testthat. @msalibian - do you agree? Do you have more specific comments to add on this point?

fboehm commented 3 years ago

@aalfons - how is the review going? Please let me know if you encounter any difficulties.

msalibian commented 3 years ago

@fboehm @tobiasschoch About the above:

2 - I don't have a strong opinion on this. A reference to the vignette would probably be enough. Although in that case I think it would be helpful to include in the paper a few sentences describing / quantifying the performance gain that can be expected with this implementation. Something like "When the sample size is larger than XXX, using this implementation often results in a speed gain of YYY% compared with such and such..."

4 - Agreed.

aalfons commented 3 years ago

@fboehm The only difficulty is finding time. I'll get to it by the end of the week. My apologies for the delay, I was not aware that it needs to be done so quickly.

tobiasschoch commented 3 years ago

@fboehm @msalibian @aalfons

I'm grateful for all the hints and comments. Now I understand how the review process works at JOSS.
Frederick, I don't mind if the review takes more time than usual. The reviewers are busy professors (so am I). I prefer a detailed review to a quick review. We should give aalfons more time if he needs it.
Next, I'm going to address the points raised by msalibian.

fboehm commented 3 years ago

@aalfons - I apologize if I implied that I expected the review to be done by now. That wasn't my intention. I merely wanted to ask if you'd gotten started or encountered difficulties. Sometimes reviewers are unsure how to start the review, especially since we use a nontraditional format at JOSS. If any questions arise, please let me know. Thanks again!!

fboehm commented 3 years ago

@tobiasschoch - that all sounds good. Please feel free to address the issues that @msalibian raised as the next step. Thanks!!

whedon commented 3 years ago

:wave: @msalibian, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @aalfons, please update us on how your review is going (this is an automated reminder).

aalfons commented 3 years ago

The package seems to provide useful functionality and is well documented, and the manuscript describes the functionality and the need for it nicely. I still have some comments, some of which I realized afterwards are also brought up by @msalibian.

Installation instructions: I received the following error message when trying to install the package:

In file included from fitwls.c:20:
./fitwls.h:5:10: fatal error: 'omp.h' file not found

The latest versions of R for Mac now rely only on the compilers shipped with Apple's Xcode developer platform, and unfortunately it seems that Apple has disabled OpenMP support there by default, see https://mac.r-project.org/openmp/ . I was not aware of that.

I have of course fixed OpenMP support now as per instructions on the above website, and the package installed as per the documentation. But there may be other users who run into the same issue and give up on your package.

I believe the recommended way to include OpenMP in the C++ header files is the following:

#ifdef _OPENMP
#include <omp.h>
#endif

This is a simple fix, so please go ahead and implement it. While users without OpenMP installed then do not have access to parallel computing when using your package, at least they can install and use it.
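
For readers who want to see the guard in context, here is a minimal sketch of how code can stay buildable with and without OpenMP. It is not taken from the wbacon sources: the helper names `available_threads` and `sum_values` are hypothetical and only illustrate the pattern. The sketch relies on the fact that an OpenMP-enabled compiler defines the macro `_OPENMP`.

```c
#include <assert.h>
#include <stddef.h>

/* Pull in the OpenMP API only when the compiler runs with OpenMP enabled
 * (e.g. with -fopenmp), in which case it defines the macro _OPENMP. */
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of threads a parallel region may use,
 * or 1 when the code was built without OpenMP. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Hypothetical example routine: sum of n doubles. Without OpenMP the
 * pragma is compiled out and the loop simply runs serially. */
double sum_values(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double total = 0.0;
    long i;
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : total)
#endif
    for (i = 0; i < (long)n; i++)
        total += x[i];
    return total;
}
```

Every use of the OpenMP API (here `omp_get_max_threads`) sits behind the same macro as the include, so a build without OpenMP never sees an undeclared symbol.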

Performance: line 36: What size of the data set (number of observations, number of variables) is approximately necessary for implementation inefficiencies to matter?
Automated tests: There are some scripts in the tests/ directory that seem to reproduce the results of a paper and contain some speed tests of the computational performance, but I did not find any unit tests that the functions produce the correct output and so on. On the other hand, the documentation contains some examples, and it can be automatically checked whether these produce warnings or errors via R CMD check. So in that sense, there would be some minimal automated testing of the functionality. @fboehm Could you please provide some guidance on this point?
State of the field: The authors mention only other packages that implement the same algorithms. Maybe it would be good to also mention some other popular packages that provide other algorithms for anomaly detection? (Although this could quickly go out of hand, as there are many algorithms for this purpose.)

General comments:

line 12: the statement that the BACON algorithms are "superior" would imply that they are always better than other anomaly detection algorithms. Please rephrase this.
lines 15-17: Even though the function to load a package is called library(), which causes some confusion about the terminology, the preferred term is "package" in R. Please change this throughout the paper.
line 66/67: How should one proceed if the diagnostic tools indicate that the "good" observations violate the structure that is required by the algorithm?

tobiasschoch commented 3 years ago

I modified the paper according to @msalibian's suggestions (I'm going to address the points raised by @aalfons later).

Issue 1

Reviewer: In the text, when referring to "large datasets", please be specific if this is regarding the number of cases (rows), variables (columns), or both. In the robustness literature, computational effort ("complexity" in an informal way) usually grows exponentially with the number of columns, while a larger number of cases does not generally increase the cost faster than linearly.
Answer: In the revised manuscript, I now point out that the time complexity of the algorithms is dominated by the number of variables/ columns. Therefore, the implemented parallelization over the columns is meaningful. See also the new chapter on benchmarking.
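
To illustrate what parallelizing over the columns can look like, here is a small sketch that computes weighted column means of a column-major matrix, with one loop iteration per column. It is an assumption-laden illustration, not the package's code: the function name `weighted_col_means` and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: weighted means of the p columns of an n-by-p matrix
 * stored in column-major order (as R stores matrices).
 * x: matrix entries, w: n nonnegative weights, center: p output values. */
void weighted_col_means(const double *x, const double *w,
                        int n, int p, double *center)
{
    assert(n > 0 && p > 0);

    /* The sum of weights is shared by all columns, so compute it once. */
    double wsum = 0.0;
    for (int i = 0; i < n; i++)
        wsum += w[i];
    assert(wsum > 0.0);

    int j;
    /* Each column is independent of the others, so the columns can be
     * distributed across threads; without OpenMP the loop runs serially. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (j = 0; j < p; j++) {
        const double *col = x + (size_t)j * (size_t)n;
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += w[i] * col[i];
        center[j] = s / wsum;
    }
}
```

Each thread writes to a different element of `center`, so no synchronization is needed inside the loop.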

Issue 2

Reviewer: There are references to the "BACON algorithms", it may be helpful for the reader if the Summary mentioned that the two algorithms correspond to either a multivariate location/scatter model, or a linear regression one.
Answer: Yes, I now mention the two methods.

Issue 3

Reviewer: line 12: "superior" is a relative term that requires a comparison (superior to what?), I suggest simply mentioning the method's popularity instead.
Answer: Yes, I deleted the ambiguous term “superior”.

Issue 4

Reviewer: line 15: I believe the preferred term in R is "package" (rather than "library").
Answer: Yes, I now talk about “package” not “library”.

Issue 5

Reviewer: line 18: "etc.", I suggest listing all the implemented diagnostic methods / tools instead.
Answer: I chose to stick with the (rather unspecific) “etc.” instead of listing all methods, as this would have made the summary bulky. However, I have now added a chapter called “Illustration” where I show an application of some of the methods and refer the reader to the vignette to learn more.

Issue 6

Reviewer: "Example usage": currently missing. It would be very useful to include an example illustrating the usage of the package (including how to set tuning parameters), and use it also to compare its performance against that of other BACON implementations.
Answer: I added a small example on robust regression. This is meant as a kind of teaser to get readers interested in the vignette.

Issue 7

Reviewer: "State of the field" and "Performance": although the author describes how this software compares to other implementations, a detailed example would be very useful (as I mention above). In particular, it is not currently clear if the "large datasets" for which this package is geared are "large" in terms of the number of cases (rows), the number of variables (columns), or both.
Answer: I added a chapter on “benchmarking”, where I compare my implementation with the one in the robustX package (reference) in terms of computation time for data sets of various sizes. From the benchmarking, we can learn that my implementation outperforms the reference on medium to large data sets.

Issue 8

Reviewer: "Community guidelines" are missing.
Answer: I already had the file “CONTRIBUTING.md” in the base directory of the package. I have now added a chapter on "community guidelines" to the paper.

tobiasschoch commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

tobiasschoch commented 3 years ago

@aalfons - Thank you for the very helpful remarks. Your comments reached me while I was responding to the points raised by msalibian. I will now address the points you have raised.

Issue 1

Reviewer: Installation instructions: I received the following error message when trying to install the package [...].

Answer: Yes, indeed I forgot to add the (conditional) include guards. This is now fixed. I added the include guards in the C header files.

Issue 2

Reviewer: Performance: line 36: What size of the data set (number of observations, number of variables) is approximately necessary for implementation inefficiencies to matter?

Answer: In the meantime I have updated the paper. In the "Benchmarking" section, I compare my implementation with the reference implementation robustX in terms of computation time for data sets of various sizes. From the benchmarking, we can learn that my implementation outperforms the reference on medium to large data sets.

Issue 3

Reviewer: Automated tests: There are some scripts in the tests/ directory that seem to reproduce the results of a paper and contain some speed tests of the computational performance, but I did not find any unit tests that the functions produce the correct output and so on. On the other hand, the documentation contains some examples, and it can be automatically checked whether these produce warnings or errors via R CMD check. So in that sense, there would be some minimal automated testing of the functionality. @fboehm Could you please provide some guidance on this point?

Answer: I am a believer in test-driven development. I distinguish between two (idealized) types of tests:

Unit test: Test what the code is. The purpose of unit testing is to isolate the smallest testable parts of the code base and verify whether they function properly in isolation.
Functional test: Test what the code does. A tester is not concerned with the actual code, rather s/he wants to verify the output with the expected output.

As a pragmatic programmer I take the following view: Some functions are suitable for unit testing, others less so.

I do have unit tests for the functions wquantile and wselect. Because these functions are also used in other R packages, they reside in their own GitHub repo called "wquantile". There, you can find unit tests based on the C language CHEAT test library of Guillermo Freschi and Sampsa Kiiskinen; see https://github.com/tobiasschoch/wquantile/blob/master/tests/test.c.
The vast majority of code in the "wbacon" repo is tested under the "functional testing" paradigm for the following reason: The BACON algorithms are based on the basic "building blocks": mean, covariance matrix, linear regression, etc. In my package, these building blocks are computed with the help of the LAPACK and BLAS subroutines. More precisely,
- the covariance matrix is based on dsyrk => computes the lower triangular scatter matrix (see wbacon.c).
- the regression coefficients are computed with dgels; the residuals are computed with dgemv (see fitwls.c).
- ...
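
As a rough picture of the first building block, the following sketch computes the lower triangle of a weighted scatter matrix with plain loops. It is a stand-in written for illustration only. It does not call BLAS and is not the code in wbacon.c, and the function name and argument layout are assumptions. It mirrors the convention of a routine such as dsyrk in that only one triangle of the symmetric result is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C stand-in for a dsyrk-based computation.
 * x:       n-by-p data matrix, column-major
 * w:       n weights
 * center:  p column centers (e.g. weighted means)
 * scatter: p-by-p output, column-major; only entries with row >= column
 *          (the lower triangle) are written, the rest is left untouched.
 * The result is an unnormalized scatter matrix; dividing by a suitable
 * sum of weights would turn it into a covariance estimate. */
void weighted_scatter_lower(const double *x, const double *w,
                            const double *center, int n, int p,
                            double *scatter)
{
    assert(n > 0 && p > 0);
    for (int k = 0; k < p; k++) {          /* column of the result */
        for (int j = k; j < p; j++) {      /* row of the result, j >= k */
            double s = 0.0;
            for (int i = 0; i < n; i++) {
                double dj = x[i + (size_t)j * n] - center[j];
                double dk = x[i + (size_t)k * n] - center[k];
                s += w[i] * dj * dk;
            }
            scatter[j + (size_t)k * p] = s;
        }
    }
}
```

Because the matrix is symmetric, the upper triangle carries no extra information, which is why it can be skipped.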

We could, in principle, follow the unit-test paradigm and write separate unit tests for the covariance matrix, regression, and so on. However, this would ultimately boil down to testing the subroutines in BLAS and LAPACK. Neither is my package the place to test these subroutines, nor would such a testing strategy increase confidence in my functions. Of course, that does not rule out the possibility that I mixed up the order of the arguments when calling the subroutines. But the compiler (and valgrind) would have told me if I had actually done it. So I decided to take a "functional test" approach and test the ensemble of building blocks. In total, the BACON algorithms are tested on 159 different (real) test sets to match the results of the reference implementation in the robustX package.
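
The core of such a functional test is a comparison of the computed output with the reference output. A minimal sketch of that comparison, assuming the results are available as arrays of doubles, might look as follows. The helper `results_match` and the combined absolute/relative tolerance are illustrative choices, not something taken from the package's test scripts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for a functional test: returns 1 if every element of
 * `candidate` agrees with `reference` up to a mixed absolute/relative
 * tolerance, and 0 otherwise. */
int results_match(const double *candidate, const double *reference,
                  size_t n, double tol)
{
    assert(tol >= 0.0);
    for (size_t i = 0; i < n; i++) {
        double diff = candidate[i] - reference[i];
        if (diff < 0.0)
            diff = -diff;
        double scale = reference[i] < 0.0 ? -reference[i] : reference[i];
        /* Written as a negated "<=" so that a NaN in either input,
         * which makes the comparison false, counts as a mismatch. */
        if (!(diff <= tol * (1.0 + scale)))
            return 0;
    }
    return 1;
}
```

A test run would then feed the same data set to both implementations and require `results_match` to return 1 for the estimated center, scatter matrix, or regression coefficients.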

Issue 4

Reviewer: State of the field: The authors mention only other packages that implement the same algorithms. Maybe it would be good to also mention some other popular packages that provide other algorithms for anomaly detection? (Although this could quickly go out of hand, as there are many algorithms for this purpose.)

Answer: Yes, this could indeed go out of hand quickly... I focused on the BACON algorithms and did not intend to write a review article on multivariate outlier detection. @fboehm: Shall I discuss other methods?

Issue 5

Reviewer: line 12: the statement that the BACON algorithms are "superior" would imply that they are always better than other anomaly detection algorithms. Please rephrase this.

Answer: Yes, this has been fixed; see response to points raised by msalibian.

Issue 6

Reviewer: lines 15-17: Even though the function to load a package is called library(), which causes some confusion about the terminology, the preferred term is "package" in R. Please change this throughout the paper.

Answer: Yes, this has been fixed; see response to points raised by msalibian.

Issue 7

Reviewer: line 66/67: How should one proceed if the diagnostic tools indicate that the "good" observations violate the structure that is required by the algorithm?

Answer: Well... all I can do (see also vignette) is to quote the developers of the BACON algorithms: “Although the algorithms will often do something reasonable even when these assumptions are violated, it is hard to say what the results mean” (Billor et al., 2000, p. 290). It is better to be safe than sorry and apply another method. I think I have made it sufficiently clear when the method can and cannot be used; see vignette.

fboehm commented 3 years ago

Thank you for addressing the issues raised by the reviewers. @aalfons and @msalibian - do you feel that the authors have made sufficient changes to meet the requirements? Please feel free to continue to check boxes from the checklist once you're satisfied.

msalibian commented 3 years ago

@fboehm Thanks @tobiasschoch for addressing my concerns and suggestions. I have no pending items in the check list. I am satisfied with the new version of the manuscript, and only have a few minor suggestions about writing style, which I leave here in case they are helpful.

The quote on lines 80-82 seems out of place, there is no reference to it in the text either before or after it;
Line 83, "tools to study potentially outlying observations", I'd say "tools to identify potentially...";
Line 96, "we study the BACON algorithm", I'd instead say: "we illustrate the use of the BACON algorithm";
Line 116, "distances / discrepancies", are these distances (as in Mahalanobis distances), or standardized residuals? The fact that the reference distribution is a Student's T makes me suspect they are the latter. It would be useful to make the text as precise as possible;
Lines 138-139, it'd be appropriate to include here that this plot is also included in the plot method for lmrob objects in package robustbase;
Line 162: wBACON_reg_reg(), should this be wBACON_reg() instead?
Line 177, "easily outperforms", a less informal phrase would be to just use "outperforms", or "clearly outperforms", etc.;
Lines 188-189, I'm not sure it is necessary to include in the manuscript the suggestion to search existing issues in the software repository before submitting a new one.

aalfons commented 3 years ago

@fboehm I'm also satisfied with the new version of the manuscript and have no more unchecked items. Thanks @tobiasschoch for addressing all the comments.

I also have a few minor suggestions regarding the text to add to those of @msalibian which I leave here in the same spirit in case they are helpful:

Line 16: "R statistical software" -> "statistical software R"
Line 168: "... of the project GitHub repository" - it might be useful to include the link here.
Line 208: "Compute environment" -> "Computing environment"

tobiasschoch commented 3 years ago

@fboehm @aalfons @msalibian - Thank you for the very helpful comments (and the typos you spotted). I have followed all your suggestions and modified the paper accordingly.

@msalibian - I have clarified the statement about the "distances / discrepancies". The BACON algorithm for robust regression uses what Billor et al. (2000) call discrepancies: on the set of "good" observations (in the current iteration), the discrepancies are defined as the scaled residuals. For the "bad" observations, the discrepancies are taken to be the scaled (out-of-sample) prediction errors. Then, for the next iteration, all observations whose discrepancies are smaller (in absolute value) than the cutoff value (which is defined as a rather "special" Student t-quantile) are selected into the new set of "good" observations. This procedure is repeated until convergence. From the point of view of rigorous statistical reasoning, the t-quantile cutoff value suggested by Billor et al. (2000) is weird or questionable... but it works well in practice.
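To make the iteration described above concrete, here is a minimal sketch in plain Python for the special case of a single predictor with an intercept. It is not the wbacon implementation: the function names `fit_line` and `bacon_reg_sketch` are hypothetical, the initial subset is supplied by the caller, and the cutoff is passed in as a plain number rather than computed as the t-quantile of Billor et al. (2000). The scaling by sqrt(1 - h) for "good" and sqrt(1 + h) for "bad" observations is my reading of their discrepancy definition and should be checked against the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on the given points."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    return a, b, sxx, xbar


def bacon_reg_sketch(x, y, start, cutoff, max_iter=50):
    """Iterate the 'good' subset until it no longer changes.

    start  -- indices of the initial 'good' subset (assumed given)
    cutoff -- threshold on |discrepancy| (assumption: a fixed number,
              not the t-quantile used in the original algorithm)
    """
    good = sorted(start)
    for _ in range(max_iter):
        m = len(good)
        if m < 3:
            raise ValueError("need at least 3 'good' observations")
        xs = [x[i] for i in good]
        ys = [y[i] for i in good]
        a, b, sxx, xbar = fit_line(xs, ys)
        rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(xs, ys))
        sigma = math.sqrt(rss / (m - 2))  # residual scale on the subset
        in_good = set(good)
        new_good = []
        for i in range(len(x)):
            h = 1.0 / m + (x[i] - xbar) ** 2 / sxx  # leverage w.r.t. subset
            r = y[i] - a - b * x[i]
            # scaled residual inside the subset,
            # scaled prediction error outside of it
            scale = sigma * math.sqrt(1.0 - h if i in in_good else 1.0 + h)
            if abs(r) / scale < cutoff:
                new_good.append(i)
        if new_good == good:  # convergence: subset is stable
            break
        good = new_good
    return good


# Toy data: a line with small alternating noise; the last two
# responses are shifted upwards by 50 to act as outliers.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[8] += 50.0
y[9] += 50.0
good = bacon_reg_sketch(x, y, start=[0, 1, 2, 3], cutoff=3.0)
print(good)
```

On this toy data the subset grows from the four starting points to the eight clean observations and then stabilizes, leaving the two shifted points outside the "good" set.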

tobiasschoch commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

fboehm commented 3 years ago

@tobiasschoch - the reviewers have recommended your manuscript for publication. Thank you to @msalibian and @aalfons for thorough reviews.

fboehm commented 3 years ago

Before we proceed, I have a few small suggestions for the paper.md.

Lines 78-79: This sentence feels awkward. Can you rewrite it to flow a little better while still conveying the message? Perhaps something like this: "We recommend that the user examine both the data structure and the "good" observations to verify that the assumptions hold."
Line 84: Replace "library" with "package".
Section "Community guidelines": I think that this should be removed from the manuscript. I erred in overlooking your above discussion about this. I apologize for that. What you currently have in the README.md is sufficient.
Just a point to consider - it seems that you don't define "breakdown point". Is this something that you want to define in the manuscript?

Once you've addressed these minor points, we will proceed towards publication. Thanks again, @tobiasschoch !

tobiasschoch commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

tobiasschoch commented 3 years ago

@fboehm - Thank you for proofreading the paper. Highly appreciated!

Lines 78-79: This sentence feels awkward. Can you rewrite it to flow a little better while still conveying the message? Perhaps something like this: "We recommend that the user examine both the data structure and the "good" observations to verify that the assumptions hold." Answer: Yes, I have modified the sentence.
Line 84: Replace "library" with "package". Answer: Yes, I have replaced "library" with "package".
Section "Community guidelines": I think that this should be removed from the manuscript. I erred in overlooking your above discussion about this. I apologize for that. What you currently have in the README.md is sufficient. Answer: No problem. I have removed the Section "Community guidelines" from the paper.
Just a point to consider - it seems that you don't define "breakdown point". Is this something that you want to define in the manuscript? Answer: Thank you for pointing this out. I have added a footnote that explains what the breakdown point is.

See new pdf above this post. In addition, I updated the package version. Do I need to call [at]whedon set [new version] as version now?

fboehm commented 3 years ago

@tobiasschoch - thanks so much! We now need you to archive the package, for example, with Zenodo. Please report the archive DOI, along with the new version number, here. I'll then tell whedon to set the archive DOI and version number. I believe that authors don't have permissions to do so via whedon.

tobiasschoch commented 3 years ago

@fboehm - Thank you very much, Frederick! The version is tagged "v0.5" and the DOI is

and the target URL is https://zenodo.org/badge/latestdoi/258328250

@aalfons and @msalibian Thank you for reviewing the paper!

fboehm commented 3 years ago

@whedon set version as v0.5

whedon commented 3 years ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@whedon commands

fboehm commented 3 years ago

@whedon set v0.5 as version

whedon commented 3 years ago

OK. v0.5 is the version.

fboehm commented 3 years ago

@whedon check references

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/S0167-9473(99)00101-2 is OK
- 10.1002/0470055464 is OK
- 10.1145/567806.567807 is OK
- 10.1137/1.9780898719604 is OK
- 10.1002/9781119214656 is OK
- 10.1016/j.csda.2004.09.009 is OK
- 10.1080/01621459.1990.10474920 is OK
- 10.1198/jcgs.2009.0005 is OK

MISSING DOIs

- None

INVALID DOIs

- None

fboehm commented 3 years ago

@whedon set 10.5281/zenodo.4895167 as archive

whedon commented 3 years ago

OK. 10.5281/zenodo.4895167 is the archive.

fboehm commented 3 years ago

@whedon accept

whedon commented 3 years ago

To recommend a paper to be accepted use @whedon recommend-accept

fboehm commented 3 years ago

@whedon recommend-accept

whedon commented 3 years ago

Attempting dry run of processing paper acceptance...

whedon commented 3 years ago

:wave: @openjournals/joss-eics, this paper is ready to be accepted and published.

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/2356

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/2356, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/S0167-9473(99)00101-2 is OK
- 10.1002/0470055464 is OK
- 10.1145/567806.567807 is OK
- 10.1137/1.9780898719604 is OK
- 10.1002/9781119214656 is OK
- 10.1016/j.csda.2004.09.009 is OK
- 10.1080/01621459.1990.10474920 is OK
- 10.1198/jcgs.2009.0005 is OK

MISSING DOIs

- None

INVALID DOIs

- None

fboehm commented 3 years ago

@tobiasschoch - I just recognized that the title of the archive doesn't match the title of the manuscript. Can you please fix this?

[REVIEW]: wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression #3238
