[REVIEW]: Fast Resampling and Monte Carlo Methods in Python

editorialbot commented 1 year ago

Submitting author: !--author-handle-->@mdhaber@jbytecode<!--end-editor-- Reviewers: @SaranjeetKaur, @kose-y Archive: 10.5281/zenodo.8031631

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4"><img src="https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4/status.svg)](https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@coatless & @SaranjeetKaur, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jbytecode know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @SaranjeetKaur

📝 Checklist for @kose-y

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 year ago

Software report:

github.com/AlDanial/cloc v 1.88  T=5.24 s (485.0 files/s, 163896.3 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Python                          960          80209         144387         221351
C                               313          13419          33524          77926
Fortran 77                      409           3798          71804          73352
reStructuredText                403           9312           6466          29912
C/C++ Header                    135           3937           7263          18535
Cython                          124           5953           9618          17108
C++                              30           1680           2011          10682
Meson                            88            397            192           4878
JSON                              5             15              0           3303
YAML                             25            217            293           1723
TeX                               5            141            161           1289
diff                              2             44            572            731
INI                               4            221              0            507
Pascal                            3            115              0            466
Bourne Shell                      9             59             89            264
Markdown                          9             64              0            201
SVG                               1              4              0            133
make                              4             40             30            111
TOML                              1             19             46            110
CSS                               1             31             20            106
MATLAB                            5             42             45             94
R                                 1              5             12             67
Bourne Again Shell                2             14             26             48
HTML                              2              5              0             21
Dockerfile                        1              5             31             16
Unity-Prefab                      1              0              0              2
--------------------------------------------------------------------------------
SUM:                           2543         119746         276590         462936
--------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 1 year ago

Wordcount for paper.md is 866

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.2307/2331554 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1214/aoms/1177729437 is OK
- 1544-6115.1585 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 1 year ago

Dear @coatless, Dear @SaranjeetKaur

This is the review thread. Firstly, type

@editorialbot generate my checklist

to generate your own checklist. In that checklist, there are 23 check items. Whenever you complete the corresponding task, you can check off them.

Please write your comments as separate posts and do not modify your checklist descriptions.

The review process is interactive so you can always interact with the authors, reviewers, and the editor. You can also create issues and pull requests in the target repository. Please do mention this thread's URL in the issues so we can keep tracking what is going on out of our world.

Please do not hesitate to ask me about anything, anytime.

Thank you in advance!

jbytecode commented 1 year ago

@coatless, @SaranjeetKaur - could you please generate your task list and update your status? thank you in advance.

jbytecode commented 1 year ago

@coatless, @SaranjeetKaur - After three weeks from assigning as reviewers, I am still failed to get at least a life signal from you (Are you fine?). Please generate your task lists to start your review. If you are not available, please ping me, so I can find another reviewers to proceed. Thank you in advance.

jbytecode commented 1 year ago

@coatless - no worries, thank you for the response

@SaranjeetKaur - We are waiting to hear from you also.

Thank you in advance.

SaranjeetKaur commented 1 year ago

@jbytecode - I am looking at this thread now

SaranjeetKaur commented 1 year ago

Review checklist for @SaranjeetKaur

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/mdhaber/scipy?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@mdhaber) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[ ] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

mdhaber commented 1 year ago

Since the paper is about specific contributions to a much larger library (as discussed in https://github.com/openjournals/joss-reviews/issues/5047#issuecomment-1377678757), I thought it might be helpful to point out some relevant files.

License: SciPy's license file is available at https://github.com/scipy/scipy/blob/main/LICENSE.txt.

Contributions: The "blame" of scipy.stats._resampling.py and scipy.stats.tests.test_resampling.py reveal some of my contributions to these features over the past two years.

Installation: Installation instructions for the whole library are at https://scipy.org/install/. The list of dependencies is available in the environment.yml file. Please use the latest version of SciPy (1.10.1) for best results.

Documentation: The scipy.stats API reference includes both input/output documentation and examples. More realistic usage examples are available in the scipy.stats tutorials.

Tests: Automated tests are in scipy.stats.tests.test_resampling.py. Continuous integration test results can be seen in any pull request to SciPy, or tests can be run locally using import scipy.stats; scipy.stats.test(verbose=2).

Community Guidlines: Community guidelines are available in the Developer Documentation.

Functionality: One way of verifying the functionality would be to follow the examples in the API reference. For example, the documentation of bootstrap shows that the true (population) value of the statistic lies within the 90% confidence interval approximately 900 times out of 1000 trials. The documentation of permutation_test and monte_carlo_test each compare the p-value produced by the function against the p-value of a comparable hypothesis test function. Many similar examples of real-world use cases are available in the tutorials. Of course, the functionality was also verified when the PRs introducing the functionally were reviewed by SciPy maintainers. The original PRs were: https://github.com/scipy/scipy/pull/13371 https://github.com/scipy/scipy/pull/13899 https://github.com/scipy/scipy/pull/14576 Additional functionality has been added over the past few years; the relevant PRs can be found by reviewing the "blame" of stats.stats._resampling.py.

Performance: The paper claims that these methods are "fast" because "the functions take advantage of vectorized user code, avoiding slow Python loops". There is an easy way to verify this. All three functions accept an argument statistic (callable) and a parameter vectorized (boolean). All of the examples show how the function can be used with a vectorized statistic; that is, one that accepts an $N$ dimensional array as input and returns the statistic of each slice defined by the parameter axis. To see that the functions take advantage of vectorized statistics to improve performance, simply replace statistic with a function that does not have an axis argument, and ensure that vectorized is not passed as an argument.

For example, the relevant parts of the `scipy.stats.bootstrap` example are: ```python3 import numpy as np rng = np.random.default_rng() from scipy.stats import norm dist = norm(loc=2, scale=4) # our "unknown" distribution data = dist.rvs(size=100, random_state=rng) from scipy.stats import bootstrap data = (data,) # samples must be in a sequence # np.std is vectorized %timeit bootstrap(data, statistic=np.std, confidence_level=0.9, random_state=rng) # 12.2 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` If we redefine the statistic so that it is not vectorized: ```python3 # remove the axis argument so that the statistic can only operate on one 1d sample at a time %timeit bootstrap(data, statistic=lambda x: np.std(x), confidence_level=0.9, random_state=rng) # 266 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` As an independent check, a naive implementation of the (percentile) bootstrap would be: ```python3 n_resamples = 9999 # default for `bootstrap` def naive_bootstrap(data): bootstrap_distribution = [] for j in range(9999): i = rng.integers(len(data), size=len(data)) bootstrap_distribution.append(np.std(data[i])) return np.percentile(bootstrap_distribution, [5, 95]) %timeit naive_bootstrap(data[0]) # 265 ms ± 1.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ```

I hope this helps!

~~@jbytecode would it be appropriate for me to also suggest how some functionality and performance claims might be verified?~~ Done!

jbytecode commented 1 year ago

@mdhaber - of course, addition to this, you can provide some reproducible test benchmark if exists. thank you.

mdhaber commented 1 year ago

Thanks @jbytecode I added notes on how one might begin to verify the functionality and performance claims. I'd be happy to provide other examples if it would help!

jbytecode commented 1 year ago

@coatless, @SaranjeetKaur - Could you please update your status and inform us on how is your review going? Thank you in advance.

SaranjeetKaur commented 1 year ago

Hi @jbytecode - I have reviewed it for general checks and am yet to test the software functionality and documentation part.

jbytecode commented 1 year ago

@SaranjeetKaur - Thank you for responding and updating your status.

jbytecode commented 1 year ago

@coatless, @SaranjeetKaur - Could you please update your status and tell us how is your review going? Thank you in advance.

jbytecode commented 1 year ago

@coatless, @SaranjeetKaur - Is it possible to get a life signal? Could you please update your status? Thank you in advance.

jbytecode commented 1 year ago

Dear @coatless and @SaranjeetKaur,

I am so sorry if I am bothering you. We are a little bit ahead of normal review times. Moreover, I can only reach the reviewers by email which in turn does not give me a significant result.

Please declare your availability and set a deadline in a few days. If you are not able to review this manuscript let me know, so I'll find new reviewers.

Thank you in advance.

SaranjeetKaur commented 1 year ago

@mdhaber - it might be helpful to add link to the references in the section "Statement of need" (that is, for all the 3 questions, when you are sharing examples which were introduced in the Summary)

Also when you are describing the state of the ecosystem before the release of SciPy 1.9.0 and how it was partially met by tutorials, blog posts, medium.com, and niche packages - share some references here too, if possible.

mdhaber commented 1 year ago

Thanks @SaranjeetKaur; I've attempted to address your comments in mdhaber/scipy#101. Please let me know if that looks good to you.

SaranjeetKaur commented 1 year ago

@mdhaber - I have reviewed the rendered versions and those look good to me. Thanks for your work!

SaranjeetKaur commented 1 year ago

@jbytecode - Apologies that this has taken longer than I expected! I have completed the review of this work and it looks good to me now. What happens next?

mdhaber commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 1 year ago

@SaranjeetKaur - thank you for finishing your review. we are now just waiting @coatless to finish their tasks.

@coatless - could you please update your status? review process of this manuscript is a little bit out of normal time intervals. thank you in advance

jbytecode commented 1 year ago

@coatless - May I kindly ask you to come back and finalize your review? Thank you in advance.

jbytecode commented 1 year ago

Hello everyone! I had contacted the esteemed @coatless, who was assigned as a reviewer for this submission, once via email and received a response, fulfilling part of the refereeing process. However, despite sending several emails and calling from here, I have not received a response again. In this situation, I would like to inform that if I do not receive feedback by Friday, May 5th, 2023, I will have to look for a new reviewer.

jbytecode commented 1 year ago

@turgeonmaxime, @kose-y - Hi again, I had invited you to review this submission, but unfortunately, I did not receive a response from you in a timely manner. Currently, one of our reviewers is unresponsive, and I am in urgent need of finding a replacement. Would one of you be willing to assist in reviewing this submission for JOSS now?

kose-y commented 1 year ago

Yes, I am available.

jbytecode commented 1 year ago

@editorialbot add @kose-y as reviewer

editorialbot commented 1 year ago

@kose-y added to the reviewers list!

jbytecode commented 1 year ago

@kose-y - thank you! could you please generate your tasklist by typing

@editorialbot generate my checklist

jbytecode commented 1 year ago

@editorialbot remove @coatless from reviewers

editorialbot commented 1 year ago

@coatless removed from the reviewers list!

jbytecode commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

mdhaber commented 1 year ago

Thanks very much @kose-y. I thought I'd point out https://github.com/openjournals/joss-reviews/issues/5092#issuecomment-1445240137, which I wrote to facilitate review.

kose-y commented 1 year ago

Review checklist for @kose-y

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/mdhaber/scipy?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@mdhaber) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

jbytecode commented 1 year ago

@kose-y - Just pinging, sorry if I am bothering. How is your review going? Could you please update your status and inform us? Thank you in advance!

kose-y commented 1 year ago

I have completed software paper and general checks part, moving on to the software functionality and documentation part.

jbytecode commented 1 year ago

Dear @kose-y

Could you please update your status?

Thank you in advance.

kose-y commented 1 year ago

I will finish it by this weekend.

kose-y commented 1 year ago

I have completed the review. Looks good to me for publication.

jbytecode commented 1 year ago

@kose-y - thank you so much for your contributions

jbytecode commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.2307/2331554 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1214/aoms/1177729437 is OK
- 1544-6115.1585 is OK

MISSING DOIs

- None

INVALID DOIs

- None

jbytecode commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

openjournals / joss-reviews