openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
699 stars 36 forks source link

[REVIEW]: Fast Resampling and Monte Carlo Methods in Python #5092

Closed editorialbot closed 1 year ago

editorialbot commented 1 year ago

Submitting author: !--author-handle-->@mdhaber<!--end-author-handle-- (Matt Haberland) Repository: https://github.com/mdhaber/scipy Branch with paper.md (empty if default branch): joss_resampling Version: v1.10.0-joss-article Editor: !--editor-->@jbytecode<!--end-editor-- Reviewers: @SaranjeetKaur, @kose-y Archive: 10.5281/zenodo.8031631

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4"><img src="https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4/status.svg)](https://joss.theoj.org/papers/59d971c467460f42be7168b87a1dfbd4)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@coatless & @SaranjeetKaur, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jbytecode know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Checklists

📝 Checklist for @SaranjeetKaur

📝 Checklist for @kose-y

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 year ago
Software report:

github.com/AlDanial/cloc v 1.88  T=5.24 s (485.0 files/s, 163896.3 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Python                          960          80209         144387         221351
C                               313          13419          33524          77926
Fortran 77                      409           3798          71804          73352
reStructuredText                403           9312           6466          29912
C/C++ Header                    135           3937           7263          18535
Cython                          124           5953           9618          17108
C++                              30           1680           2011          10682
Meson                            88            397            192           4878
JSON                              5             15              0           3303
YAML                             25            217            293           1723
TeX                               5            141            161           1289
diff                              2             44            572            731
INI                               4            221              0            507
Pascal                            3            115              0            466
Bourne Shell                      9             59             89            264
Markdown                          9             64              0            201
SVG                               1              4              0            133
make                              4             40             30            111
TOML                              1             19             46            110
CSS                               1             31             20            106
MATLAB                            5             42             45             94
R                                 1              5             12             67
Bourne Again Shell                2             14             26             48
HTML                              2              5              0             21
Dockerfile                        1              5             31             16
Unity-Prefab                      1              0              0              2
--------------------------------------------------------------------------------
SUM:                           2543         119746         276590         462936
--------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 1 year ago

Wordcount for paper.md is 866

editorialbot commented 1 year ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.2307/2331554 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1214/aoms/1177729437 is OK
- 1544-6115.1585 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 1 year ago

Dear @coatless, Dear @SaranjeetKaur

This is the review thread. Firstly, type

@editorialbot generate my checklist

to generate your own checklist. In that checklist, there are 23 check items. Whenever you complete the corresponding task, you can check off them.

Please write your comments as separate posts and do not modify your checklist descriptions.

The review process is interactive so you can always interact with the authors, reviewers, and the editor. You can also create issues and pull requests in the target repository. Please do mention this thread's URL in the issues so we can keep tracking what is going on out of our world.

Please do not hesitate to ask me about anything, anytime.

Thank you in advance!

jbytecode commented 1 year ago

@coatless, @SaranjeetKaur - could you please generate your task list and update your status? thank you in advance.

jbytecode commented 1 year ago

@coatless, @SaranjeetKaur - After three weeks from assigning as reviewers, I am still failed to get at least a life signal from you (Are you fine?). Please generate your task lists to start your review. If you are not available, please ping me, so I can find another reviewers to proceed. Thank you in advance.

jbytecode commented 1 year ago

@coatless - no worries, thank you for the response

@SaranjeetKaur - We are waiting to hear from you also.

Thank you in advance.

SaranjeetKaur commented 1 year ago

@jbytecode - I am looking at this thread now

SaranjeetKaur commented 1 year ago

Review checklist for @SaranjeetKaur

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

mdhaber commented 1 year ago

Since the paper is about specific contributions to a much larger library (as discussed in https://github.com/openjournals/joss-reviews/issues/5047#issuecomment-1377678757), I thought it might be helpful to point out some relevant files.

License: SciPy's license file is available at https://github.com/scipy/scipy/blob/main/LICENSE.txt.

Contributions: The "blame" of scipy.stats._resampling.py and scipy.stats.tests.test_resampling.py reveal some of my contributions to these features over the past two years.

Installation: Installation instructions for the whole library are at https://scipy.org/install/. The list of dependencies is available in the environment.yml file. Please use the latest version of SciPy (1.10.1) for best results.

Documentation: The scipy.stats API reference includes both input/output documentation and examples. More realistic usage examples are available in the scipy.stats tutorials.

Tests: Automated tests are in scipy.stats.tests.test_resampling.py. Continuous integration test results can be seen in any pull request to SciPy, or tests can be run locally using import scipy.stats; scipy.stats.test(verbose=2).

Community Guidlines: Community guidelines are available in the Developer Documentation.

Functionality: One way of verifying the functionality would be to follow the examples in the API reference. For example, the documentation of bootstrap shows that the true (population) value of the statistic lies within the 90% confidence interval approximately 900 times out of 1000 trials. The documentation of permutation_test and monte_carlo_test each compare the p-value produced by the function against the p-value of a comparable hypothesis test function. Many similar examples of real-world use cases are available in the tutorials. Of course, the functionality was also verified when the PRs introducing the functionally were reviewed by SciPy maintainers. The original PRs were: https://github.com/scipy/scipy/pull/13371 https://github.com/scipy/scipy/pull/13899 https://github.com/scipy/scipy/pull/14576 Additional functionality has been added over the past few years; the relevant PRs can be found by reviewing the "blame" of stats.stats._resampling.py.

Performance: The paper claims that these methods are "fast" because "the functions take advantage of vectorized user code, avoiding slow Python loops". There is an easy way to verify this. All three functions accept an argument statistic (callable) and a parameter vectorized (boolean). All of the examples show how the function can be used with a vectorized statistic; that is, one that accepts an $N$ dimensional array as input and returns the statistic of each slice defined by the parameter axis. To see that the functions take advantage of vectorized statistics to improve performance, simply replace statistic with a function that does not have an axis argument, and ensure that vectorized is not passed as an argument.

For example, the relevant parts of the `scipy.stats.bootstrap` example are: ```python3 import numpy as np rng = np.random.default_rng() from scipy.stats import norm dist = norm(loc=2, scale=4) # our "unknown" distribution data = dist.rvs(size=100, random_state=rng) from scipy.stats import bootstrap data = (data,) # samples must be in a sequence # np.std is vectorized %timeit bootstrap(data, statistic=np.std, confidence_level=0.9, random_state=rng) # 12.2 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` If we redefine the statistic so that it is not vectorized: ```python3 # remove the axis argument so that the statistic can only operate on one 1d sample at a time %timeit bootstrap(data, statistic=lambda x: np.std(x), confidence_level=0.9, random_state=rng) # 266 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` As an independent check, a naive implementation of the (percentile) bootstrap would be: ```python3 n_resamples = 9999 # default for `bootstrap` def naive_bootstrap(data): bootstrap_distribution = [] for j in range(9999): i = rng.integers(len(data), size=len(data)) bootstrap_distribution.append(np.std(data[i])) return np.percentile(bootstrap_distribution, [5, 95]) %timeit naive_bootstrap(data[0]) # 265 ms ± 1.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ```

I hope this helps!

@jbytecode would it be appropriate for me to also suggest how some functionality and performance claims might be verified? Done!

jbytecode commented 1 year ago

@mdhaber - of course, addition to this, you can provide some reproducible test benchmark if exists. thank you.

mdhaber commented 1 year ago

Thanks @jbytecode I added notes on how one might begin to verify the functionality and performance claims. I'd be happy to provide other examples if it would help!

jbytecode commented 1 year ago

@coatless, @SaranjeetKaur - Could you please update your status and inform us on how is your review going? Thank you in advance.

SaranjeetKaur commented 1 year ago

Hi @jbytecode - I have reviewed it for general checks and am yet to test the software functionality and documentation part.

jbytecode commented 1 year ago

@SaranjeetKaur - Thank you for responding and updating your status.

jbytecode commented 1 year ago

@coatless, @SaranjeetKaur - Could you please update your status and tell us how is your review going? Thank you in advance.

jbytecode commented 1 year ago

@coatless, @SaranjeetKaur - Is it possible to get a life signal? Could you please update your status? Thank you in advance.

jbytecode commented 1 year ago

Dear @coatless and @SaranjeetKaur,

I am so sorry if I am bothering you. We are a little bit ahead of normal review times. Moreover, I can only reach the reviewers by email which in turn does not give me a significant result.

Please declare your availability and set a deadline in a few days. If you are not able to review this manuscript let me know, so I'll find new reviewers.

Thank you in advance.

SaranjeetKaur commented 1 year ago

@mdhaber - it might be helpful to add link to the references in the section "Statement of need" (that is, for all the 3 questions, when you are sharing examples which were introduced in the Summary)

Also when you are describing the state of the ecosystem before the release of SciPy 1.9.0 and how it was partially met by tutorials, blog posts, medium.com, and niche packages - share some references here too, if possible.

mdhaber commented 1 year ago

Thanks @SaranjeetKaur; I've attempted to address your comments in mdhaber/scipy#101. Please let me know if that looks good to you.

SaranjeetKaur commented 1 year ago

@mdhaber - I have reviewed the rendered versions and those look good to me. Thanks for your work!

SaranjeetKaur commented 1 year ago

@jbytecode - Apologies that this has taken longer than I expected! I have completed the review of this work and it looks good to me now. What happens next?

mdhaber commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jbytecode commented 1 year ago

@SaranjeetKaur - thank you for finishing your review. we are now just waiting @coatless to finish their tasks.

@coatless - could you please update your status? review process of this manuscript is a little bit out of normal time intervals. thank you in advance

jbytecode commented 1 year ago

@coatless - May I kindly ask you to come back and finalize your review? Thank you in advance.

jbytecode commented 1 year ago

Hello everyone! I had contacted the esteemed @coatless, who was assigned as a reviewer for this submission, once via email and received a response, fulfilling part of the refereeing process. However, despite sending several emails and calling from here, I have not received a response again. In this situation, I would like to inform that if I do not receive feedback by Friday, May 5th, 2023, I will have to look for a new reviewer.

jbytecode commented 1 year ago

@turgeonmaxime, @kose-y - Hi again, I had invited you to review this submission, but unfortunately, I did not receive a response from you in a timely manner. Currently, one of our reviewers is unresponsive, and I am in urgent need of finding a replacement. Would one of you be willing to assist in reviewing this submission for JOSS now?

kose-y commented 1 year ago

Yes, I am available.

jbytecode commented 1 year ago

@editorialbot add @kose-y as reviewer

editorialbot commented 1 year ago

@kose-y added to the reviewers list!

jbytecode commented 1 year ago

@kose-y - thank you! could you please generate your tasklist by typing

@editorialbot generate my checklist

jbytecode commented 1 year ago

@editorialbot remove @coatless from reviewers

editorialbot commented 1 year ago

@coatless removed from the reviewers list!

jbytecode commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

mdhaber commented 1 year ago

Thanks very much @kose-y. I thought I'd point out https://github.com/openjournals/joss-reviews/issues/5092#issuecomment-1445240137, which I wrote to facilitate review.

kose-y commented 1 year ago

Review checklist for @kose-y

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

jbytecode commented 1 year ago

@kose-y - Just pinging, sorry if I am bothering. How is your review going? Could you please update your status and inform us? Thank you in advance!

kose-y commented 1 year ago

I have completed software paper and general checks part, moving on to the software functionality and documentation part.

jbytecode commented 1 year ago

Dear @kose-y

Could you please update your status?

Thank you in advance.

kose-y commented 1 year ago

I will finish it by this weekend.

kose-y commented 1 year ago

I have completed the review. Looks good to me for publication.

jbytecode commented 1 year ago

@kose-y - thank you so much for your contributions

jbytecode commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.2307/2331554 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1214/aoms/1177729437 is OK
- 1544-6115.1585 is OK

MISSING DOIs

- None

INVALID DOIs

- None
jbytecode commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left: