[REVIEW]: bayes-toolbox: A Python package for Bayesian statistics #5526

Closed. editorialbot closed this issue 8 months ago.

editorialbot commented 1 year ago

Submitting author: @hyosubkim (Hyosub E. Kim)
Repository: https://github.com/hyosubkim/bayes-toolbox
Branch with paper.md (empty if default branch):
Version: 0.1.1
Editor: @samhforbes
Reviewers: @alstat, @ChristopherLucas
Archive: 10.5281/zenodo.7849408

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/1b7b8068a329b547e28d00da0ad790b2"><img src="https://joss.theoj.org/papers/1b7b8068a329b547e28d00da0ad790b2/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/1b7b8068a329b547e28d00da0ad790b2/status.svg)](https://joss.theoj.org/papers/1b7b8068a329b547e28d00da0ad790b2)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@alstat & @ChristopherLucas & @BrandonEdwards, your review will be checklist-based. Each of you will have a separate checklist that you should update when carrying out your review. First of all, you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @samhforbes know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @alstat

📝 Checklist for @ChristopherLucas

📝 Checklist for @BrandonEdwards

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 year ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.12 s (209.4 files/s, 127733.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Jupyter Notebook                 8              0          12891           1007
Python                           3            287            374            507
YAML                             3              4              4            311
Markdown                        10            114              0            265
TeX                              1              4              0             49
TOML                             1              3              0             38
-------------------------------------------------------------------------------
SUM:                            26            412          13269           2177
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.31219/osf.io/ksfyr is OK
- 10.1146/annurev-psych-122216-011845 is OK
- 10.18637/jss.v035.i04 is OK
- 10.1016/c2012-0-00477-2 is OK
- 10.1080/00031305.2016.1154108 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

Wordcount for paper.md is 453

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

samhforbes commented 1 year ago

OK @hyosubkim, @alstat, @ChristopherLucas, @BrandonEdwards this is the review thread for the paper. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

at the top of a new comment in this thread.

These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please link to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a fixed period of time.

Please feel free to ping me if you have any questions/concerns.

ChristopherLucas commented 1 year ago

Review checklist for @ChristopherLucas

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

ChristopherLucas commented 1 year ago

@hyosubkim: I've only just started my review, but the Statement of Need, as written, seems like it could do a bit more to explain the package's contribution.

Looking at a few functions, many seem to wrap copied examples from the PyMC documentation (e.g., the function BEST() wraps code from this notebook). The PyMC notebooks are a fantastic pedagogical resource, and the functions defined in bayes-toolbox hard-code many assumptions that perhaps the user ought to have to specify (can a user define a prior for any of the models implemented in bayes-toolbox without editing the source code?).
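
To illustrate what I mean, the wrapper pattern looks roughly like this (a schematic sketch modeled on the PyMC BEST example, not bayes-toolbox's actual source):

```python
# Schematic sketch of the wrapper pattern under discussion -- NOT the
# actual bayes-toolbox source. The prior hyperparameters are fixed
# inside the function, so changing them means editing the library.
import numpy as np
import pymc as pm

def best_sketch(y1, y2):
    y = np.concatenate([y1, y2])
    with pm.Model() as model:
        # Hard-coded, data-derived hyperparameters (Kruschke-style defaults)
        mu1 = pm.Normal("mu1", mu=y.mean(), sigma=y.std() * 2)
        mu2 = pm.Normal("mu2", mu=y.mean(), sigma=y.std() * 2)
        sigma1 = pm.Uniform("sigma1", y.std() / 1000, y.std() * 1000)
        sigma2 = pm.Uniform("sigma2", y.std() / 1000, y.std() * 1000)
        nu = pm.Exponential("nu_minus_one", 1 / 29.0) + 1
        pm.StudentT("group1", nu=nu, mu=mu1, sigma=sigma1, observed=y1)
        pm.StudentT("group2", nu=nu, mu=mu2, sigma=sigma2, observed=y2)
        idata = pm.sample()
    # Exposing the hyperparameters above as keyword arguments would let
    # users set their own priors without touching the source.
    return model, idata
```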

I also don't agree that being a good Bayesian has anything to do with the replication crisis (which is part of the current justification in the statement of need). In fact, the current justification makes me want to write a comment justifying frequentist statistics, which really doesn't seem like the sort of response a Statement of Need ought to trigger.

The statement of need also mentions the potential pedagogical benefit of the package, but to be honest, at least with the current level of documentation and the degree to which models are hardcoded in bayes-toolbox, I'd probably point students to the PyMC docs that you're wrapping, rather than to this package.

I don't mean to sound adversarial - I'm open to being convinced otherwise - but this is my present assessment of the paper and my evaluation of the "substantial scholarly effort" criterion.

hyosubkim commented 1 year ago

@ChristopherLucas : I appreciate your feedback and hope I can address some of your concerns here. First off, I definitely did NOT interpret your comments as being “adversarial”, but I do hope you’ve had a chance to go beyond the BEST function and notebook, and will examine the many other models provided as well as the documentation hosted here, as I think several of your comments are directly addressed there (e.g., choice of priors, value added by wrapping model functions, source material, etc.). I've also tried to address your initial comments below, with the hope that I can resolve any misunderstandings and we can collaboratively advance to the next stage of improving the repo and source code.

Getting to one of the major points of your critique, as I understand it: bayes-toolbox was never meant to supplant what the PyMC developers have already provided. Rather, bayes-toolbox is intended to serve as a helpful adjunct to their materials, as well as a useful library in its own right for any Python user wanting to run Bayesian analogues of the most common frequentist tests with one-liners. Wrapping models inside functions, as bayes-toolbox has done, is meant to reduce friction for new users and lower the barrier to entry for those who want to explore Bayesian statistics with Python. Right now, Python users can choose between at least a couple of packages that provide one-liners for classical/frequentist tests (e.g., Pingouin, SciPy). However, for Bayesian stats, there has only been Bambi, which is excellent but requires more advanced knowledge and familiarity with R-brms syntax. Therefore, the goal of bayes-toolbox is to fill an important gap in the Python/Bayesian community by providing an easy-to-use module for less experienced users that makes it as simple to run Bayesian stats as it is to run frequentist stats. The PyMC developers also recognized this gap and expressed support for bayes-toolbox, which I assume is why I was invited to present it at the most recent PyMC Conference.
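
To make the comparison concrete, here is the kind of parity I have in mind (the Bayesian call is schematic shorthand, not the exact bayes-toolbox API):

```python
# Illustrative comparison only; the Bayesian one-liner below is a
# schematic stand-in for the kind of call bayes-toolbox provides.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y1, y2 = rng.normal(0.0, 1.0, 50), rng.normal(0.5, 1.0, 50)

# Frequentist one-liner (SciPy):
t, p = stats.ttest_ind(y1, y2)

# The analogous Bayesian one-liner, e.g. a BEST-style two-group
# comparison (name/signature schematic):
#   model, idata = bayes_toolbox.BEST(y1, y2)
```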

As far as wrapping PyMC examples goes, the main purpose of the BEST notebook provided in the “examples” directory is simply to illustrate the bayes-toolbox syntax (and outputs) and to show that what previously took many lines of code to run the BEST test is now reduced to a one-liner with bayes-toolbox. I explicitly state at the top of the notebook that it was adapted from the PyMC developers’ work. Indeed, much more of the original inspiration and source material for bayes-toolbox comes from Jordi Warmenhoven’s Python port of the famous Kruschke textbook written for R users, both of whom I also acknowledge in several places.

Re: priors. Yes, the priors are hard-coded and require (for now) changing the source code if users want something different. This is acknowledged in the “Look before you leap” section of the documentation. The choice of priors is based on what was used in the Kruschke textbook. Thus, for new users or those with “prior paralysis”, bayes-toolbox implements the same diffuse, uninformative priors as Kruschke so as not to strongly influence posterior estimates, and to likely satisfy skeptical reviewers. I figured that users who know enough to change their priors in a principled manner will also find it easy to change the model scaffolding provided by bayes-toolbox or be better served by Bambi.

Re: your comment about whether “being a good Bayesian has anything to do with the replication crisis”, I certainly did not intend to trigger those using frequentist stats (I’m one of them, as I’ve only recently begun migrating towards more Bayesian approaches!). My point was that binary judgments about statistical “significance” or the lack thereof are problematic (as statisticians of all stripes have noted, including the ASA), and that quantifying our uncertainty using Bayesian inference is one very natural way to address this. However, I will gladly edit the language if the other reviewers had a similar interpretation to yours.

In terms of pedagogy, perhaps the question is not whether you would direct your students to the PyMC developers OR bayes-toolbox, but rather if bayes-toolbox, in conjunction with other PyMC-developed materials, would serve them well should they start learning and utilizing Bayesian stats in their own work, especially if they are learning from the Kruschke text and want to use Python, as opposed to R.

Apologies for the long response, but you gave me much to think about and address. I hope I’ve convinced you that bayes-toolbox is not a simple repackaging of what the PyMC developers have already published and that it is a unique scholarly contribution, as it provides a much-needed, streamlined Pythonic implementation of Bayesian stats, not to mention some models that come from neither the PyMC developers' materials nor the Kruschke textbook (e.g., meta-analyses).

alstat commented 1 year ago

Review checklist for @alstat

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

BrandonEdwards commented 1 year ago

Review checklist for @BrandonEdwards

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

ChristopherLucas commented 1 year ago

@hyosubkim That response is incredibly helpful, thanks. I recommend putting a bit of that in the paper and perhaps somewhere in the docs, even, as it helps people know what they're looking at when they find the package.

hyosubkim commented 1 year ago

Thanks, @ChristopherLucas ! Very glad to hear it. Will definitely incorporate your feedback.

samhforbes commented 12 months ago

Hi @alstat, @ChristopherLucas, @BrandonEdwards, I hope all is progressing well. Please let me know if I can be of any help as you go about your reviews!

samhforbes commented 11 months ago

Hi @alstat, @ChristopherLucas, @BrandonEdwards, I am just checking in to see how things are progressing. Please let me know if there are any delays you anticipate.

ChristopherLucas commented 11 months ago

The examples are generally great, though I noticed that this example references a dataframe that is never loaded: https://hyosubkim.github.io/bayes-toolbox/getting-started/ (i.e., df is referenced but never defined).

ChristopherLucas commented 11 months ago

I'm done with my review. I might be missing it, but I think the paper hasn't been revised yet, per @hyosubkim's thorough response to my question above. I'd also add some proper tests before finalizing this package (this seems incomplete), but other than that, it seems good to go IMO. Great work!

hyosubkim commented 11 months ago

> The examples are generally great, though I noticed that this example references a dataframe that is never loaded: https://hyosubkim.github.io/bayes-toolbox/getting-started/ (i.e., df is referenced but never defined).

@ChristopherLucas - Thanks for pointing this out. I grabbed some lines from the hierarchical regression example and added them to the "Example Syntax" section here to make it clear how to implement bayes-toolbox.
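
For anyone following along, the gist of the fix is simply defining the dataframe before it is referenced, along these lines (the file name and columns here are hypothetical):

```python
# Minimal illustration of the fix: `df` must be defined before any model
# call references it. The CSV path and column names are hypothetical.
import pandas as pd

df = pd.read_csv("example_data.csv")  # now `df` exists...
# ...before being handed to a model, e.g. (schematic call):
#   model, idata = bayes_toolbox.hierarchical_regression(
#       df["x"], df["y"], df["subject"]
#   )
```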

hyosubkim commented 11 months ago

> I'm done with my review. I might be missing it, but I think the paper hasn't been revised yet, per @hyosubkim's thorough response to my question above. I'd also add some proper tests before finalizing this package (this seems incomplete), but other than that, it seems good to go IMO. Great work!

Thanks, @ChristopherLucas ! Originally, I was going to wait for all three reviews to come in before editing, but I've now gone ahead and added many of your suggested edits into the online documentation. You'll see that the "Statement of Need" and "Education" portions, in particular, have more details and some verbatim quotes from my original response to you. I will now start making similar changes to the paper itself, assuming @samhforbes is in agreement with that plan.

Edited: I've now edited the paper.md file on GitHub. To summarize, I've balanced my statement regarding the replication crisis by explicitly acknowledging some recommended frequentist approaches to improving the state of affairs. I've also added some statements to boost the motivation behind bayes-toolbox, as well as better contextualizing it within the scientific/Bayesian Python ecosystem, specifically analogizing it to statsmodels, pingouin, etc. and drawing contrasts with Bambi.

A second update: @ChristopherLucas - Thanks for the comment re: tests. I've added information regarding validating the functionality of bayes-toolbox in a new section of the documentation (see "Functionality and Testing" here). Basically, the models have all been tested against the data and known results from the Kruschke textbook and another Python port of the Kruschke text. Importantly, formal testing of intermediate computations occurs throughout in the form of inline assert statements. The small size of the test suite you linked to is due to the relative lack of "pure" functions within bayes-toolbox to test with pytest (i.e., most of the functions are statistical models). After sifting through the source code again, I believe there is good coverage, but I'm certainly open to suggestions.
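
By "inline assert statements" I mean run-time checks of this flavor inside the computation itself (a schematic example, not a verbatim excerpt from the source):

```python
import numpy as np

def two_group_summary_sketch(y1, y2):
    """Schematic of inline assertions guarding intermediate computations."""
    y1, y2 = np.asarray(y1, dtype=float), np.asarray(y2, dtype=float)
    pooled = np.concatenate([y1, y2])
    mu, sd = pooled.mean(), pooled.std()
    # Inline assertions: fail fast if an intermediate value is degenerate
    # before it is used downstream (e.g., to scale priors).
    assert np.isfinite(mu), "pooled mean must be finite"
    assert sd > 0, "pooled std must be positive for prior scaling"
    return mu, sd
```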

hyosubkim commented 11 months ago

@samhforbes - I've attempted to address all of @ChristopherLucas 's comments. In summary, I've fixed the example he pointed out, revised the paper.md file on GitHub and online documentation to include the information requested (primarily from this reply https://github.com/openjournals/joss-reviews/issues/5526#issuecomment-1584041847), and clarified my implementation of automatic tests and validating functionality. I'll now wait to hear from you on how to proceed. Thanks!

samhforbes commented 11 months ago

Thanks @ChristopherLucas please look and see if you are happy with these changes, and if so, update your reviewer checklist accordingly. Thanks for your comments!

ChristopherLucas commented 11 months ago

I appreciate these revisions, @hyosubkim. Even in this revision, I still think the replication crisis discussion is distracting and unnecessary, and that it's difficult to develop your point in a satisfying way given the necessary brevity of the article. That said, I'm happy to sign off on this, especially if @samhforbes and the other reviewers are fine with this.

Great work, congrats on the project.

hyosubkim commented 11 months ago

Really appreciate your feedback, @ChristopherLucas . I agree with your point about the discussion around the replication crisis. It does seem out of scope for such a short paper, so I have now removed that entire paragraph (see here). I think it reads better now and focuses more on the positives of the package. Thanks for helping to improve both the software and paper.

hyosubkim commented 10 months ago

@samhforbes - Looks like @ChristopherLucas has signed off on his review. Not sure where @alstat and @BrandonEdwards are in the review process, but please let me know if there's anything else for me to do at this point or if I can further facilitate the review process.

samhforbes commented 10 months ago

Hi @hyosubkim, it being summer, things can be a bit slower than usual. That said, @BrandonEdwards @alstat please do let me know if there are any issues.

hyosubkim commented 10 months ago

Hi @samhforbes - yes, I can appreciate that the summer is a slow time. However, I noticed that your previous probes on July 12 and a couple of weeks ago to @alstat and @BrandonEdwards haven't received responses (regarding progress, needing more time, etc.), so I was starting to wonder if they still had the bandwidth for, and interest in, reviewing.

Regardless, I'll sit tight. And I appreciate everyone volunteering their time for this.

alstat commented 10 months ago

Hi @hyosubkim and @samhforbes, I apologize, I just had a busy work schedule until last week, but I will continue reviewing this weekend. Thanks.

hyosubkim commented 10 months ago

No problem @alstat . Thanks for letting us know and I look forward to your comments!

hyosubkim commented 10 months ago

@alstat - Just so you know, I've edited the manuscript based on prior feedback. You can find the latest version here. Thanks.

samhforbes commented 9 months ago

Hi @alstat and @BrandonEdwards, we are looking forward to the remainder of your reviews. Please let me know if there are any issues completing these soon!

samhforbes commented 9 months ago

Hi @alstat and @BrandonEdwards thanks for everything so far. I'd like to move forward with this submission as soon as we can. If possible please complete your review by next Wednesday, otherwise I am afraid we will have to proceed without the rest of your review.

alstat commented 8 months ago

Hi @samhforbes, I only have two requests for the package, @hyosubkim. These may already be covered in the docs, but I can't seem to find them.

  1. API documentation - while the functions have docstrings that can be accessed using help(func), I think it is better to have this in the web documentation as well, so that users who want to see the list of APIs can simply browse it on the web instead of looking into the source and checking which of the functions are exported and can be used.
  2. Automated tests - while there is a test folder, I only saw two functions being tested, and I'm not sure if this covers all of the APIs of bayes_toolbox. I suggest using code coverage (see https://about.codecov.io/ or an alternative) if possible to assess what percentage of the functions have been tested; 80% coverage is a healthy level. A typical workflow is sketched below.
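
For instance, a typical pytest + Coverage.py workflow looks like the following (the helper under test is hypothetical, just to show the shape):

```python
# test_example.py -- generic shape of a pytest test; the helper is
# hypothetical and only illustrates the workflow, not bayes_toolbox code.
import numpy as np

def standardize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def test_standardize_is_zero_mean_unit_sd():
    z = standardize([1.0, 2.0, 3.0, 4.0])
    assert np.isclose(z.mean(), 0.0)
    assert np.isclose(z.std(), 1.0)

# Measure coverage from the shell:
#   coverage run -m pytest && coverage report
# or, with the pytest-cov plugin installed:
#   pytest --cov=bayes_toolbox
```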

Let me know if you have questions, @hyosubkim. Thanks for your patience with my schedule.

hyosubkim commented 8 months ago

Hi @alstat and @samhforbes. I appreciate your feedback, @alstat, and have made the changes you requested. Specifically, wrt your two requests:

  1. API documentation - I have added documentation for all of the functions on the bayes_toolbox webpage (see here).
  2. Automated Testing - I have written full test suites with pytest for the models/functions in bayes_toolbox (see here). I used Coverage.py to assess the percentage coverage. Currently, coverage of all models/functions is at 84% (see coverage badges here and here).

@alstat - Thanks for helping to improve bayes_toolbox! I hope you're satisfied with the changes I've made. @samhforbes - Please let me know if there's anything else, and I look forward to your editorial assessment!

alstat commented 8 months ago

@hyosubkim Perfect! The API documentation is well made (you can further improve this in the future by adding brief examples for each API if possible). This makes it easy for new users to understand the extent of the contribution of the package. Thanks! I'm satisfied with the changes and with the coverage of the tests. :)

hyosubkim commented 8 months ago

Thanks, @alstat!

samhforbes commented 8 months ago

Thanks for your review @alstat

samhforbes commented 8 months ago

@BrandonEdwards thanks for looking at the package so far. As mentioned I'd like to move forward with this, so we will proceed with the reviews completed already, but thanks again!

samhforbes commented 8 months ago

@editorialbot remove @BrandonEdwards from reviewers

editorialbot commented 8 months ago

@BrandonEdwards removed from the reviewers list!

samhforbes commented 8 months ago

@hyosubkim I'm happy to say two reviewers have recommended this, and I have also looked at this package in detail myself. On this basis I'm happy to move forward with this. There are a few items we need to do / check now - you'll see some author items in my post review checklist for you to complete.

samhforbes commented 8 months ago

Post-Review Checklist for Editor and Authors

Additional Author Tasks After Review is Complete

Editor Tasks Prior to Acceptance

hyosubkim commented 8 months ago

Post-Review Checklist for Editor and Authors

Additional Author Tasks After Review is Complete

* [x]  Double check authors and affiliations (including ORCIDs)

* [x]  Make a release of the software with the latest changes from the review and post the version number here. This is the version that will be used in the JOSS paper.  **version: 0.1.1**

* [x]  Archive the release on Zenodo/figshare/etc and post the DOI here.  **DOI: 10.5281/zenodo.7849408**

* [x]  Make sure that the title and author list (including ORCIDs) in the archive match those in the JOSS paper.

* [x]  Make sure that the license listed for the archive is the same as the software license.

Editor Tasks Prior to Acceptance

* [ ]  Read the text of the paper and offer comments/corrections (as either a list or a PR)

* [ ]  Check the references in the paper for corrections (e.g. capitalization)

* [ ]  Check that the archive title, author list, version tag, and the license are correct

* [ ]  Set archive DOI with `@editorialbot set <DOI here> as archive`

* [ ]  Set version with `@editorialbot set <version here> as version`

* [ ]  Double check rendering of paper with `@editorialbot generate pdf`

* [ ]  Specifically check the references with `@editorialbot check references` and ask author(s) to update as needed

* [ ]  Recommend acceptance with `@editorialbot recommend-accept`

Thanks, @samhforbes! I've checked off all the items on my to-do list. The latest version is 0.1.1 and the DOI is 10.5281/zenodo.7849408. Please let me know if there's anything else.

hyosubkim commented 8 months ago

@editorialbot commands

editorialbot commented 8 months ago

Hello @hyosubkim, here are the things you can ask me to do:


# List all available commands
@editorialbot commands

# Get a list of all editors's GitHub handles
@editorialbot list editors

# Check the references of the paper for missing DOIs
@editorialbot check references

# Perform checks on the repository
@editorialbot check repository

# Adds a checklist for the reviewer using this command
@editorialbot generate my checklist

# Set a value for branch
@editorialbot set joss-paper as branch

# Generates the pdf paper
@editorialbot generate pdf

# Generates a LaTeX preprint file
@editorialbot generate preprint

# Get a link to the complete list of reviewers
@editorialbot list reviewers

hyosubkim commented 8 months ago

@editorialbot generate pdf

hyosubkim commented 8 months ago

@editorialbot generate pdf

editorialbot commented 8 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

samhforbes commented 8 months ago

@editorialbot set 0.1.1 as version

editorialbot commented 8 months ago

Done! version is now 0.1.1

samhforbes commented 8 months ago

@editorialbot set 10.5281/zenodo.7849408 as archive