pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/

bibat: a batteries-included Bayesian analysis template #83

Closed teddygroves closed 9 months ago

teddygroves commented 1 year ago

Submitting Author: Teddy Groves (@teddygroves)
All current maintainers: @teddygroves
Package Name: bibat
One-Line Description of Package: An interactive template for Bayesian statistical analysis projects
Repository Link: https://github.com/teddygroves/bibat/
Version submitted: 0.1.0
Editor: @mjhajharia
Reviewer 1: @OriolAbril
Reviewer 2: @alizma
Archive: DOI
Version accepted: 0.1.10
Date accepted (month/day/year): 07/05/2023
JOSS DOI:


Code of Conduct & Commitment to Maintain Package

Description

Bibat is a Python package providing a flexible interactive template for Bayesian statistical analysis projects.

It aims to make it easier to create software projects that implement a Bayesian workflow and that scale to arbitrarily many interrelated statistical models, data transformations, inferences and computations. Bibat also aims to promote software quality by providing a modular, automated and reproducible project that takes advantage of, and integrates, the most up-to-date statistical software.

Bibat comes with "batteries included" in the sense that it creates a working example project, which the user can adapt so that it implements their desired analysis. We believe this style of template makes for better usability and easier testing of Bayesian workflow projects compared with the alternative approach of providing an incomplete skeleton project.

Scope

The target audience is people who want to write software implementing a Bayesian workflow as described in this paper and are willing to do so using Python scientific libraries, Stan, cmdstanpy, pydantic, pandera and make, as well as perhaps pytest, Sphinx, Quarto and GitHub Actions.

I am not aware of any interactive template that specifically targets a Python-based Bayesian workflow that scales to arbitrarily many models and data transformations.

In addition, bibat is unusual compared to other interactive templates because it is 'batteries-included', providing a full working example project rather than an incomplete skeleton.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication Options

JOSS Checks

NB: I submitted an old version of this package to JOSS and it was judged to be out of scope in terms of substantial scholarly effort. See here for the relevant issue in the joss-reviews repository: https://github.com/openjournals/joss-reviews/issues/4760. The reviewers did not definitively assess whether the package was within JOSS's scope in terms of not being a "minor utility", but noted that they had a query about this.

I think that bibat in its current form is within JOSS's scope in both of these respects but I imagine they would want to assess this separately.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs, rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

Confirm each of the following by checking the box.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here.

Editor and Review Templates

The [editor template can be found here][Editor Template].

The [review template can be found here][Review Template].

NickleDave commented 1 year ago

Hi @teddygroves just updating you that we have found a guest editor that is a good fit for your submission; we are onboarding them and expect to start the review late this week or early next week.

In the meantime I will get the EiC checks done.

Thank you for your patience.

NickleDave commented 1 year ago

Hi again @teddygroves overall I can tick off most of the boxes for editor checks.

There is one issue we do need you to address before we can proceed.

I also ran into a couple of minor issues that I describe in comments below but those are just there for feedback.

Please get those API docs added so we can go ahead with the review! Thank you.

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements below.



Editor comments

Installation

Documentation

Other notes

```
$ make analysis
. .venv/bin/activate && (\
      python -m pip install --upgrade pip; \
      python -m pip install -r requirements.txt --quiet; \
      install_cmdstan ; \
    )
Requirement already satisfied: pip in /Users/davidnicholson/opt/anaconda3/lib/python3.7/site-packages (21.3.1)
Collecting pip
  Using cached pip-23.0.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.3.1
    Uninstalling pip-21.3.1:
      Successfully uninstalled pip-21.3.1
Successfully installed pip-23.0.1
ERROR: Cannot install -r requirements.txt (line 1) and scipy because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
/bin/sh: install_cmdstan: command not found
```
teddygroves commented 1 year ago

Thanks a lot for the checks and suggestions and for finding a reviewer!

I will add API documentation and follow the __init__.py convention that you suggest as soon as possible, and also check if there is a way to fail sooner for unsupported Python versions.

The documentation page is hosted on readthedocs. I'll have a look and see if I've done anything that makes it slow to load.

I'll have to think a bit about your points re make and pip/requirements.txt, thanks for the pointers to alternatives.

NickleDave commented 1 year ago

Perfect, thank you @teddygroves

teddygroves commented 1 year ago

I've now fixed __init__.py as suggested above, added automatic documentation for the CLI and all exposed Python functions, and added some extra code to setup.py to prevent installation on unsupported Python versions.
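As a rough illustration of this kind of guard, the sketch below shows one common pattern for failing fast in a setup.py on old interpreters. The minimum version and error message here are assumptions for illustration, not bibat's actual code; in practice this check is combined with a `python_requires` entry in the package metadata so pip can skip incompatible releases altogether.

```python
import sys

# Assumed minimum version for illustration; bibat's actual floor may differ.
MIN_PYTHON = (3, 8)

def python_is_supported(version_info=sys.version_info):
    """Return True if the running interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON

# In a setup.py this check runs before setup() is called, so installation
# fails immediately with a clear message instead of a confusing traceback.
if not python_is_supported():
    sys.exit("This package requires Python %d.%d or later." % MIN_PYTHON)
```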

I checked for possible causes of slow-loading documentation but couldn't find much - just a largish (1.3 MB) HTML file. I also don't experience slow loading myself. I'll continue to keep an eye on this.

I think the main issue (api documentation) is now addressed. The other suggestions were to look at possible alternatives to make and requirements.txt - I'll look at those as soon as possible!

NickleDave commented 1 year ago

Perfect, thank you @teddygroves.

We are very fortunate that @mjhajharia has generously agreed to act as guest editor for this review. They will follow up here next, tagging in reviewers to start the review process.

mjhajharia commented 1 year ago

update: @alizma and @oriolabril have agreed to be reviewers for the package, things will be started soon!

mjhajharia commented 1 year ago

Hi! Reviews are due in 2 weeks, here's the template that you can post as a comment to this issue. Feel free to ask anything that might be ambiguous.

OriolAbril commented 1 year ago

Package Review

Documentation

The package includes all the following forms of documentation:

Example of working pyproject.toml python requirement

I maintain [xarray-einstats](https://github.com/arviz-devs/xarray-einstats), which has a minimum Python requirement in its pyproject.toml file. Today I executed the following commands:

```
❯ conda create -n test python==3.7
Collecting package metadata (repodata.json): done
Solving environment: done
The following NEW packages will be INSTALLED:
  python   conda-forge/linux-64::python-3.7.0-hd21baee_1006
  pip      conda-forge/noarch::pip-23.1.2-pyhd8ed1ab_0
  (remaining conda solver output trimmed)
Proceed ([y]/n)? y
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

❯ conda activate test
❯ pip install xarray-einstats==0.5.1
ERROR: Ignored the following versions that require a different python version:
    0.3.0 Requires-Python >=3.8; 0.4.0 Requires-Python >=3.8;
    0.5.0 Requires-Python >=3.8; 0.5.1 Requires-Python >=3.8
ERROR: Could not find a version that satisfies the requirement xarray-einstats==0.5.1 (from versions: 0.1, 0.2.0, 0.2.1, 0.2.2)
ERROR: No matching distribution found for xarray-einstats==0.5.1
```

I am not a packaging expert, but I remember ipython had an installation bug not long ago due to combining multiple installation files. Using only pyproject.toml might simplify things and work like it does with xarray-einstats.
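For readers unfamiliar with the mechanism being described, the Python floor lives in the `[project]` table of pyproject.toml. The fragment below is illustrative only (the name and version bound are examples, not bibat's or xarray-einstats's actual metadata):

```toml
[project]
name = "my-package"          # example name
requires-python = ">=3.8"    # pip refuses to install this release on older interpreters
```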

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 26/04 -> 1h 40' (significant time spent going over reviewing guidelines too)

EDIT (May 15th): 2h more going over the code and paper, only thing left is reproducing the vignette

EDIT (May 19th): 1:30h setting up everything again from scratch and running the vignette close to completion

EDIT (May 24th): ~1h running the vignette from scratch successfully after the fixes and improvements to it

teddygroves commented 1 year ago

Just recording here that in the latest version of bibat (0.1.4) some of the comments above are now addressed.

Specifically:

Regarding dev installation instructions: there are instructions for pip installing the latest commit from the main branch, but there is no separate dev branch so that isn't linked. I'm not sure what is the best practice here but I'm very open to recommendations! Edit: I misunderstood, instructions for installing with development dependencies now added.

alizma commented 1 year ago

Bibat Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: ~4


Review Comments

Overall, this package shows a lot of promise as a foundation for statistical analysis codes and provides a reusable, extendable base for the Bayesian workflow. Moreover, the utilities for generating reports are very nice. Here are a few general comments and things which could be improved in addition to other issue-specific comments above:

Documentation

The current documentation is what seems to require the most attention. Specifically, some sections are entirely incomplete (as mentioned earlier), and generic documentation for the generated environment is not provided; it would suffice to generate this automatically for the default environment and include it on the overall documentation page. Additionally, the documentation should be split into separate pages: for cleanliness, have a separate page for the CLI, one for the default provided functions, one for the editing example you provided, and so on. Finally, given the tools for automatically generating reports, it would be nice to have a generic example of doing this, especially for people who might not be familiar with these tools.

The comments above with regards to default function documentation on the main documentation page apply equally well to the default documentation. Since projects based on this set-up will likely have problem-specific commentaries on each function, this is an additional important feature.

You also seem to be missing a linter check in your tests (use black or something similar). Also I'm not too sure how to assess the test coverage in your CI, but maybe you could check a few other things besides running the commands for the default example?

teddygroves commented 1 year ago

Thanks very much for the comments @alizma! I think they are now just about all addressed, and the package is overall a lot better as a result.

pre-commit is a much nicer solution for linting than what I had previously.
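For context, a minimal pre-commit configuration along these lines runs black on every commit. The hook revision below is illustrative, not bibat's actual pin:

```yaml
# .pre-commit-config.yaml (illustrative sketch; pin rev to a current release)
repos:
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
```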

Test coverage is now displayed in a readme badge.

Information about installing Cmdstan, as well as some lists of other dependencies, has been added to the readme and documentation.

The documentation has been split into multiple pages and generally improved. In particular, it is now explained at greater length how to document a bibat project, and the example Sphinx documentation includes automatic documentation generation for the data_preparation module.

A general problem that I don't really know what to do about is documenting and testing the example project code. Due to the use of templates it generally isn't valid code, which makes automation difficult. What I've done is check that the example project runs from end to end without any errors, but there is definitely more that could be done.

Please let me know what you think, especially about the last point!

alizma commented 1 year ago

Hi @teddygroves, these all look like great improvements, nice work! I'll update my initial review to reflect the changes.

With regards to additional testing: since sections of the provided code are templated, maybe you could write tests with common combinations of argument types, or instead implement functional (behaviour-based) tests in addition to your integration tests. For example, you could have tests that ensure bibat generates code guaranteeing certain outputs when different configuration files are used. I imagine that giving the user freedom to choose which configuration file to use for initialisation may lead to some simple-to-catch errors in the resulting code. Overall, since the goal is to provide boilerplate code with properties you guarantee to be true, writing tests that ensure these properties in different use cases is crucial. Practically, this could include simple tests which check that classes have the correct attributes, that functions have the right signatures and produce the expected output for particular inputs, and so on. But it's probably fine as is, since your integration tests check the basic properties already. In any case, the more testing you have, the easier it will be to automatically check that bibat preserves features between versions, so it's really up to your discretion where you think more testing will be most useful.
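The invariant-style tests suggested here could be sketched roughly as follows. This is a toy stand-in, not bibat's real template machinery: the file names, configuration keys and rendering function are all hypothetical, and only illustrate the idea that every user configuration must yield a project with certain guaranteed contents.

```python
# Toy model of template rendering; names and options are hypothetical.
REQUIRED_FILES = {"Makefile", "pyproject.toml", "README.md"}

def generated_files(config):
    """Stand-in for rendering a template: the files a given config produces."""
    files = set(REQUIRED_FILES)
    if config.get("docs"):
        files.add("docs/index.rst")
    if config.get("tests"):
        files.add("tests/test_example.py")
    return files

def satisfies_invariants(config):
    """Whatever options the user chooses, the guaranteed files must appear."""
    return REQUIRED_FILES <= generated_files(config)
```

A real version of this would render the template into a temporary directory (e.g. in a pytest fixture) and assert on the files and symbols actually produced.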

All in all, I think you've done great work and I'm excited to see the projects that people build on top of bibat! Thanks for your time and please let me know if you have any additional questions.

OriolAbril commented 1 year ago

Checked everything on my list after managing to run the vignette successfully. I recommend approving the package (starting from version >0.1.5, that is, from the next version based on current status of main branch).

As has already been noted, testing can be extended, for example by adding a second end-to-end workflow case, but I think it is good enough already. Given the inherent interactivity of the library and workflow, my recommendation would go more along the lines of aiming to rerun one of the vignettes manually before each minor release. (Note that I work mainly with visualization, so I am probably biased towards keeping a human in the loop rather than relying 100% on automated checks.)

mjhajharia commented 1 year ago

Hi! Firstly, thanks to @teddygroves for the great addition to software reviewed by pyOpenSci. Also, thanks a ton @alizma and @OriolAbril for taking the time to review this and suggest helpful improvements. Here are my informal reviewer 3 notes; most of these are optional, and I hope you find them helpful, but they're mostly not big enough to be strict hindrances.

Accessibility

I recommend using an accessibility checker on the website and looking through some issues from this report for hints

Color Contrast


Improper use of preformatted text element


Line Height
The line-height for paragraphs should be at least 1.5 times the font size.


URL colours



https://bibat.readthedocs.io/en/latest/_static/report.html

teddygroves commented 1 year ago

Thanks @mjhajharia for the latest comments. This review process has been a really fantastic learning experience for me!

The 'fix broken links' PR above addresses some of the typos and broken links.

Regarding the modes, I think this part of bibat is hard to understand at least partly because the underlying code isn't the best. I've started to address this in the 'Refactor to add...' PR. The change requires updating the documentation and vignette so it will take a bit longer to arrive but should definitely be worth it.

Thanks for suggesting using an accessibility checker - I hadn't thought of doing this. Could you possibly tell me which one you used to generate the image above (or which one you recommend in general)?

teddygroves commented 1 year ago

I've now done most of the things recommended, including:

That leaves improving the documentation page's accessibility with a checker. I've made an issue for this and will start working on it as soon as possible.

teddygroves commented 1 year ago

I've done a few things to improve the accessibility of bibat's documentation:

I've closed issue #60 from above as I think there has been an improvement, but I'd still love to know the checker that @mjhajharia used, as I think it noticed some things that the ones I added missed.

teddygroves commented 1 year ago

Just checking here if there is something that someone is waiting for me to do.

I think almost all of the reviewers' major concerns have now been addressed. There are still some live issues that arose from the reviews but I think they are relatively minor. If any of these are particularly pressing I'd appreciate a pointer so I can prioritise fixing them!

lwasser commented 1 year ago

hey @teddygroves 👋 ... pinging @mjhajharia here on this so they can check in on where the review is and provide a bit of guidance. thank you for all of the work that you've done on this package as a part of the review!!

mjhajharia commented 1 year ago

Hi @lwasser, thanks for the ping, and apologies for the delay. Hi @teddygroves, thank you for the PRs: I went through them and everything looks great. This is really fantastic, and I really appreciate you taking the time to incorporate the optional feedback!

Thanks for suggesting using an accessibility checker - I hadn't thought of doing this. Could you possibly tell me which one you used to generate the image above (or which one you recommend in general)?

I think you're good to go in terms of accessibility now! For that report I used the SiteImprove Accessibility Checker extension; I'm not sure it's the best out there, but it's easy to use for getting a quick overview, and it points out the major things, in case this is helpful for you later!

In general though, if you follow standard accessibility guidelines that is a good place to be in for documentation, so thanks a ton for doing that!

Again, sorry for the delay; I had some personal emergencies. But this looks good to go as far as I am concerned: all the necessary things are resolved and the documentation looks great. I am very happy to see this page. I've been meaning to make a PR in the pyOpenSci docs about this as well, and I think it would be super useful.

So this is good to go as far as I'm concerned. Thank you for the fantastic project @teddygroves!

teddygroves commented 1 year ago

Great, thanks @mjhajharia for the checker advice (Siteimprove looks really nice) and for the review generally!

I guess it's time for someone with the right privileges to change the label?

lwasser commented 1 year ago

hey there @teddygroves !! 👋 congratulations on bibat being accepted into the pyOpenSci ecosystem! I'm going to help wrap things up here. there are a few last things that need to happen! And now for the formal acceptance proceedings


🎉 bibat has been approved by pyOpenSci! Thank you @teddygroves for submitting bibat and many thanks to @OriolAbril and @alizma for reviewing this package! 😸

Author Wrap Up Tasks

There are a few things left to do to wrap up this submission:

Finally - Teddy, if you are interested - we welcome a blog post that showcases bibat's functionality. This helps us promote your great work supporting the scientific Python ecosystem. This is of course optional, but if you wish to do it you can check out a few of our other blog posts that have been published, including pandera and moving pandas. These blogs are some of our most viewed content on the website!

Editor Final Checks

Please complete the final steps to wrap up this review. Editor, please do the following:

If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the peer-review-guide.

teddygroves commented 1 year ago

Hi @lwasser, that's great news!

I think I've now done all the steps. The Zenodo link for bibat version 0.1.10 is https://zenodo.org/record/8124312. I've added the badge and filled in the survey. The PR for updating packages.yml and contributors.yml is here: https://github.com/pyOpenSci/pyopensci.github.io/pull/216. I didn't know how to fill in some of the fields, e.g. github_image_id - I guess we can discuss that under the PR if necessary.

I'd love to contribute a blog post - for that do I just submit a PR for a new file here?

Thanks a lot @lwasser, @mjhajharia @OriolAbril and @alizma - I really appreciated your time and attention and enjoyed the review process!

lwasser commented 1 year ago

@teddygroves thank you! and welcome to pyopensci!! yes please submit your pr / blog post to that link - you can look at the moving pandas and pandera blogs to get a sense of what others have done. but there are no requirements! please just showcase what your awesome tool provides!

before this is closed i'll go through all of the checks above to ensure it's all done!! i hope to do that in the next 3 days - i'm traveling right now but will followup soon!

lwasser commented 1 year ago

ok @teddygroves i'm going through this all today. your package will be live on our website by the end of the day tomorrow at the latest!

thank you everyone for supporting this review!! @mjhajharia for leading the review as editor! @OriolAbril @alizma for your reviews and @teddygroves for submitting to us and doing the work via peer review!

all - if anyone would like to join our slack group please either email me at leah at pyopensci.org OR respond here if i can grab your email from your github profile!! closing this issue now! ✨

lwasser commented 1 year ago

reopening as this is going through JOSS!

lwasser commented 9 months ago

this issue can be closed as JOSS did deem this review out of scope. Many thanks everyone for your work here on the pyOpenSci review side of things!! Everyone involved in this review is welcome to join our slack group. if you haven't received an invitation feel free to reach out to @kierisi !! closing this now!!