pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/

bibat: a batteries-included Bayesian analysis template #83

Closed teddygroves closed 9 months ago

teddygroves commented 1 year ago

Submitting Author: Teddy Groves (@teddygroves)
All current maintainers: @teddygroves
Package Name: bibat
One-Line Description of Package: An interactive template for Bayesian statistical analysis projects
Repository Link: https://github.com/teddygroves/bibat/
Version submitted: 0.1.0
Editor: @mjhajharia
Reviewer 1: @OriolAbril
Reviewer 2: @alizma
Archive: DOI
Version accepted: 0.1.10
Date accepted (month/day/year): 07/05/2023
JOSS DOI:


Code of Conduct & Commitment to Maintain Package

Description

Bibat is a Python package providing a flexible interactive template for Bayesian statistical analysis projects.

It aims to make it easier to create software projects that implement a Bayesian workflow and that scale to arbitrarily many interrelated statistical models, data transformations, inferences and computations. Bibat also aims to promote software quality by providing a modular, automated and reproducible project that takes advantage of, and integrates, the most up-to-date statistical software.

Bibat comes with "batteries included" in the sense that it creates a working example project, which the user can adapt so that it implements their desired analysis. We believe this style of template makes for better usability and easier testing of Bayesian workflow projects compared with the alternative approach of providing an incomplete skeleton project.

Scope

The target audience is people who want to write software implementing a Bayesian workflow as described in this paper and are willing to do so using Python scientific libraries, Stan, cmdstanpy, pydantic, pandera and make, as well as perhaps pytest, Sphinx, Quarto and GitHub Actions.

I am not aware of any interactive template that specifically targets a Python-based Bayesian workflow that scales to arbitrarily many models and data transformations.

In addition, bibat is unusual compared to other interactive templates because it is 'batteries-included', providing a full working example project rather than an incomplete skeleton.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication Options

JOSS Checks

NB: I submitted an old version of this package to JOSS and it was judged to be out of scope in terms of substantial scholarly effort. See here for the relevant issue in the joss-reviews repository: https://github.com/openjournals/joss-reviews/issues/4760. The reviewers did not definitively assess whether the package was within JOSS's scope in terms of not being a "minor utility", but noted that they had a query about this.

I think that bibat in its current form is within JOSS's scope in both of these respects but I imagine they would want to assess this separately.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs, rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

Confirm each of the following by checking the box.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here.

Editor and Review Templates

The [editor template can be found here][Editor Template].

The [review template can be found here][Review Template].

NickleDave commented 1 year ago

Hi @teddygroves just updating you that we have found a guest editor that is a good fit for your submission; we are onboarding them and expect to start the review late this week or early next week.

In the meantime I will get the EiC checks done.

Thank you for your patience.

NickleDave commented 1 year ago

Hi again @teddygroves overall I can tick off most of the boxes for editor checks.

There is one issue we do need you to address before we can proceed.

I also ran into a couple of minor issues that I describe in comments below but those are just there for feedback.

Please get those API docs added so we can go ahead with the review! Thank you.

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements below.



Editor comments

Installation

Documentation

Other notes

```
$ make analysis
. .venv/bin/activate && (\
      python -m pip install --upgrade pip; \
      python -m pip install -r requirements.txt --quiet; \
      install_cmdstan ; \
    )
Requirement already satisfied: pip in /Users/davidnicholson/opt/anaconda3/lib/python3.7/site-packages (21.3.1)
Collecting pip
  Using cached pip-23.0.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.3.1
    Uninstalling pip-21.3.1:
      Successfully uninstalled pip-21.3.1
Successfully installed pip-23.0.1
ERROR: Cannot install -r requirements.txt (line 1) and scipy because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
/bin/sh: install_cmdstan: command not found
```
teddygroves commented 1 year ago

Thanks a lot for the checks and suggestions and for finding a reviewer!

I will add API documentation and follow the __init__.py convention that you suggest as soon as possible, and also check if there is a way to fail sooner for unsupported Python versions.

The documentation page is hosted on readthedocs. I'll have a look and see if I've done anything that makes it slow to load.

I'll have to think a bit about your points re make and pip/requirements.txt, thanks for the pointers to alternatives.

NickleDave commented 1 year ago

Perfect, thank you @teddygroves

teddygroves commented 1 year ago

I've now fixed __init__.py as suggested above, added automatic documentation for the CLI and all exposed Python functions, and added some extra code to setup.py to prevent installation on unsupported Python versions.
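As a rough illustration of this kind of guard, the sketch below shows one common pattern for failing fast in a setup.py on old interpreters. The minimum version and error message here are assumptions for illustration, not bibat's actual code; in practice this check is combined with a `python_requires` entry in the package metadata so pip can skip incompatible releases altogether.

```python
import sys

# Assumed minimum version for illustration; bibat's actual floor may differ.
MIN_PYTHON = (3, 8)

def python_is_supported(version_info=sys.version_info):
    """Return True if the running interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON

# In a setup.py this check runs before setup() is called, so installation
# fails immediately with a clear message instead of a confusing traceback.
if not python_is_supported():
    sys.exit("This package requires Python %d.%d or later." % MIN_PYTHON)
```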

I checked for possible causes of slow-loading documentation but couldn't find much - just a largish (1.3 MB) HTML file. I also don't experience slow loading myself. I'll continue to keep an eye on this.

I think the main issue (api documentation) is now addressed. The other suggestions were to look at possible alternatives to make and requirements.txt - I'll look at those as soon as possible!

NickleDave commented 1 year ago

Perfect, thank you @teddygroves.

We are very fortunate that @mjhajharia has generously agreed to act as guest editor for this review. They will follow up here next, tagging in reviewers to start the review process.

mjhajharia commented 1 year ago

update: @alizma and @oriolabril have agreed to be reviewers for the package, things will be started soon!

mjhajharia commented 1 year ago

Hi! Reviews are due in 2 weeks, here's the template that you can post as a comment to this issue. Feel free to ask anything that might be ambiguous.

OriolAbril commented 1 year ago

Package Review

Documentation

The package includes all the following forms of documentation:

Example of working pyproject.toml python requirement

I maintain [xarray-einstats](https://github.com/arviz-devs/xarray-einstats), which has a minimum Python requirement in its pyproject.toml file. Today I executed the following commands:

```
❯ conda create -n test python==3.7
Collecting package metadata (repodata.json): done
Solving environment: done
The following NEW packages will be INSTALLED:
  python   conda-forge/linux-64::python-3.7.0-hd21baee_1006
  pip      conda-forge/noarch::pip-23.1.2-pyhd8ed1ab_0
  (remaining conda solver output trimmed)
Proceed ([y]/n)? y
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

❯ conda activate test
❯ pip install xarray-einstats==0.5.1
ERROR: Ignored the following versions that require a different python version:
    0.3.0 Requires-Python >=3.8; 0.4.0 Requires-Python >=3.8;
    0.5.0 Requires-Python >=3.8; 0.5.1 Requires-Python >=3.8
ERROR: Could not find a version that satisfies the requirement xarray-einstats==0.5.1 (from versions: 0.1, 0.2.0, 0.2.1, 0.2.2)
ERROR: No matching distribution found for xarray-einstats==0.5.1
```

I am not a packaging expert, but I remember ipython had an installation bug not long ago due to combining multiple installation files. Using only pyproject.toml might simplify things and work like it does with xarray-einstats.
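For readers unfamiliar with the mechanism being described, the Python floor lives in the `[project]` table of pyproject.toml. The fragment below is illustrative only (the name and version bound are examples, not bibat's or xarray-einstats's actual metadata):

```toml
[project]
name = "my-package"          # example name
requires-python = ">=3.8"    # pip refuses to install this release on older interpreters
```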

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 26/04 -> 1h 40' (significant time spent going over reviewing guidelines too)

EDIT (May 15th): 2h more going over the code and paper, only thing left is reproducing the vignette

EDIT (May 19th): 1:30h setting up everything again from scratch and running the vignette close to completion

EDIT (May 24th): ~1h running the vignette from scratch successfully after the fixes and improvements to it

teddygroves commented 1 year ago

Just recording here that in the latest version of bibat (0.1.4) some of the comments above are now addressed.

Specifically:

Regarding dev installation instructions: there are instructions for pip installing the latest commit from the main branch, but there is no separate dev branch so that isn't linked. I'm not sure what is the best practice here but I'm very open to recommendations! Edit: I misunderstood, instructions for installing with development dependencies now added.

alizma commented 1 year ago

Bibat Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: ~4


Review Comments

Overall, this package shows a lot of promise as a foundation for statistical analysis codes and provides a reusable, extendable base for the Bayesian workflow. Moreover, the utilities for generating reports are very nice. Here are a few general comments and things which could be improved in addition to other issue-specific comments above:

Documentation

The current documentation is what seems to require the most attention. Specifically, some sections are entirely incomplete (as mentioned earlier), and generic documentation for the generated environment is not provided; it would suffice to generate this automatically for the default environment and include it on the overall documentation page. Additionally, the documentation should be split into separate pages: for cleanliness, have a separate page for the CLI, one for the default provided functions, one for the editing example you provided, and so on. Finally, given the tools for automatically generating reports, it would be nice to have a generic example of doing this, especially for people who might not be familiar with these tools.

The comments above with regards to default function documentation on the main documentation page apply equally well to the default documentation. Since projects based on this set-up will likely have problem-specific commentaries on each function, this is an additional important feature.

You also seem to be missing a linter check in your tests (use black or something similar). Also I'm not too sure how to assess the test coverage in your CI, but maybe you could check a few other things besides running the commands for the default example?

teddygroves commented 1 year ago

Thanks very much for the comments @alizma! I think they are now just about all addressed, and the package is overall a lot better as a result.

pre-commit is a much nicer solution for linting than what I had previously.
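For context, a minimal pre-commit configuration along these lines runs black on every commit. The hook revision below is illustrative, not bibat's actual pin:

```yaml
# .pre-commit-config.yaml (illustrative sketch; pin rev to a current release)
repos:
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
```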

Test coverage is now displayed in a readme badge.

Information about installing Cmdstan, as well as some lists of other dependencies, has been added to the readme and documentation.

The documentation has been split into multiple pages and generally improved. In particular, it is now explained at greater length how to document a bibat project, and the example Sphinx documentation includes automatic documentation generation for the data_preparation module.

A general problem that I don't really know what to do about is documenting and testing the example project code. Due to the use of templates it generally isn't valid code, which makes automation difficult. What I've done is check that the example project runs from end to end without any errors, but there is definitely more that could be done.

Please let me know what you think, especially about the last point!

alizma commented 1 year ago

Hi @teddygroves, these all look like great improvements, nice work! I'll update my initial review to reflect the changes.

With regards to additional testing: since sections of the provided code are templated, maybe you could write tests with common combinations of argument types, or instead implement functional (behaviour-based) tests in addition to your integration tests. For example, you could have tests that ensure bibat generates code guaranteeing certain outputs when different configuration files are used. I imagine that giving the user freedom to choose which configuration file to use for initialisation may lead to some simple-to-catch errors in the resulting code. Overall, since the goal is to provide boilerplate code with properties you guarantee to be true, writing tests that ensure these properties in different use cases is crucial. Practically, this could include simple tests which check that classes have the correct attributes, that functions have the right signatures and produce the expected output for particular inputs, and so on. But it's probably fine as is, since your integration tests check the basic properties already. In any case, the more testing you have, the easier it will be to automatically check that bibat preserves features between versions, so it's really up to your discretion where you think more testing will be most useful.
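The invariant-style tests suggested here could be sketched roughly as follows. This is a toy stand-in, not bibat's real template machinery: the file names, configuration keys and rendering function are all hypothetical, and only illustrate the idea that every user configuration must yield a project with certain guaranteed contents.

```python
# Toy model of template rendering; names and options are hypothetical.
REQUIRED_FILES = {"Makefile", "pyproject.toml", "README.md"}

def generated_files(config):
    """Stand-in for rendering a template: the files a given config produces."""
    files = set(REQUIRED_FILES)
    if config.get("docs"):
        files.add("docs/index.rst")
    if config.get("tests"):
        files.add("tests/test_example.py")
    return files

def satisfies_invariants(config):
    """Whatever options the user chooses, the guaranteed files must appear."""
    return REQUIRED_FILES <= generated_files(config)
```

A real version of this would render the template into a temporary directory (e.g. in a pytest fixture) and assert on the files and symbols actually produced.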

All in all, I think you've done great work and I'm excited to see the projects that people build on top of bibat! Thanks for your time and please let me know if you have any additional questions.

OriolAbril commented 1 year ago

Checked everything on my list after managing to run the vignette successfully. I recommend approving the package (starting from version >0.1.5, that is, from the next version based on current status of main branch).

As has already been noted, testing can be extended, for example by adding a second end-to-end workflow case, but I think it is good enough already. Given the inherent interactivity of the library and workflow, my recommendation would go more along the lines of aiming to rerun one of the vignettes manually before each minor release. (Note that I work mainly with visualization, so I am probably biased towards keeping a human in the loop rather than relying 100% on automated checks.)

mjhajharia commented 1 year ago

Hi! Firstly, thanks to @teddygroves for the great addition to software reviewed by pyOpenSci. Also, thanks a ton @alizma and @OriolAbril for taking the time to review this and suggest helpful improvements. Here are my informal reviewer 3 notes; most of these are optional, and I hope you find them helpful, but they're mostly not big enough to be strict hindrances.

Accessibility

I recommend using an accessibility checker on the website and looking through some issues from this report for hints

Color Contrast


Improper use of preformatted text element


Line Height
The line-height for paragraphs should be at least 1.5 times the font size.


URL colours



https://bibat.readthedocs.io/en/latest/_static/report.html

teddygroves commented 1 year ago

Thanks @mjhajharia for the latest comments. This review process has been a really fantastic learning experience for me!

The 'fix broken links' PR above addresses some of the typos and broken links.

Regarding the modes, I think this part of bibat is hard to understand at least partly because the underlying code isn't the best. I've started to address this in the 'Refactor to add...' PR. The change requires updating the documentation and vignette so it will take a bit longer to arrive but should definitely be worth it.

Thanks for suggesting using an accessibility checker - I hadn't thought of doing this. Could you possibly tell me which one you used to generate the image above (or which one you recommend in general)?

teddygroves commented 1 year ago

I've now done most of the things recommended, including:

That leaves improving the documentation page's accessibility with a checker. I've made an issue for this and will start working on it as soon as possible.

teddygroves commented 1 year ago

I've done a few things to improve the accessibility of bibat's documentation:

I've closed issue #60 from above as I think there has been an improvement, but I'd still love to know the checker that @mjhajharia used, as I think it noticed some things that the ones I added missed.

teddygroves commented 1 year ago

Just checking here if there is something that someone is waiting for me to do.

I think almost all of the reviewers' major concerns have now been addressed. There are still some live issues that arose from the reviews but I think they are relatively minor. If any of these are particularly pressing I'd appreciate a pointer so I can prioritise fixing them!

lwasser commented 1 year ago

hey @teddygroves 👋 ... pinging @mjhajharia here on this so they can check in on where the review is and provide a bit of guidance. thank you for all of the work that you've done on this package as a part of the review!!

mjhajharia commented 1 year ago

Hi @lwasser, thanks for the ping, and apologies for the delay. Hi @teddygroves, thank you for the PRs: I went through them and everything looks great. This is really fantastic, and I really appreciate you taking the time to incorporate the optional feedback!

Thanks for suggesting using an accessibility checker - I hadn't thought of doing this. Could you possibly tell me which one you used to generate the image above (or which one you recommend in general)?

I think you're good to go in terms of accessibility now! For that report I used the SiteImprove Accessibility Checker extension; I'm not sure it's the best out there, but it's easy to use for getting a quick overview, and it points out the major things, in case this is helpful for you later!

In general though, if you follow standard accessibility guidelines that is a good place to be in for documentation, so thanks a ton for doing that!

Again, sorry for the delay; I had some personal emergencies. But this looks good to go as far as I am concerned: all the necessary things are resolved and the documentation looks great. I am very happy to see this page. I've been meaning to make a PR in the pyOpenSci docs about this as well, and I think it would be super useful.

So this is good to go as far as I'm concerned. Thank you for the fantastic project @teddygroves!

teddygroves commented 1 year ago

Great, thanks @mjhajharia for the checker advice (Siteimprove looks really nice) and for the review generally!

I guess it's time for someone with the right privileges to change the label?

lwasser commented 1 year ago

hey there @teddygroves !! 👋 congratulations on bibat being accepted into the pyOpenSci ecosystem! I'm going to help wrap things up here. there are a few last things that need to happen! And now for the formal acceptance proceedings


🎉 bibat has been approved by pyOpenSci! Thank you @teddygroves for submitting bibat and many thanks to @OriolAbril and @alizma for reviewing this package! 😸

Author Wrap Up Tasks

There are a few things left to do to wrap up this submission:

Finally - Teddy, if you are interested - we welcome a blog post that showcases bibat's functionality. This helps us promote your great work supporting the scientific Python ecosystem. This is of course optional, but if you wish to do it you can check out a few of our other blog posts that have been published, including pandera and moving pandas. These blogs are some of our most viewed content on the website!

Editor Final Checks

Please complete the final steps to wrap up this review. Editor, please do the following:

If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the peer-review-guide.

teddygroves commented 1 year ago

Hi @lwasser, that's great news!

I think I've now done all the steps. The Zenodo link for bibat version 0.1.10 is https://zenodo.org/record/8124312. I've added the badge and filled in the survey. The PR for updating packages.yml and contributors.yml is here: https://github.com/pyOpenSci/pyopensci.github.io/pull/216. I didn't know how to fill in some of the fields, e.g. github_image_id - I guess we can discuss that under the PR if necessary.

I'd love to contribute a blog post - for that do I just submit a PR for a new file here?

Thanks a lot @lwasser, @mjhajharia @OriolAbril and @alizma - I really appreciated your time and attention and enjoyed the review process!

lwasser commented 1 year ago

@teddygroves thank you! and welcome to pyopensci!! yes please submit your pr / blog post to that link - you can look at the moving pandas and pandera blogs to get a sense of what others have done. but there are no requirements! please just showcase what your awesome tool provides!

before this is closed i'll go through all of the checks above to ensure it's all done!! i hope to do that in the next 3 days - i'm traveling right now but will followup soon!

lwasser commented 1 year ago

ok @teddygroves i'm going through this all today. your package will be live on our website by the end of the day tomorrow at the latest!

thank you everyone for supporting this review!! @mjhajharia for leading the review as editor! @OriolAbril @alizma for your reviews and @teddygroves for submitting to us and doing the work via peer review!

all - if anyone would like to join our slack group please either email me at leah at pyopensci.org OR respond here if i can grab your email from your github profile!! closing this issue now! ✨

lwasser commented 1 year ago

reopening as this is going through JOSS!

lwasser commented 9 months ago

this issue can be closed as JOSS did deem this review out of scope. Many thanks everyone for your work here on the pyOpenSci review side of things!! Everyone involved in this review is welcome to join our slack group. if you haven't received an invitation feel free to reach out to @kierisi !! closing this now!!