openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
721 stars 38 forks source link

[REVIEW]: Dynamax: A Python package for probabilistic state space modeling with JAX #7069

Open editorialbot opened 3 months ago

editorialbot commented 3 months ago

Submitting author: !--author-handle-->@slinderman<!--end-author-handle-- (Scott Linderman) Repository: https://github.com/probml/dynamax Branch with paper.md (empty if default branch): paper Version: v0.1.4 Editor: !--editor-->@osorensen<!--end-editor-- Reviewers: @thomaspinder, @gdalle Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/aaec4098e71833c94a74dbe1ff785d9e"><img src="https://joss.theoj.org/papers/aaec4098e71833c94a74dbe1ff785d9e/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/aaec4098e71833c94a74dbe1ff785d9e/status.svg)](https://joss.theoj.org/papers/aaec4098e71833c94a74dbe1ff785d9e)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@thomaspinder & @gdalle, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @osorensen know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Checklists

📝 Checklist for @gdalle

📝 Checklist for @thomaspinder

editorialbot commented 3 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 3 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.15 s (764.3 files/s, 298530.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          67           3037           4303           8814
Jupyter Notebook                30              0          24675           3907
Markdown                        10             99              0            282
TeX                              1             25              0            199
reStructuredText                 2            152            208            153
YAML                             5             20              7            147
DOS Batch                        1              8              1             26
make                             1              4              7              9
TOML                             1              1              0              4
-------------------------------------------------------------------------------
SUM:                           118           3346          29201          13541
-------------------------------------------------------------------------------

Commit count by author:

   344  Scott Linderman
   178  Peter G. Chang
   135  xinglong
   131  Kevin P Murphy
    83  karalleyna
    79  gileshd
    69  Gerardo Duran-Martin
    60  petergchang
    26  Caleb Weinreb
    18  libby
    14  kostastsa
    13  slinderman
    10  Elizabeth DuPre
     8  Kevin Murphy
     6  andrewwarrington
     6  davidzoltowski
     6  patel-zeel
     4  Ravin Kumar
     3  Aleyna Kara
     2  Yixiu Zhao
     2  Zeel B Patel
     2  partev
     1  Collin Schlager
     1  Jake VanderPlas
     1  Jason Davies
     1  RaulPL
     1  Xinglong
     1  Xinglong Li
     1  xinglong-li
editorialbot commented 3 months ago

Paper file info:

📄 Wordcount for paper.md is 960

✅ The paper includes a Statement of need section

editorialbot commented 3 months ago

License info:

✅ License found: MIT License (Valid open source OSI approved license)

editorialbot commented 3 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1146/annurev-neuro-092619-094115 is OK
- 10.1017/CBO9781139344203 is OK
- 10.48550/arXiv.2305.16543 is OK
- 10.48550/arXiv.2306.03291 is OK
- 10.48550/arXiv.2305.19535 is OK
- 10.1038/s41592-024-02318-2 is OK
- 10.25080/majora-92bf1922-011 is OK
- 10.1017/cbo9780511790492 is OK
- 10.1016/j.tree.2007.10.009 is OK
- 10.1198/073500102753410408 is OK
- 10.3402/tellusa.v56i5.14462 is OK
- 10.1145/355656.355657 is OK
- 10.1109/TAC.2020.2976316 is OK
- 10.1109/TSP.2021.3103338 is OK

MISSING DOIs

- No DOI given, and none found for title: Probabilistic Machine Learning: Advanced Topics
- No DOI given, and none found for title: JAX: composable transformations of Python+NumPy pr...
- No DOI given, and none found for title: PyHSMM: Bayesian inference in HSMMs and HMMs
- No DOI given, and none found for title: Code Companion for Bayesian Filtering and Smoothin...
- No DOI given, and none found for title: SSM: Bayesian Learning and Inference for State Spa...
- No DOI given, and none found for title: JSL: JAX State-Space models (SSM) Library
- No DOI given, and none found for title: hmmlearn
- No DOI given, and none found for title: Structural Time Series (STS) in JAX

INVALID DOIs

- None
editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

gdalle commented 3 months ago

Review checklist for @gdalle

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

thomaspinder commented 3 months ago

Review checklist for @thomaspinder

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

osorensen commented 2 months ago

👋 @gdalle, @thomaspinder, could you please update us on how it's going with your reviews?

thomaspinder commented 2 months ago

@osorensen expecting to be done by September 1st.

gdalle commented 2 months ago

Haven't started it yet so I will probably need two more weeks.

thomaspinder commented 2 months ago

Dynamax Review

Firstly, I'd like to congratulate the authors and contributors of Dynamax for creating a nicely designed package in JAX. Dynamax enables practitioners to easily fit state-space models (SSM), whilst simultaneously allowing researchers to implement their own custom SSM approaches. The repo's corresponding paper is well written and provides a clear and concise summary of the paper. Finally, I enjoyed reading the documentation of Dynamax; the large number of examples and use-cases is great for new users of the package.

I have broken my review up into two sections. The first section lists the blocking issues I have that prevent me from being able to mark each item of the reviewer's checklist as complete. In the second section, I suggest some things that I would advise the authors to do in Dynamax; however, I do not consider these as blocking concerns.

Significant issues

Failing tests

For me, running

git clone git@github.com:probml/dynamax.git
cd dynamax
pip install -e '.[dev]'
pytest dynamax

as per the docs, threw errors on a Mac M1. I was using Python 3.10 and a fresh virtual environment.

On this topic though, I see that you're Github testing workflow uses a pinned Python version and machine. It could be good to run your tests on all supported Python versions and a Mac and Linux machine. I have opened a PR with a suggestion that reflects this comment.

Docstring coverage

The docstrings within Dynamax are inconsistent. Taking, for example, dynamax.generalized_gaussian_ssm.inference as an example file, there are no docstrings for the methods or arguments of the public classes, yet there are comprehensive docstrings for the private functions. This is unconventional, and I would encourage the authors to more widely document the classes, methods, and functions of Dynamax so as to make the package easier for people to use. Additionally, typing such as the below from dynamax.generalized_gaussian_ssm.models is not particularly descriptive.

:param initial_mean: $m$
:param initial_covariance: $S$

It would be good to see more comprehensive documentation. To rigorously test documentation thresholds in the future, you may consider using interrogate. This will measure the number of objects for which a docstring is supplied and raise errors/warnings when the coverage decreases. I have opened a suggestive PR for how this can be achieved.

33.1% of classes/methods/functions are covered by tests. It would be good to see this increased. This was calculated with interrogate dynamax.

Broken documentation

The documentation is broken and many notebooks do not render - Example. My suggestion would be to use something like nbsphinx to execute the notebooks each time the documentation is built, and to fail when a notebook does not execute top-to-bottom.

 Incomplete contribution guidelines

Your contribution guidelines are incomplete. To push a change, one would need to add and commit the change before pushing - this is missing from the document.

 Missing typing

The package has no typing. Additionally, many functions use variable names which are very hard to interpret e.g., def _predict(m, P, f, Q, u, g_ev, g_cov) in dynamax.generalized_gaussian_ssm.inference. The combination of no typing and terse variable naming makes the code challenging to read, particularly for new users and/or those new to SSMs. I would strongly encourage the authors to either use more descriptive variable names, or add typing to their code. JaxTyping is excellent for this.

Incorrect typing

Some of the typing used is incorrectly done. Take ParamsGGSSM.initial_mean in dynamax.generalized_gaussian_ssm.models as an example. In this function, the typing on initial_mean is incorrect, as outlined here in JaxTyping. A change to the effect of

- initial_mean: Float[Array, "state_dim"]
+ initial_mean: Float[Array, " state_dim"]

should be made.

Suggested improvements

 Large repo size

You have some very large git pack files:

❯ du -ah dynamax | sort -rh | head -n 10

224M    dynamax
209M    dynamax/.git/objects/pack
209M    dynamax/.git/objects
209M    dynamax/.git
208M    dynamax/.git/objects/pack/pack-aaa194639683ed252462b75ee0ec2203edc5c09f.pack

You may consider running git gc to clean these up unless you need them.

 Tighter Python bounds

I would suggest tighter Python bounds. Currently your lower bound is 3.6; a version which is considered end-of-life by the Python Org.: https://devguide.python.org/versions/#unsupported-versions As an aside, I appreciate that it's specified in your setup.py, but explicitly stating the supported Python version(s) in your project's README is nice.

Package management

It is strongly advised to migrate setup.py and setup.cfg files to a pyproject.toml. I would encourage the authors to do this in order to future-proof Dynamax.

osorensen commented 1 month ago

Thanks a lot for your thorough review, @thomaspinder. @slinderman, you're welcome to start addressing the issues whenever you like.

slinderman commented 1 month ago

Thank you, @osorensen and @thomaspinder! We will start working on these issues and suggestions ASAP, and I'll keep you posted.

gdalle commented 1 month ago

Meanwhile I'm making my way through my own review. I will gather my remarks about code and documentation in two issues on the dynamax repo:

I might be limited by my very ancient Python skills for installation and testing of dynamax, so I'd appreciate a hand in figuring out the errors I observe.

As for the paper itself, it is very clear and does a good job of introducing dynamax. I have three minor suggestions:

Dynamax supports canonical SSMs and allows the user to construct bespoke models as needed.

Can you give more details on how this is implemented API-wise? For instance, how generic can observation distributions be? This question was a major motivation for my own take on HMMs, coded in Julia (https://joss.theoj.org/papers/10.21105/joss.06436). Hopefully I didn't misrepresent dynamax in the state of the art there.

Dynamax provides a unique combination of low-level inference algorithms and high-level modeling objects that can support a wide range of research applications in JAX.

Would it be possible to provide concrete examples of what is missing from the previous libraries? Obviously JAX support is a big aspect, since I know that some of them are coded in Numpy (hmmlearn) or PyTorch (pomegranate)

Parallel message passing routines that leverage GPU or TPU acceleration to perform message passing in sublinear time.

What do you mean by sublinear time? Isn't it just (roughly speaking) total sequential time divided by amount of parallelism?

gdalle commented 1 month ago

I'm halfway through the documentation and a significant number of code examples are broken by the Numpy 2.0 release (about 1 per page). I think it would be a good idea to fix them before I make a second (complete) pass.

osorensen commented 1 month ago

@slinderman, ref the post by @gdalle, could you please follow up and ping us here when done?

slinderman commented 1 month ago

Hi @osorensen, thanks for the reminder. @gdalle, sorry for the rendering issue. I thought we had fixed this by pinning to Numpy<2.0, but somehow this slipped through the cracks. I will fix it and let you know when it's ready for another pass.

osorensen commented 1 week ago

@slinderman any updates on this?

slinderman commented 5 days ago

Hi @osorensen, sorry for the delay. We pinned Numpy < 2.0 and recreated the documentation, and now almost all of the rendering issues should be fixed. There is a known issue with the HMC notebook (see https://github.com/probml/dynamax/issues/384), but the rest look good.

Regarding the overall review progress, we have begun work on the feedback we received from @thomaspinder. It is taking more time than anticipated to fix the typing issues, but we aim to wrap that up in the coming weeks. We greatly appreciate the reviewers' feedback, and we appreciate their patience while we work to address their comments.

osorensen commented 5 days ago

Thanks for the update, @slinderman

gdalle commented 1 day ago

@slinderman thanks for the fixes. Can you tag a new release containing them, so that running the documentation examples does not require cloning the repo?

slinderman commented 1 day ago

Hi @gdalle, of course. Please see https://github.com/probml/dynamax/releases/tag/0.1.5, which is now available on PyPI as well.