openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
697 stars 36 forks source link

[REVIEW]: BridgeStan: Efficient in-memory access to the methods of a Stan model #5236

Closed editorialbot closed 1 year ago

editorialbot commented 1 year ago

Submitting author: !--author-handle-->@roualdes<!--end-author-handle-- (Edward Roualdes) Repository: https://github.com/roualdes/bridgestan Branch with paper.md (empty if default branch): joss Version: v2.1.2 Editor: !--editor-->@Nikoleta-v3<!--end-editor-- Reviewers: @salleuska, @saumil-sh Archive: 10.5281/zenodo.8169248

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/8ecf92ae523d5719bcc2b63ec254945d"><img src="https://joss.theoj.org/papers/8ecf92ae523d5719bcc2b63ec254945d/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/8ecf92ae523d5719bcc2b63ec254945d/status.svg)](https://joss.theoj.org/papers/8ecf92ae523d5719bcc2b63ec254945d)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@adity-om & @salleuska, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @Nikoleta-v3 know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @salleuska

πŸ“ Checklist for @saumil-sh

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 year ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.21 s (368.8 files/s, 39910.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Julia                            8            230             19           1071
Python                          10            254            274           1048
JSON                             7              0              0           1037
Markdown                        12            545              0            885
C++                              3             44             19            444
YAML                             5             51              6            402
reStructuredText                 9            175             83            263
R                                4             24             99            215
TOML                             7             44              3            150
C/C++ Header                     4             74            352            135
make                             3             32             22            135
TeX                              1             15              4             88
DOS Batch                        1              8              1             26
C                                2              6              9             23
CSS                              1              3              0             15
-------------------------------------------------------------------------------
SUM:                            77           1505            891           5937
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 1 year ago

Wordcount for paper.md is 1174

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

editorialbot commented 1 year ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- None

INVALID DOIs

- None
Nikoleta-v3 commented 1 year ago

πŸ‘‹πŸΌ @roualdes, @adity-om, @salleuska this is the review thread for the paper. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

as the top of a new comment in this thread.

These checklists contain the JOSS requirements βœ… As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/5236 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@Nikoleta-v3) if you have any questions/concerns. πŸ˜„ πŸ™‹πŸ»

salleuska commented 1 year ago

Review checklist for @salleuska

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Nikoleta-v3 commented 1 year ago

πŸ‘‹πŸ» @adity-om any update on your review? πŸ˜„

roualdes commented 1 year ago

@Nikoleta-v3 just a friendly follow up. Anything you know that we don't already know from this thread? Thanks :)

Nikoleta-v3 commented 1 year ago

Thank you for the nudge @roualdes and apologises for the delay in my reply. Unfortunately, I don't but I am going to reach out to the reviewers.

Nikoleta-v3 commented 1 year ago

@salleuska did you get to complete your review?

Nikoleta-v3 commented 1 year ago

πŸ‘‹πŸ» @adity-om any update on your review?

salleuska commented 1 year ago

@Nikoleta-v3 and @roualdes I am sorry for the late reply. I had to travel in the past weeks and have unexpectedly limited access to the internet. I still have to check out some things about the software functionalities/docs, but I have some comments about the paper and a couple of issues.

Paper

I think that the writing of the paper could be improved and that there some things that are worth clarifying. I think that the package provides a great tool for Stan users to take advantage of internal Stan methods, allowing users to develop their own algorithms. However, it seems to me that some parts of the paper are more focused on marketing the product rather than presenting the methods the bridgestan package allows access.

Major Comments

  1. I would suggest revising the text in "Statement of need" and clarify which methods of Stan models are usually not accessible via the standard Stan interfaces that are instead made available by bridgestan. I also think it would be useful to mention what are potential user cases or if there are already relevant developed algorithms that take advantage of the package.

  2. From the text it is not clear to me what is the state of the art, for example:

Minor

salleuska commented 1 year ago

Code

For now, I found one error in the R example.

https://github.com/roualdes/bridgestan/issues/121#issue-1705891136

I will check julia and python early next week.

roualdes commented 1 year ago

Thanks so much for the helpful feedback.

@Nikoleta-v3 what is the expected process here? Do we wait until all feedback is in before updating our manuscript? Or do we push updates as the feedback rolls in, and the reviewers are responsible for pulling the latest changes?

Nikoleta-v3 commented 1 year ago

Hey @roualdes you can push updates as the feedback rolls in. I do recommend having separate commits for each of the reviewers’ comments (obviously the minor comments you can address in one commit). You can then reply to the comments in this issue and provide the commit hash (in my experience this helps the reviewers’ a lot).

Some people also prefer opening pull requests in the case of major comments.

Now regarding the second reviewer, we have been in touch. Hopefully they will complete their review in the upcoming weeks.

Thank you again for your patience and submitting to JOSS πŸ™πŸ»

roualdes commented 1 year ago

Thanks, @salleuska, for your suggestions. I think your comments have really helped. Here's a quick record of the changes we've made, based on your feedback so far.

Major Comments

  1. Explain user cases, see commit 5d63b03
  2. Add a new paragraph about similar offerings from other software, see commit e2cc9ba and commit d8473a9

Minor Comments

WardBrian commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

salleuska commented 1 year ago

Many thanks @roualdes! I did play around with Julia and Python and haven't had any issues to report. Also the documentation seems in pretty good shape to me. I'll take a look at the changes and keep you posted!

Nikoleta-v3 commented 1 year ago

@editorialbot remove @adity-omfrom reviewers

editorialbot commented 1 year ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@editorialbot commands

Nikoleta-v3 commented 1 year ago

@editorialbot commands

editorialbot commented 1 year ago

Hello @Nikoleta-v3, here are the things you can ask me to do:


# List all available commands
@editorialbot commands

# Add to this issue's reviewers list
@editorialbot add @username as reviewer

# Remove from this issue's reviewers list
@editorialbot remove @username from reviewers

# Get a list of all editors's GitHub handles
@editorialbot list editors

# Assign a user as the editor of this submission
@editorialbot assign @username as editor

# Remove the editor assigned to this submission
@editorialbot remove editor

# Remind an author, a reviewer or the editor to return to a review after a 
# certain period of time (supported units days and weeks)
@editorialbot remind @reviewer in 2 weeks

# Check the references of the paper for missing DOIs
@editorialbot check references

# Perform checks on the repository
@editorialbot check repository

# Adds a checklist for the reviewer using this command
@editorialbot generate my checklist

# Set a value for version
@editorialbot set v1.0.0 as version

# Set a value for branch
@editorialbot set joss-paper as branch

# Set a value for repository
@editorialbot set https://github.com/organization/repo as repository

# Set a value for the archive DOI
@editorialbot set set 10.5281/zenodo.6861996 as archive

# Mention the EiCs for the correct track
@editorialbot ping track-eic

# Generates the pdf paper
@editorialbot generate pdf

# Recommends the submission for acceptance
@editorialbot recommend-accept

# Generates a LaTeX preprint file
@editorialbot generate preprint

# Flag submission with questionable scope
@editorialbot query scope

# Get a link to the complete list of reviewers
@editorialbot list reviewers

# Creates a post-review checklist with editor and authors tasks
@editorialbot create post-review checklist

# Open the review issue
@editorialbot start review
Nikoleta-v3 commented 1 year ago

@editorialbot remove @adity-om from reviewers

editorialbot commented 1 year ago

@adity-om removed from the reviewers list!

Nikoleta-v3 commented 1 year ago

@editorialbot add @saumil-sh as reviewer

editorialbot commented 1 year ago

@saumil-sh added to the reviewers list!

Nikoleta-v3 commented 1 year ago

Hey @roualdes πŸ‘‹πŸ» @saumil-sh has agreed to be the second reviewer for your submission. Thank you saumil-shπŸ™

All of our communications will happen here from now on. As a reviewer, the first step is to create a checklist for your review by entering:

@editorialbot generate my checklist

as the top of a new comment in this thread.

These checklists contain the JOSS requirements βœ… As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

Reviewers are encouraged to submit issues and pull requests on the software repository (please mention #5236 so that a link is created to this thread and I can keep an eye on what is happening) or posting their review on this thread.

We aim for reviews to be completed within about 2-4 weeks, and feel free to ping me (@Nikoleta-v3) if you have any questions/concerns. πŸ˜„ πŸ™‹πŸ»

saumil-sh commented 1 year ago

Review checklist for @saumil-sh

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

salleuska commented 1 year ago

Many thanks @roualdes! I did play around with Julia and Python and haven't had any issues to report. Also the documentation seems in pretty good shape to me. I'll take a look at the changes and keep you posted!

I have read the new version of the paper, it is clearer to me! There is just one sentence that I am not following:

line 35 "By providing access to the method of Stan model, BridgeStan aides algorithm development by making more consistent the models being tested and the implementations of those models and their underlying mathematical representations" -> Plus, what does it mean here that the models (etc..) would be more consistent?

Other than this minor thing, for me, everything is good to go!

roualdes commented 1 year ago

Great. Thanks, @salleuska.

Here's a longer explanation of what I intend to mean by more consistent. I'd appreciate it if you could offer some suggestions on how to make my attempt at a pithy description of the following more clear.

In my experience, statisticians commonly write numerical code implementing popular statistical models to compare their new algorithm against other algorithms. The trouble is numerical code can be tricky to write. There are tricks to avoid under/over-flow, to maintain floating point accuracy, to implement sum-to-one constraints, etc., and the Stan Math library, upon which BridgeStan depends, seeks out implementations to remedy as many of these numerical issues that the Stan Development Team can think of and/or find. Hence, by using BridgeStan, and thus the Stan Math library, algorithm developers can think more about their algorithm and less about numerical issues in their implementation.

Certainly, just using BridgeStan doesn't guarantee that a user doesn't implement poorly a hand-written log-sum-exp, instead of using Stan's built in and more numerically stable version log-sum-exp. Many (all?) of Stan's distributions' functions (log-densities, cumulative density functions, 1 - CDFs, etc.) implicitly use such numerical tricks, so using BridgeStan is at least some steps forward in more consistent numerical implementations of statistical models.

saumil-sh commented 1 year ago

Review report BridgeStan: Efficient in-memory access to the methods of a Stan model

Stan is a probabilistic programming language for statistical modeling along with the Stan math library, a C++ program with automatic differentiation capabilities. The Stan math library provides a template class with a range of functions for Stan models as methods. Markov chain Monte Carlo (MCMC) algorithms utilize these methods of Stan model classes. MCMC are computational statistics algorithms that offer means of sampling from analytically intractable probability distributions. The development in MCMC methods, like Hamiltonian Monte Carlo (HMC), provides a performance gain enabling sophisticated statistical and Bayesian inference. The BridgeStan library grants access to the methods of the Stan model, a C++ class, from Julia, Python, and R; thus, facilitating the development and implementation of MCMC algorithms in high-level languages.

The interfaces of Stan, e.g. CmdStan, provide means of transpiling, compiling, and accessing results. The functionality of BridgeStan differs from these interfaces as it exposes the Stan model class with extern "C" allowing interface with other programming languages. There is some overlap with rstan; however, rstan is not under active development. In principle, the reach of BridgeStan can be widened to any language of choice, e.g. the repository features a contributed rust interface. The documentation and the accompanying report are adequately detailed to convey the fundamentals and workings. The report features an illustrative implementation of Metropolis-adjusted Langevin, an MCMC algorithm. Overall, I suggest revising the report and documentation for clarity. Please consider the following suggestions.

Major comments

  1. The Stan Model Server, and ReddingStan paved the way for BridgeStan, please consider citing or acknowledging them in the report.

  2. Lines 61-67 are ambiguous, at least to me, whether BridgeStan is an alternative to JAX and Turing.jl/JuliAD. If I understand correctly, automatic differentiation is implemented in the Stan math library; therefore, Jax, JuliaAD, and Stan math are comparable; however, this comparison is not in the scope of this submission. A relevant performance benchmark of BridgeStan could be how much overhead (time, memory, or other resources) BridgeStan adds to the functionality of Stan. Such benchmark supports claims like:

    BridgeStan is a zero-cost abstraction built upon Stan's math library. (Line 74)

    Additionally, a benchmark could favor the choice of, for example, Stan + BridgeStan + Julia over Turing.jl. However, adding such a benchmark is time-consuming, and I would leave this decision to the authors.

  3. Please consider substituting hyperlinks and Wikipedia references in the report with more reliable or valuable references wherever applicable. For example, please consider changing PageRank to an appropriate publication. Please consider hyperlinking the Stan Discourse threads discussing the need for access to Stan model methods rather than hyperlinking the Stan Discourse URL.

  4. An admirable scope of BridgeStan is that it facilitates the prototyping algorithms. However, a new user wanting to prototype an algorithm may need help as the documentation assumes familiarity with dependencies. Please consider making the documentation accessible by being more descriptive and explicit. Please consider relevant minor suggestions.

Minor comments

  1. at first glance, it is ambiguous whether Stan is required and whether documentation assumes a working installation. Stan and Stan math are shipped as git submodules when cloning the BridgeStan git repository. Please consider mentioning this at the beginning of the section Getting Started.

  2. When running git clone with the --recurse-submodules flag, the entire git repositories of Stan and Stan math are cloned. I understand that Stan and Stan math are required; however, their git histories are not. The BridgeStan git repository is 1.52MB whereas Stan 283MB and Stan math 642MB! Please avoid this using --depth 1 or --shallow-submodules.

  3. What about pre-existing Stan and the user choosing to obtain BridgeStan from a package repository (of the language) of their choice, e.g. PyPi. Is it possible to connect pre-existing Stan and BridgeStan? Does the user just need to declare paths for the .stan/.so and .data.json files? What is the set_bridgestan_path() in this case, and how to find it?

    As of version 1.0.2, the Julia and Python interfaces will download the source for you, so this section is optional.

    Therefore, in any case, the BridgeStan source is downloaded and is the bridgestan_path. Please consider making documentation explicit for this case.

  4. What decides whether the BridgeStan language interface can compile Stan model? Currently, the Python and Julia interfaces can, and R and Rust cannot compile a Stan model. Please consider highlighting this in the documentation.

  5. The section on R-compatible function mentions,

    These are small shims which call the above functions.

    However, on its own, the section could be more helpful. Please consider (hyper-)linking all the function_R() to the documentation of function().

  6. Please add a warning against using conda/mamba R-recipe until the issue #131 is resolved.

roualdes commented 1 year ago

Thanks @saumil-sh for your helpful feedback. Here's a list of the changes we made based on your feedback so far.

Major comments

  1. and 3. Both refer to commits 4f5b904 and e0878dc on the branch joss.

  2. A benchmark isn't a great fit here, for a few reasons.

    a. Comparing BridgeStan to Stan doesn't make a great amount of sense, since there is no other way we can extract gradients from Stan in a meaningful way to compare to BridgeStan. BridgeStan is indeed the only tool that does this. Plus we know that at most one memory allocation occurs in BridgeStan's extraction of a gradient, so there's not much to measure.

    b. There already exists speed comparisons of Stan to Turing.jl, which is a much more meaningful test than BridgeStan to Turing.jl.

    c. We are fairly word limited, by JOSS standards on word count.

  3. see Minor comments.

Minor comments

All minor comments are addressed by pull requests #141 and #143.

Please let us know if you have any further comments or suggestions.

WardBrian commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

salleuska commented 1 year ago

Great. Thanks, @salleuska.

Here's a longer explanation of what I intend to mean by more consistent. I'd appreciate it if you could offer some suggestions on how to make my attempt at a pithy description of the following more clear.

In my experience, statisticians commonly write numerical code implementing popular statistical models to compare their new algorithm against other algorithms. The trouble is numerical code can be tricky to write. There are tricks to avoid under/over-flow, to maintain floating point accuracy, to implement sum-to-one constraints, etc., and the Stan Math library, upon which BridgeStan depends, seeks out implementations to remedy as many of these numerical issues that the Stan Development Team can think of and/or find. Hence, by using BridgeStan, and thus the Stan Math library, algorithm developers can think more about their algorithm and less about numerical issues in their implementation.

Certainly, just using BridgeStan doesn't guarantee that a user doesn't implement poorly a hand-written log-sum-exp, instead of using Stan's built in and more numerically stable version log-sum-exp. Many (all?) of Stan's distributions' functions (log-densities, cumulative density functions, 1 - CDFs, etc.) implicitly use such numerical tricks, so using BridgeStan is at least some steps forward in more consistent numerical implementations of statistical models.

Hi @roualdes, many thanks for the explanation! And apologies for the slow replies; I need to adjust my Github notifications. I think this is a really good feature, and it could be worth making this clearer. I see that you are almost done with the revision, so here is a last (optional) suggestion:

BridgeStan leverages the capabilities of the Stan Math library to alleviate algorithm developers from the hurdles of handling numerical issues such as under/over-flow or sum-to-one constraints. By entrusting these tasks to the implementations within the Stan Math library, users can focus on their algorithms. This also ensures more precise and stable implementations of statistical algorithms allowing for fair comparisons"

roualdes commented 1 year ago

@salleuska, thanks. That's great. I didn't directly quote you, but I modified that paragraph with heavy inspiration from your suggestions. Here's the paragraph in total,

C++ can be cumbersome for algorithm prototyping. As such, developers have been requesting ways to access Stan models for algorithm development in Python, R, and Julia. BridgeStan answers this call, making it easy for algorithm developers to incorporate existing Stan models in their evaluation, e.g., the dozens of diverse models with reference posteriors in posteriordb [@Magnusson:2022]. BridgeStan further aides algorithm development by leveraging the functions in the Stan Math library, which are written to appropriately handle numerical issues such as under/over-flow and sum-to-one constraints. Algorithm developers using BridgeStan can thus focus more on their algorithms and less on their implementations. These benefits also ensure more precise and stable implementations of statistical algorithms allowing for more fair comparisons.

I also updated the branch joss.

WardBrian commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

saumil-sh commented 1 year ago

Apologies for not getting back earlier.

I agree that comparing BridgeStan and Stan doesn't make sense. I proposed an overhead benchmark as I intend not to leave any loose end for a reader. In the case of at most one memory allocation (this is fantastic!), I agree there is hardly anything to measure. I am convinced that conveying this to a reader is more relevant than referring to automatic differentiation libraries. Adding one or two sentences in this direction may be worth trading the word count.

Otherwise, I am satisfied with all other changes.

roualdes commented 1 year ago

@saumil-sh, thanks. I updated the last paragraph before Section Example to include more information about the memory allocations necessary for the BridgeStan function log_density_gradient, as you suggest. Here's the updated paragraph in total (which is also updated on our BridgeStan branch joss)

BridgeStan enables memory allocated in the host language (Julia, Python, or R), to be reused within Stan; though any language with a C foreign function interface could be similarly interfaced to access Stan methods. For instance, the BridgeStan function log_density_gradient has an optional output argument, the array into which the gradient will be stored. By avoiding unnecessary copies, BridgeStan is a zero-cost abstraction built upon Stan's math library. If no output argument is passed to log_density_gradient, then at most one memory allocation, occuring in the host language, takes place.

roualdes commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

saumil-sh commented 1 year ago

I approve πŸ‘ this report πŸ“„ and the package πŸ“¦ for publication!

roualdes commented 1 year ago

@Nikoleta-v3 please let us know if there is anything else you need from the authors or the reviewers. Thanks.

Nikoleta-v3 commented 1 year ago

Thank you all for your work πŸ™πŸ» I will take one last look at the paper and the package in the next few days, and then we can proceed with the publication. πŸ‘πŸ»

Nikoleta-v3 commented 1 year ago

Hey @roualdes thank you for your patience! I have not further comments. Just two minor typos in the manuscript:

At this point could you please:

I can then move forward with accepting the submission.

roualdes commented 1 year ago

Tagged v2.1.2: https://github.com/roualdes/bridgestan/releases/tag/v2.1.2 Zenodo up to date: https://zenodo.org/record/8169248 image

WardBrian commented 1 year ago

@editorialbot generate pdf

We made some edits including your above suggestions and ensuring our zenodo and authors list match up with recent contributions as well