Closed editorialbot closed 1 year ago
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.88 T=0.21 s (368.8 files/s, 39910.9 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Julia 8 230 19 1071
Python 10 254 274 1048
JSON 7 0 0 1037
Markdown 12 545 0 885
C++ 3 44 19 444
YAML 5 51 6 402
reStructuredText 9 175 83 263
R 4 24 99 215
TOML 7 44 3 150
C/C++ Header 4 74 352 135
make 3 32 22 135
TeX 1 15 4 88
DOS Batch 1 8 1 26
C 2 6 9 23
CSS 1 3 0 15
-------------------------------------------------------------------------------
SUM: 77 1505 891 5937
-------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Wordcount for paper.md is 1174
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- None
MISSING DOIs
- None
INVALID DOIs
- None
👋🏼 @roualdes, @adity-om, @salleuska this is the review thread for the paper. All of our communications will happen here from now on.
As a reviewer, the first step is to create a checklist for your review by entering
@editorialbot generate my checklist
at the top of a new comment in this thread.
These checklists contain the JOSS requirements ✅ As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/5236 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.
Please feel free to ping me (@Nikoleta-v3) if you have any questions/concerns. 😄 👋🏻
👋🏻 @adity-om any update on your review? 😄
@Nikoleta-v3 just a friendly follow up. Anything you know that we don't already know from this thread? Thanks :)
Thank you for the nudge @roualdes and apologies for the delay in my reply. Unfortunately, I don't, but I am going to reach out to the reviewers.
@salleuska did you get to complete your review?
👋🏻 @adity-om any update on your review?
@Nikoleta-v3 and @roualdes I am sorry for the late reply. I had to travel in the past weeks and unexpectedly had limited access to the internet. I still have to check out some things about the software functionalities/docs, but I have some comments about the paper and a couple of issues.
Paper
I think that the writing of the paper could be improved and that there are some things worth clarifying. I think that the package provides a great tool for Stan users to take advantage of internal Stan methods, allowing users to develop their own algorithms. However, it seems to me that some parts of the paper are more focused on marketing the product rather than presenting the methods to which the bridgestan package allows access.
Major Comments
I would suggest revising the text in "Statement of need" to clarify which methods of Stan models are usually not accessible via the standard Stan interfaces but are made available by bridgestan. I also think it would be useful to mention potential use cases, or whether there are already relevant algorithms that take advantage of the package.
From the text it is not clear to me what the state of the art is; for example:
line 43. "Using a memory-compatible C interface makes this possible even if the host language (e.g., R) was compiled with a different compiler, something no prior interface which exposed Stan's log density calculations could allow." -> Is there previous work that allows access to Stan models' methods? If so, could you cite it? Or give an idea of the shortcomings of these interfaces?
line 44. "Existing tools with similar functionalities include JAX (Bradbury et al., 2018) and Turing.jl via the JuliaAD ecosystem" -> What do you mean by "similar functionalities"? From the sentence it is unclear whether these tools offer access to methods in Stan or instead provide functions for derivatives and calculation of log-densities.
Minor
line 27: "we find that rstan ranks at number 70, rstantools 179, and rstanarm 502." -> it is not obvious to a reader what these packages are.
line 39. "BridgeStan offers efficient, in-memory, from memory allocated within the host language, computations of the log joint density function of a Stan model, itself implemented using highly templated C++ from the Stan math library" -> this sentence is hard to follow; I think it would be helpful to elaborate a bit, and possibly merge these memory-related concepts into the paragraph at line 53.
line 47: "The Stan community by and large uses CPU hardware" - this is unclear; maybe "by" is a "buy"?
The term "efficient" is overused, which I find distracting; I would encourage the authors to remove it where unnecessary.
Code
For now, I found one error in the R example.
https://github.com/roualdes/bridgestan/issues/121#issue-1705891136
I will check julia and python early next week.
Thanks so much for the helpful feedback.
@Nikoleta-v3 what is the expected process here? Do we wait until all feedback is in before updating our manuscript? Or do we push updates as the feedback rolls in, and the reviewers are responsible for pulling the latest changes?
Hey @roualdes you can push updates as the feedback rolls in. I do recommend having separate commits for each of the reviewers' comments (obviously the minor comments you can address in one commit). You can then reply to the comments in this issue and provide the commit hash (in my experience this helps the reviewers a lot).
Some people also prefer opening pull requests in the case of major comments.
Now regarding the second reviewer, we have been in touch. Hopefully they will complete their review in the upcoming weeks.
Thank you again for your patience and submitting to JOSS 🙏🏻
Thanks, @salleuska, for your suggestions. I think your comments have really helped. Here's a quick record of the changes we've made, based on your feedback so far.
Major Comments
Minor Comments
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Many thanks @roualdes! I did play around with Julia and Python and haven't had any issues to report. Also the documentation seems in pretty good shape to me. I'll take a look at the changes and keep you posted!
@editorialbot remove @adity-omfrom reviewers
I'm sorry human, I don't understand that. You can see what commands I support by typing:
@editorialbot commands
@editorialbot commands
Hello @Nikoleta-v3, here are the things you can ask me to do:
# List all available commands
@editorialbot commands
# Add to this issue's reviewers list
@editorialbot add @username as reviewer
# Remove from this issue's reviewers list
@editorialbot remove @username from reviewers
# Get a list of all editors's GitHub handles
@editorialbot list editors
# Assign a user as the editor of this submission
@editorialbot assign @username as editor
# Remove the editor assigned to this submission
@editorialbot remove editor
# Remind an author, a reviewer or the editor to return to a review after a
# certain period of time (supported units days and weeks)
@editorialbot remind @reviewer in 2 weeks
# Check the references of the paper for missing DOIs
@editorialbot check references
# Perform checks on the repository
@editorialbot check repository
# Adds a checklist for the reviewer using this command
@editorialbot generate my checklist
# Set a value for version
@editorialbot set v1.0.0 as version
# Set a value for branch
@editorialbot set joss-paper as branch
# Set a value for repository
@editorialbot set https://github.com/organization/repo as repository
# Set a value for the archive DOI
@editorialbot set set 10.5281/zenodo.6861996 as archive
# Mention the EiCs for the correct track
@editorialbot ping track-eic
# Generates the pdf paper
@editorialbot generate pdf
# Recommends the submission for acceptance
@editorialbot recommend-accept
# Generates a LaTeX preprint file
@editorialbot generate preprint
# Flag submission with questionable scope
@editorialbot query scope
# Get a link to the complete list of reviewers
@editorialbot list reviewers
# Creates a post-review checklist with editor and authors tasks
@editorialbot create post-review checklist
# Open the review issue
@editorialbot start review
@editorialbot remove @adity-om from reviewers
@adity-om removed from the reviewers list!
@editorialbot add @saumil-sh as reviewer
@saumil-sh added to the reviewers list!
Hey @roualdes 👋🏻 @saumil-sh has agreed to be the second reviewer for your submission. Thank you @saumil-sh 🙏
All of our communications will happen here from now on. As a reviewer, the first step is to create a checklist for your review by entering:
@editorialbot generate my checklist
at the top of a new comment in this thread.
These checklists contain the JOSS requirements ✅ As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.
Reviewers are encouraged to submit issues and pull requests on the software repository (please mention #5236 so that a link is created to this thread and I can keep an eye on what is happening) or to post their review on this thread.
We aim for reviews to be completed within about 2-4 weeks, and feel free to ping me (@Nikoleta-v3) if you have any questions/concerns. 😄 👋🏻
I have read the new version of the paper, it is clearer to me! There is just one sentence that I am not following:
line 35 "By providing access to the method of Stan model, BridgeStan aides algorithm development by making more consistent the models being tested and the implementations of those models and their underlying mathematical representations" -> What does it mean here that the models (etc.) would be more consistent?
Other than this minor thing, for me, everything is good to go!
Great. Thanks, @salleuska.
Here's a longer explanation of what I intend to mean by more consistent. I'd appreciate it if you could offer some suggestions on how to make my attempt at a pithy description of the following more clear.
In my experience, statisticians commonly write numerical code implementing popular statistical models to compare their new algorithm against other algorithms. The trouble is that numerical code can be tricky to write. There are tricks to avoid under/over-flow, to maintain floating point accuracy, to implement sum-to-one constraints, etc., and the Stan Math library, upon which BridgeStan depends, seeks out implementations to remedy as many of these numerical issues as the Stan Development Team can think of and/or find. Hence, by using BridgeStan, and thus the Stan Math library, algorithm developers can think more about their algorithm and less about numerical issues in their implementation.
Certainly, just using BridgeStan doesn't guarantee that a user won't poorly implement a hand-written log-sum-exp instead of using Stan's built-in, more numerically stable version. Many (all?) of Stan's distribution functions (log-densities, cumulative distribution functions, 1 - CDFs, etc.) implicitly use such numerical tricks, so using BridgeStan is at least a step toward more consistent numerical implementations of statistical models.
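To make the log-sum-exp point concrete, here is a toy Python comparison (illustrative only, not BridgeStan or Stan Math code) of a naive implementation against the standard max-shifted version that numerical libraries use internally:

```python
import math

def logsumexp_naive(xs):
    # Overflows: math.exp(800) exceeds the double-precision range.
    return math.log(sum(math.exp(x) for x in xs))

def logsumexp_stable(xs):
    # Shift by the max so every exponent is <= 0 before summing.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

print(logsumexp_stable([800.0, 800.0]))  # 800.6931..., i.e. 800 + log(2)
try:
    logsumexp_naive([800.0, 800.0])
except OverflowError as e:
    print("naive version overflows:", e)
```

The stable variant is exactly the kind of trick a hand-rolled model implementation can easily miss, and which a Stan model compiled through BridgeStan gets for free.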
*BridgeStan: Efficient in-memory access to the methods of a Stan model*

Stan is a probabilistic programming language for statistical modeling, built along with the Stan math library, a C++ program with automatic differentiation capabilities. The Stan math library provides a template class with a range of functions for Stan models as methods. Markov chain Monte Carlo (MCMC) algorithms utilize these methods of Stan model classes. MCMC methods are computational statistics algorithms that offer a means of sampling from analytically intractable probability distributions. Developments in MCMC methods, like Hamiltonian Monte Carlo (HMC), provide performance gains enabling sophisticated statistical and Bayesian inference. The BridgeStan library grants access to the methods of the Stan model, a C++ class, from Julia, Python, and R, thus facilitating the development and implementation of MCMC algorithms in high-level languages.

The interfaces of Stan, e.g. CmdStan, provide means of transpiling, compiling, and accessing results. The functionality of BridgeStan differs from these interfaces in that it exposes the Stan model class with `extern "C"`, allowing interfacing with other programming languages. There is some overlap with rstan; however, rstan is not under active development. In principle, the reach of BridgeStan can be widened to any language of choice, e.g. the repository features a contributed Rust interface. The documentation and the accompanying report are adequately detailed to convey the fundamentals and workings. The report features an illustrative implementation of Metropolis-adjusted Langevin, an MCMC algorithm. Overall, I suggest revising the report and documentation for clarity. Please consider the following suggestions.
The Stan Model Server and ReddingStan paved the way for BridgeStan; please consider citing or acknowledging them in the report.
Lines 61-67 are ambiguous, at least to me, as to whether BridgeStan is an alternative to JAX and Turing.jl/JuliaAD. If I understand correctly, automatic differentiation is implemented in the Stan math library; therefore, JAX, JuliaAD, and Stan math are comparable; however, this comparison is not in the scope of this submission. A relevant performance benchmark of BridgeStan could be how much overhead (time, memory, or other resources) BridgeStan adds to the functionality of Stan. Such a benchmark supports claims like:

> BridgeStan is a zero-cost abstraction built upon Stan's math library. (Line 74)

Additionally, a benchmark could favor the choice of, for example, Stan + BridgeStan + Julia over Turing.jl. However, adding such a benchmark is time-consuming, and I would leave this decision to the authors.
Please consider replacing hyperlinks and Wikipedia references in the report with more reliable or valuable references wherever applicable. For example, please consider changing PageRank to an appropriate publication. Please consider hyperlinking the Stan Discourse threads discussing the need for access to Stan model methods rather than hyperlinking the Stan Discourse URL.
An admirable aim of BridgeStan is that it facilitates prototyping algorithms. However, a new user wanting to prototype an algorithm may need help, as the documentation assumes familiarity with dependencies. Please consider making the documentation accessible by being more descriptive and explicit. Please consider the following minor suggestions.
At first glance, it is ambiguous whether Stan is required and whether the documentation assumes a working installation. Stan and Stan math are shipped as git submodules when cloning the BridgeStan git repository. Please consider mentioning this at the beginning of the section Getting Started.

When running `git clone` with the `--recurse-submodules` flag, the entire git repositories of Stan and Stan math are cloned. I understand that Stan and Stan math are required; however, their git histories are not. The BridgeStan git repository is 1.52 MB, whereas Stan is 283 MB and Stan math 642 MB! Please avoid this by using `--depth 1` or `--shallow-submodules`.
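For illustration, a self-contained sketch of what `--depth 1` buys, using a throwaway local repository rather than BridgeStan itself (the same flag, together with `--recurse-submodules --shallow-submodules`, applies to the real clone):

```shell
set -eu
tmp=$(mktemp -d); cd "$tmp"
# Toy stand-in repository with two commits of history.
git init -q upstream
git -C upstream -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "first"
git -C upstream -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "second"
# A file:// URL makes git honor --depth even for a local path.
git clone -q --depth 1 "file://$tmp/upstream" shallow
git -C shallow rev-list --count HEAD   # only the tip commit is fetched
```

The shallow clone contains a single commit regardless of how long the upstream history is, which is exactly the saving being suggested for the Stan and Stan math submodules.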
What about pre-existing Stan, and the user choosing to obtain BridgeStan from a package repository (of the language) of their choice, e.g. PyPI? Is it possible to connect pre-existing Stan and BridgeStan? Does the user just need to declare paths for the `.stan`/`.so` and `.data.json` files? What is the `set_bridgestan_path()` in this case, and how can one find it?
> As of version 1.0.2, the Julia and Python interfaces will download the source for you, so this section is optional.

Therefore, in any case, the BridgeStan source is downloaded and is the `bridgestan_path`. Please consider making the documentation explicit for this case.
What decides whether a BridgeStan language interface can compile a Stan model? Currently, the Python and Julia interfaces can compile a Stan model, while the R and Rust interfaces cannot. Please consider highlighting this in the documentation.
The section on R-compatible functions mentions,

> These are small shims which call the above functions.

However, on its own, the section could be more helpful. Please consider (hyper-)linking each `function_R()` to the documentation of the corresponding `function()`.
Please add a warning against using the conda/mamba R recipe until issue #131 is resolved.
Thanks @saumil-sh for your helpful feedback. Here's a list of the changes we made based on your feedback so far.
and 3. Both refer to commits 4f5b904 and e0878dc on the branch joss.
A benchmark isn't a great fit here, for a few reasons.
a. Comparing BridgeStan to Stan doesn't make a great amount of sense, since there is no other way we can extract gradients from Stan in a meaningful way to compare to BridgeStan. BridgeStan is indeed the only tool that does this. Plus we know that at most one memory allocation occurs in BridgeStan's extraction of a gradient, so there's not much to measure.
b. There already exists speed comparisons of Stan to Turing.jl, which is a much more meaningful test than BridgeStan to Turing.jl.
c. We are fairly limited on word count by JOSS standards.
see Minor comments.
All minor comments are addressed by pull requests #141 and #143.
Please let us know if you have any further comments or suggestions.
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Hi @roualdes, many thanks for the explanation! And apologies for the slow replies; I need to adjust my Github notifications. I think this is a really good feature, and it could be worth making this clearer. I see that you are almost done with the revision, so here is a last (optional) suggestion:
> BridgeStan leverages the capabilities of the Stan Math library to alleviate algorithm developers from the hurdles of handling numerical issues such as under/over-flow or sum-to-one constraints. By entrusting these tasks to the implementations within the Stan Math library, users can focus on their algorithms. This also ensures more precise and stable implementations of statistical algorithms allowing for fair comparisons.
@salleuska, thanks. That's great. I didn't directly quote you, but I modified that paragraph with heavy inspiration from your suggestions. Here's the paragraph in total,
C++ can be cumbersome for algorithm prototyping. As such, developers have been requesting ways to access Stan models for algorithm development in Python, R, and Julia.
> BridgeStan answers this call, making it easy for algorithm developers to incorporate existing Stan models in their evaluation, e.g., the dozens of diverse models with reference posteriors in `posteriordb` [@Magnusson:2022]. BridgeStan further aides algorithm development by leveraging the functions in the Stan Math library, which are written to appropriately handle numerical issues such as under/over-flow and sum-to-one constraints. Algorithm developers using BridgeStan can thus focus more on their algorithms and less on their implementations. These benefits also ensure more precise and stable implementations of statistical algorithms allowing for more fair comparisons.
I also updated the branch `joss`.
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Apologies for not getting back earlier.
I agree that comparing BridgeStan and Stan doesn't make sense. I proposed an overhead benchmark as I intend not to leave any loose end for a reader. In the case of at most one memory allocation (this is fantastic!), I agree there is hardly anything to measure. I am convinced that conveying this to a reader is more relevant than referring to automatic differentiation libraries. Adding one or two sentences in this direction may be worth trading the word count.
Otherwise, I am satisfied with all other changes.
@saumil-sh, thanks. I updated the last paragraph before Section Example to include more information about the memory allocations necessary for the BridgeStan function log_density_gradient, as you suggest. Here's the updated paragraph in total (which is also updated on our BridgeStan branch joss)
> BridgeStan enables memory allocated in the host language (Julia, Python, or R) to be reused within Stan; though any language with a C foreign function interface could be similarly interfaced to access Stan methods. For instance, the BridgeStan function `log_density_gradient` has an optional output argument, the array into which the gradient will be stored. By avoiding unnecessary copies, BridgeStan is a zero-cost abstraction built upon Stan's math library. If no output argument is passed to `log_density_gradient`, then at most one memory allocation, occurring in the host language, takes place.
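The out-parameter idea in that paragraph can be sketched in plain Python (a toy stand-in, not BridgeStan's actual API): the caller allocates the gradient buffer once and the function writes into it, so a sampling loop triggers no per-iteration allocation.

```python
def log_density_gradient(theta, out=None):
    """Toy stand-in: standard-normal log density and its gradient.

    If `out` is supplied, the gradient is written into it in place,
    mimicking BridgeStan's optional output-array argument.
    """
    if out is None:
        out = [0.0] * len(theta)          # the single (optional) allocation
    lp = 0.0
    for i, t in enumerate(theta):
        lp += -0.5 * t * t
        out[i] = -t                        # gradient of -t^2/2 is -t
    return lp, out

grad = [0.0, 0.0]                          # allocate once, outside the loop
for theta in ([1.0, 2.0], [3.0, -1.0]):
    lp, g = log_density_gradient(theta, out=grad)
    assert g is grad                       # no new buffer was created
print(grad)  # gradient at the last point: [-3.0, 1.0]
```

With a real sampler the same buffer would be passed on every iteration, which is how the "at most one allocation" property plays out in practice.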
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
I approve 👍 this report 📄 and the package 📦 for publication!
@Nikoleta-v3 please let us know if there is anything else you need from the authors or the reviewers. Thanks.
Thank you all for your work 🙏🏻 I will take one last look at the paper and the package in the next few days, and then we can proceed with the publication. 👏🏻
Hey @roualdes thank you for your patience! I have no further comments. Just two minor typos in the manuscript:
Line 8: ~to~
Line 35: I believe it should be aids
At this point could you please:
I can then move forward with accepting the submission.
Tagged v2.1.2: https://github.com/roualdes/bridgestan/releases/tag/v2.1.2 Zenodo up to date: https://zenodo.org/record/8169248
@editorialbot generate pdf
We made some edits, including your above suggestions, and ensured our Zenodo and author lists match up with recent contributions as well.
Submitting author: @roualdes (Edward Roualdes)
Repository: https://github.com/roualdes/bridgestan
Branch with paper.md (empty if default branch): joss
Version: v2.1.2
Editor: @Nikoleta-v3
Reviewers: @salleuska, @saumil-sh
Archive: 10.5281/zenodo.8169248
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@adity-om & @salleuska, your review will be checklist-based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @Nikoleta-v3 know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @salleuska
📝 Checklist for @saumil-sh