stan-dev / cmdstanpy

CmdStanPy is a lightweight interface to Stan for Python users which provides the necessary objects and functions to compile a Stan program and fit the model to data using CmdStan.
BSD 3-Clause "New" or "Revised" License

Differences between CmdStanPy and Pystan? #244

Closed trainorpj closed 3 years ago

trainorpj commented 4 years ago

Summary:

There are two ways to use Stan with Python (CmdStanPy and PyStan). That might confuse new users (like myself), who would like to know which one they should use. The differences could be listed in the documentation for either library.

Description:

I came across CmdStanPy in @mitzimorris's notebook featured on Andrew Gelman's blog. I've been using PyStan for a few months, but I'm considering switching to CmdStanPy, since it seems to have a slightly better user experience.

When I took a look at the documentation, however, I wasn't convinced there was a significant difference between CmdStanPy and PyStan. In fact, there's only one mention of PyStan in the documentation, and it doesn't address the differences.

Could the documentation benefit from a section describing the differences between CmdStanPy and PyStan? It'd be nice to know if/when I should use one over the other.

Additional Information:

I'm happy to write a section for the docs. I would need somebody to explain the differences though 😄

maedoc commented 4 years ago

The main difference is in how the libraries call Stan. Stan is a C++ library; PyStan uses it directly via Cython, while CmdStanPy uses it through the CmdStan command-line interface. There's no one-size-fits-all recommendation, since both libraries have advantages.
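To make the "through the command-line interface" part concrete, here is a minimal sketch of what a wrapper does under the hood: it assembles an argument list for a compiled CmdStan model executable and runs it as a separate process. This is not CmdStanPy's actual implementation; the helper names `build_sample_command` and `run_chain` and the file names are hypothetical, and the argument layout follows the `sample data file=... output file=...` form shown in the CmdStan guide.

```python
import subprocess


def build_sample_command(exe, data_file, output_file):
    """Return the argument list for one CmdStan sampling run.

    `exe` is assumed to be a model executable already compiled by CmdStan.
    """
    return [
        str(exe),
        "sample",
        "data", f"file={data_file}",
        "output", f"file={output_file}",
    ]


def run_chain(exe, data_file, output_file):
    """Run one chain as a separate process; the draws land in output_file (CSV)."""
    cmd = build_sample_command(exe, data_file, output_file)
    # check=True raises CalledProcessError if the executable exits non-zero.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Only builds and prints the command; nothing is executed here.
    cmd = build_sample_command("./bernoulli", "bernoulli.data.json", "output_1.csv")
    print(" ".join(cmd))
    # prints: ./bernoulli sample data file=bernoulli.data.json output file=output_1.csv
```

Because the Python side only builds strings and reads files, no C++ is compiled into or linked against the Python process, which is the practical difference from the Cython route.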

mitzimorris commented 4 years ago

I wrote a case study on Stan in the cloud, https://mc-stan.org/users/documentation/case-studies/jupyter_colab_notebooks_2020.html, which has a blurb on CmdStanPy and CmdStanR. What it says is:

  • Simplicity and modularity: these packages wrap CmdStan and just provide functions to compile models, do inference, and assemble and save the results; other packages are needed for downstream analysis

  • Keep up with Stan releases: these interfaces can use any (recent) version of CmdStan, including the current release, Stan 2.23.

  • Quick and easy installation: minimal dependencies with other packages and no direct calls to C++.

  • Flexible licensing: BSD-3.

Also, they're designed to be as memory efficient as possible. If all your data is in files on disk when you use CmdStanPy to run CmdStan, then CmdStanPy itself will add minimal memory overhead; it will just spawn a bunch of CmdStan processes. Therefore, you should be able to fit larger datasets to more complex models.
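As a rough sketch of the "data in files on disk" workflow: write the data to a JSON file once, then hand CmdStanPy the file path rather than an in-memory dict. The helper names `write_data_json` and `fit_from_file` are hypothetical, the Bernoulli data is just illustrative, and `fit_from_file` assumes both CmdStanPy and CmdStan are installed and that a matching Stan program exists at `stan_file`.

```python
import json
from pathlib import Path


def write_data_json(data, path):
    """Write a dict of Stan data to a JSON file and return its path."""
    path = Path(path)
    with path.open("w") as f:
        json.dump(data, f)
    return path


def fit_from_file(stan_file, data_path, chains=4):
    """Compile the model and sample, passing the data by file path.

    Imported lazily so the rest of this sketch runs without CmdStanPy.
    """
    from cmdstanpy import CmdStanModel  # assumes cmdstanpy + CmdStan are installed

    model = CmdStanModel(stan_file=str(stan_file))
    # `data` may be a path to a JSON file; the CmdStan processes read it themselves.
    return model.sample(data=str(data_path), chains=chains)


if __name__ == "__main__":
    data_path = write_data_json(
        {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]},
        "bernoulli.data.json",
    )
    print(f"wrote {data_path}")
    # fit = fit_from_file("bernoulli.stan", data_path)  # needs a CmdStan install
```

With this layout the Python process never has to hold the full dataset while the chains run; each spawned CmdStan process reads the file on its own.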

CmdStanPy and CmdStanR are still in beta - we're keeping them in beta until we iron out all of the details of the APIs for both because we want them to be complete, consistent, and good.

I would love it if you would help beta test CmdStanPy and give feedback!

You're welcome to take on some of the open issues. Also, with respect to names, there's a Google doc that we're using to discuss the API. User perspectives would be most useful. https://docs.google.com/spreadsheets/d/1lU6vcY3GF_ftlFdCe2kDhr5FC4YJPWauC5bp2D9zz74/edit#gid=0

tomwphillips commented 4 years ago

I think it's also worth noting that PyStan has a GPL v3.0 license whereas CmdStanPy has a BSD 3-clause license. Depending on your situation, the license may dictate which one you use.

mitzimorris commented 4 years ago

Yes, that was definitely a driving force. Many people had already written their own version of CmdStanPy. The starting point for this repo was Marmaduke Woodman's PyCmdStan; he said he wrote it because he couldn't deal with the GPL license. It was also a concern for the folks at Facebook Prophet, and lots of folks working for companies raised the same issue.

mitzimorris commented 3 years ago

since this issue was filed, this has been discussed in many forums and the documentation has been built out.