tpapp / StanRun.jl

Run Stan samples from Julia.

A question about using StanRun derivatives in CmdStan.jl #4

Open goedman opened 5 years ago

goedman commented 5 years ago

Hi Tamas,

Thanks for this morning's update of StanSamples.jl. It now runs fine with the updated line 58 (used to be line 35).

Last week I added an example of using StanDump & StanRun to CmdStan.jl (and just for completeness this morning I added the updated StanSamples). All three work great.

As I mentioned before, I've started looking into using these three packages as the basis for CmdStan.jl v6. Right now StanRun supports Stan's Sample() method and uses Stan's set of default parameter values. This is all contained in the stan_cmd_and_paths() method.

Would you mind if I looked into deriving StanSample, StanOptimize and StanDiagnose from StanRun by adding appropriate methods to stan_cmd_and_paths()?
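
A minimal sketch of what such per-method dispatch could look like, for illustration only. The type names, `method_args`, and `stan_cmd` are hypothetical and do not reflect the actual signature of stan_cmd_and_paths() in StanRun; only the cmdstan method names and the `data file=` / `output file=` arguments follow the cmdstan command line.

```julia
# Hypothetical sketch: none of these types or functions come from StanRun.jl.
abstract type StanMethod end
struct Sample <: StanMethod end
struct Optimize <: StanMethod end
struct Diagnose <: StanMethod end

# One small method per Stan method; each returns its command-line arguments.
method_args(::Sample) = ["sample"]
method_args(::Optimize) = ["optimize"]
method_args(::Diagnose) = ["diagnose"]

# Build the command for a compiled model executable and a given Stan method.
function stan_cmd(exec_path::AbstractString, method::StanMethod;
                  data_file::AbstractString, output_file::AbstractString)
    args = vcat(method_args(method),
                ["data", "file=$(data_file)", "output", "file=$(output_file)"])
    # Interpolating a vector into a Cmd expands it into separate arguments.
    return `$exec_path $args`
end

# Example: construct (but do not run) the command for an optimize run.
cmd = stan_cmd("./bernoulli", Optimize(); data_file = "data.json", output_file = "out.csv")
```

With this layout, supporting another method only requires a new subtype and one `method_args` method; the shared command construction stays untouched.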

At the same time I will add a conversion function from samples as NamedTuples to samples in an MCMCChains.Chains object.

tpapp commented 5 years ago

derive StanSample, StanOptimize and StanDiagnose from StanRun by adding appropriate methods to stan_cmd_and_paths()

Thanks for the suggestion. I will look into this but currently I am very busy and can't commit to doing this in the next few weeks.

goedman commented 5 years ago

Thank you.

For now I am testing these ideas in StanSample.jl. That is by far the most complicated one of the three. Once it is done and you have time (and interest), I'm open to doing whatever you feel is most appropriate. I will not register it, so we can move/rename to our hearts' content.

goedman commented 5 years ago

Hi Tamas, just a quick update.

I’ve made a bit of progress on the above suggestion: I have now completed an initial version of StanBase (all shared components), StanSample, StanOptimize, StanVariational and StanDiagnose.

To bring the functionality on par with CmdStan.jl I added a slightly modified version of stan_sample, mainly to interpose a more complete run command generation and optionally multiple init and data files/dicts/namedtuples.

Over the next few weeks I’ll add documentation, more tests and a few more simplifications (mainly in the cmdline generation). I’ll also look into what CmdStan.jl v6 could look like and also StatisticalRethinking.jl (which I’m at the same time updating to StatisticalRethinking 2nd edition).

The StanRun, StanDump and StanSamples based approach has grown on me and I definitely believe in the long run this new setup will be easier to maintain.

Rob

tpapp commented 4 years ago

Hi Rob, sorry that I forgot about this issue. If you want to merge some functionality back into this set of packages, I am happy to take PRs.

goedman commented 4 years ago

Hi Tamas,

No problem. As I worked on the StanJulia packages I decided to extend your 3 packages - through layering - to support the separate Stan methods, all options (parameters, inits and data) and Stan's language include. At the same time I wanted to return an MCMCChains.Chains object (if applicable).

I needed a place to set up common components and opted to create StanBase.jl. That's where I looked most closely at whether further integration with StanRun would be possible, but I decided layering might be better in this case. If someone wants to run a Stan program without many special requirements, StanRun is a great solution. I made sure StanSample could handle that case transparently.

I will of course study how you accept and pass on parameters! And will get back to you.

Best, Rob

goedman commented 4 years ago

Hi Tamas,

Thinking a bit more about your question, the main difference between StanRun's approach and what I did is support for cmdstan's other methods (i.e. optimize, diagnose, variational and generated_quantities) and support for init values (which could probably be handled through the new sample_options argument). I also split up these other methods into separate functions, e.g. stan_optimize, stan_variational, etc.

The other difference is to keep the stanmodel (SampleModel in my case) around to make it easier to handle cases such as regenerating cmdstan's summary display, e.g. to compare it with MCMCChains results or Particles results.

It might be possible to merge StanRun, StanBase and StanSample into a single package for sampling. Would that be of interest to you? I could work on a PR for your review over the next 1 or 2 weeks.

Best, Rob

goedman commented 4 years ago

Continuation from the Discourse thread:

As another example, closer to home, with the switch to J1.3 StanBase.jl stopped building on Travis. Last week I got fed up with that situation and decided to try and debug it. Each step worked fine on macOS but only the final step worked on Travis. I started with forking StanRun. No problem on macOS and Travis. In StanBase.jl (which was using StanRun) I simplified the pipeline construct. No problem on macOS, but it failed on Travis with timeouts. I removed the dependency on StanRun (and copied over what I was using), with the same result. I created a test option to inject simpler run(...) commands and noticed that running cmdstan directly worked fine. Finally I switched from the pmap construct to how it's done in CmdStan. It started to build on Travis again.

Not very satisfactory, particularly because a user of CmdStan.jl told me it stopped working on J1.3 on Windows.

Anyway, way too much, not very helpful, detail. It's my problem, I need to solve it.

tpapp commented 2 years ago

@goedman, I am refactoring the package a bit (cf #6), and would be happy to address any issues, let me know if this is still relevant.

In particular, I would be happy to provide whatever interface you need in this package to incorporate it into a larger framework, but I would prefer to keep it modular (i.e. this remains a self-contained package for just running Stan). The reason is that I run Stan on clusters where this is the sole package I need to install (I sync the data and the model, sync back the results, and do the analysis on my own machine).

goedman commented 2 years ago

Hi Tamas ( @tpapp ),

Please feel free to make any changes you like. I've kind of archived CmdStan.jl and recently did quite a few updates to StanSample.jl to support C++ level threads in the cmdstan binary. As I made several more changes to some of the code I "borrowed" from StanRun.jl, I pretty much moved those sections over to StanSample.jl (or to StanBase.jl).

For a non-programmer, I'm ok with the current versions of StanBase.jl v4 and StanSample.jl v6. If that's ok with you, once your changes are complete, I would definitely like to study your updates and see where I can improve the StanJulia packages.

Most of my time still goes into thinking about better versions of the SR2TuringPluto and SR2StanPluto projects and similar projects for the "Regression and Other Stories" book.

When you say "sync back", do you mean the .csv files, named tuples or a Tables.jl type of object? StanSample.jl by default returns a StanTables chain object which supports the Tables.jl API. Thus in both Turing and Stan I can say DataFrame(chain object) to have an easy starting point for further processing.

At some point it would be interesting to compare the RedCardsStudy performance tests between StanRun.jl and StanSample.jl. And maybe the DiffEqBayesStan performance results.

Best regards, Rob