tlyim opened this issue 5 years ago
Can you tell me which functionality you need from the package? I got feedback that people needed more basic functionality than the .yaml files, so I backed that out in an updated copy on a branch. I'm in the middle of fixing this package up right now, so it would be good to know what people are interested in.
I ran multiple chains with `cmdstan` on a cluster using the approach described in this post. Now I have the output in four files, `samples1.csv`, ..., `samples4.csv`, and I hope that your `stannis` package could make the output analysis (printing estimation results for selected parameters and drawing pairs plots) easier.
I tried `bin/stansummary sample*.csv`, but the command does not seem to have an option to report the summary for selected parameters only. I only want to focus on a dozen essential parameters, but technically there are many more (e.g., one error term sampled for each period of a state-space model).
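As a stopgap for the "selected parameters only" problem, a small awk function can pull named columns out of the raw CmdStan CSVs and report a pooled mean and standard deviation. This is a minimal sketch, not part of `stannis` or CmdStan: the function name `summarize_params` is hypothetical, it assumes the usual CmdStan layout (`#` comment lines plus one unquoted header row), and it computes no convergence diagnostics (no R-hat or ESS), so `stansummary` is still needed for those.

```bash
#!/bin/bash
# summarize_params PARAMS FILE...   (hypothetical helper, not a CmdStan tool)
# Pooled mean and sd of selected columns across one or more CmdStan CSVs.
# PARAMS is a comma-separated list of column names, e.g. "mu,tau".
# '#' comment lines and blank lines are skipped; the first remaining line
# of each file is treated as that file's header row.
summarize_params() {
  local params=$1
  shift
  awk -F',' -v params="$params" '
    BEGIN { n = split(params, want, ",") }
    FNR == 1 { in_header = 1 }              # a new file starts: expect a header
    /^#/ || /^$/ { next }
    in_header {
      in_header = 0
      delete col
      for (i = 1; i <= NF; i++) col[$i] = i
      for (j = 1; j <= n; j++) {
        if (!(want[j] in col)) { bad = 1; exit }
        idx[j] = col[want[j]]
      }
      next
    }
    {
      draws++
      for (j = 1; j <= n; j++) { x = $(idx[j]) + 0; s[j] += x; ss[j] += x * x }
    }
    END {
      if (bad) { print "requested column not found" > "/dev/stderr"; exit 1 }
      if (draws == 0) { print "no draws found" > "/dev/stderr"; exit 1 }
      for (j = 1; j <= n; j++) {
        mean = s[j] / draws
        variance = (draws > 1) ? (ss[j] - draws * mean * mean) / (draws - 1) : 0
        if (variance < 0) variance = 0      # guard against rounding error
        printf "%s mean=%.4g sd=%.4g\n", want[j], mean, sqrt(variance)
      }
    }' "$@"
}
```

Usage would look like `summarize_params "mu,tau,sigma" samples*.csv`, where the parameter names are placeholders for your own model's columns (vector elements appear as e.g. `theta.1`).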
I have the impression that your package would let me import the output files back into R and do the rest as usual in R. Would the following functions work on a standalone basis, without installing `stannis`? (I am thinking about cloning the repo and `source()`-ing these functions.)
```r
# adjust `pattern` so it matches your files (here samples1.csv ... samples4.csv)
samples <- stannis::read_file_set(root = '.', pattern = '.*-output.csv')
post_warmup <- stannis::trim_warmup(samples)
merged_samples <- stannis::merge_chains(post_warmup)
samples_in_arrays <- stannis::array_set(merged_samples)
```
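For comparison, the warmup-trimming step can be approximated outside R with awk. This is a rough sketch, not how `stannis::trim_warmup` is implemented: the function name `trim_warmup_csv` is hypothetical, and it assumes the CmdStan comment header echoes `num_warmup`, `save_warmup` (as `1`/`0` or `true`/`false`) and `thin` as `name = value` lines. If warmup draws were not saved (the CmdStan default), nothing is dropped.

```bash
#!/bin/bash
# trim_warmup_csv FILE   (hypothetical helper)
# Print the CSV header row and the post-warmup draws of one CmdStan output
# file, dropping '#' comment lines. The number of warmup rows is derived from
# the num_warmup, save_warmup and thin settings echoed in the comment header;
# when save_warmup is off, no draw rows are dropped.
trim_warmup_csv() {
  awk '
    function setting(line,   parts, fields) {
      split(line, parts, "=")
      split(parts[2], fields, " ")          # "1000 (Default)" -> "1000"
      return fields[1]
    }
    BEGIN { warmup = 0; save = 0; thin = 1 }
    /^# *num_warmup *=/  { warmup = setting($0) + 0 }
    /^# *save_warmup *=/ { v = setting($0); save = (v == "1" || v == "true") }
    /^# *thin *=/        { thin = setting($0) + 0 }
    /^#/ || /^$/ { next }
    !have_header { have_header = 1; print; next }
    {
      if (thin < 1) thin = 1
      # saved warmup rows = ceil(num_warmup / thin)
      skip = save ? int((warmup + thin - 1) / thin) : 0
      if (++rows > skip) print
    }' "$1"
}
```

Running `trim_warmup_csv samples1.csv > samples1-post.csv` would give a plain CSV of post-warmup draws for one chain.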
Background: I need to use cmdstan on a cluster because I have a complex model, some of whose parameters seem hard to estimate reliably even with a lot of simulated data points (12,000) and even after non-centered reparametrization. @bbbales2 suggested trying the `metric=dense_e` option, which at the moment is only available in the latest version of cmdstan.
I would also love your package's job scheduling functionality if it is more convenient than writing my own job scheduling commands, as I am new to cluster computing.
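Until the package's scheduling support is back, a plain job script may be enough. The sketch below assumes the cluster runs Slurm (the thread does not say which scheduler is used); `./my_model`, `my_data.json`, the seed and the time limit are placeholders. Each array task runs one chain with the dense metric; CmdStan's `id` argument numbers the chain, so a shared `random seed` still gives each chain a different random stream.

```bash
#!/bin/bash
#SBATCH --job-name=stan-chains
#SBATCH --array=1-4
#SBATCH --time=24:00:00
# Hypothetical Slurm array job: each array task runs one CmdStan chain.
# MODEL and DATA default to placeholder names; override them as needed.

# run_chain ID : run one chain with a dense metric, writing samplesID.csv
run_chain() {
  local id=$1
  "${MODEL:-./my_model}" sample algorithm=hmc metric=dense_e \
    id="$id" random seed=4321 \
    data file="${DATA:-my_data.json}" \
    output file="samples${id}.csv"
}

# Under Slurm, SLURM_ARRAY_TASK_ID is 1..4; outside Slurm nothing runs.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
  run_chain "$SLURM_ARRAY_TASK_ID"
fi
```

Submitting it with `sbatch run-chains.sh` (file name hypothetical) would produce `samples1.csv` to `samples4.csv`. On PBS or SGE clusters the `#SBATCH` directives and the task-ID variable differ.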
OK, that was the goal for those functions, and they did work, but I'm afraid the current `master` may not. Reading the CmdStan files was complicated and I tried a variety of approaches (a few too many; I'll have to sweep through that code again to make it work with the rest of the current code).
For the moment my suggestion is to do what I do: use `sed` to turn the CmdStan files into standard .csv files. You can put this code in a script that takes a single argument, save it as `convert-cmdstan.sh` or something like that, and use `chmod 755 convert-cmdstan.sh` to make it executable.

```bash
#!/bin/bash
# Usage: ./convert-cmdstan.sh path/to/samples1.csv
stem="$(dirname "$1")/$(basename -s .csv "$1")"
sed -n -e '/^[^#]/p' "$1" > "${stem}-samples.csv"   # CSV header row + draws
sed -n -e '/^#/p' "$1" > "${stem}-header.txt"       # '#' comment lines only
```
Once you have standard .csv files you can use `data.table::fread` to read them in. This works well for very large files, but you have to learn to use the internal `rstan` functions for calculating diagnostics and make the pairs plots some other way.
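If a single table is more convenient for `fread`, the converted per-chain files can be stacked first. This is one possible approach, not the `stannis::merge_chains` implementation: the helper `merge_chains_csv` is hypothetical, it expects files that already have their `#` lines stripped (such as the `*-samples.csv` outputs), keeps one header, and adds a leading `chain` column numbered in argument order.

```bash
#!/bin/bash
# merge_chains_csv FILE...   (hypothetical helper)
# Concatenate comment-free per-chain CSVs into one CSV on stdout, keeping a
# single header row and prefixing each draw with its chain number (1, 2, ...
# in the order the files are given). Fails if the headers differ.
merge_chains_csv() {
  awk '
    FNR == 1 {
      chain++
      if (chain == 1) { header = $0; print "chain," $0 }
      else if ($0 != header) {
        print "header mismatch in " FILENAME > "/dev/stderr"
        bad = 1
        exit
      }
      next
    }
    { print chain "," $0 }
    END { if (bad) exit 1 }' "$@"
}
```

For example, `merge_chains_csv samples*-samples.csv > all-chains.csv` followed by `data.table::fread("all-chains.csv")` in R. Shell globs sort lexically, so with ten or more chains list the files explicitly to keep the chain numbers in order.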
Alternatively, you can use `rstan::read_stan_csv` on the CmdStan output directly: it takes a vector of filenames and does the merging. The only issue with it is speed and memory use for very large outputs. I'll ping you on this issue once I have the package updated. Thanks for the interest, but I don't want to suggest something that wastes your time.
Thanks for the suggestion. I will give it a try and see if I can manage. Do ping me when you have the package updated. Thanks.
I tried to install with `devtools::install_github()` on a Linux cluster and received the following error message: