sakrejda / stannis

Code for effectively dealing with running CmdStan... from R.... because reasons...
GNU General Public License v3.0

Could not complete installation #4

Open tlyim opened 5 years ago

tlyim commented 5 years ago

I tried to install with devtools::install_github() on a Linux cluster and received the following error message:

installing to /home/sbbg070/R_library/3.5/stannis/libs
** R
** inst
** byte-compile and prepare package for lazy loading
Note: wrong number of arguments to '=='
Note: wrong number of arguments to '!='
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error: package or namespace load failed for ‘stannis’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/sbbg070/R_library/3.5/stannis/libs/stannis.so':
  /home/sbbg070/R_library/3.5/stannis/libs/stannis.so: undefined symbol: _ZN5boost10filesystem4pathdVERKS1_
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/home/sbbg070/R_library/3.5/stannis’
Error in i.p(...) :
  (converted from warning) installation of package ‘/tmp/RtmpvW4hcg/file944338e625e6/stannis_1.1.tar.gz’ had non-zero exit status

sakrejda commented 5 years ago

Can you tell me which functionality you need from the package? I got feedback that there was a need for more basic functionality than the .yaml files, so I backed that out in an updated copy (on a branch). I'm in the middle of fixing this package up right now, so it would be good to know what there is interest in.

tlyim commented 5 years ago

I ran multiple chains with cmdstan on a cluster using the approach described in this post. Now I have the output in four files samples1.csv, ..., samples4.csv and hope that your stannis package could make the output analysis (print estimation results of selected parameters and the pairs plots) easier.

I tried bin/stansummary sample*.csv but the command does not seem to have an option to report the summary of selected parameters only. I want to focus on a dozen essential parameters, but technically there are many more (e.g., one error term to sample for each period of a state-space model).

I have the impression that your package would let me import the output files back to R and do the rest as usual in R. Would the following functions work on a standalone basis without installing stannis? (I am thinking about cloning the repo and source() these functions.)

samples <- stannis::read_file_set(root='.', pattern = '.*-output.csv')
post_warmup <- stannis::trim_warmup(samples)
merged_samples <- stannis::merge_chains(post_warmup)
samples_in_arrays <- stannis::array_set(merged_samples)

Background: I need to use cmdstan on a cluster because I have a complex model, of which some parameters seem hard to estimate reliably even with a lot of simulated data points (12,000) and even after non-centered reparametrization. @bbbales2 suggested trying the metric=dense_e option, which at the moment is only available via the latest version of cmdstan.

Would also love your package's job scheduling functionality if it is more convenient than writing my own job scheduling commands as I am new to the cluster computing environment.

sakrejda commented 5 years ago

OK, that was the goal for those functions, and they did work, but I'm afraid the current master may not. Reading the CmdStan files was complicated and I tried a variety of approaches (a few too many; I'll have to sweep through that code again to make it work with the rest of the current code).

For the moment my suggestion is to do what I do:

  1. On the cluster, use sed to turn the CmdStan files into standard .csv files. Put the code below in a script that takes a single argument (the CmdStan output file), call it convert-cmdstan.sh or something like that, and run chmod 755 convert-cmdstan.sh to make it executable.
#!/bin/bash
# Non-comment lines (column names and draws) go to <name>-samples.csv
sed -n -e '/^[^#]/p' "$1" > "$(dirname "$1")/$(basename -s .csv "$1")-samples.csv"
# Comment lines (CmdStan settings, adaptation info, timing) go to <name>-header.txt
sed -n -e '/^#/p' "$1" > "$(dirname "$1")/$(basename -s .csv "$1")-header.txt"

Once you have standard .csv files you can use data.table::fread to read them in. This works well for very large files, but you have to learn to use the internal rstan functions for calculating diagnostics and make pairs plots some other way.
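
To run that conversion over all four chain files at once, something like the following might work. This is only a sketch: the function name split_cmdstan_csv is made up for illustration, and the samples1.csv, ..., samples4.csv names are taken from the question above. It applies the same two sed filters as the script.

```shell
#!/bin/bash
# Hypothetical helper (not part of stannis or CmdStan): for each file given,
# write <name>-samples.csv (data rows) and <name>-header.txt ('#' lines).
split_cmdstan_csv() {
  local infile stem
  for infile in "$@"; do
    # Strip the .csv suffix but keep the directory, e.g. out/samples1
    stem="$(dirname "$infile")/$(basename "$infile" .csv)"
    # Lines that do not start with '#': column names plus draws
    sed -n -e '/^[^#]/p' "$infile" > "${stem}-samples.csv"
    # Lines that start with '#': CmdStan configuration and timing comments
    sed -n -e '/^#/p' "$infile" > "${stem}-header.txt"
  done
}

# Example call, assuming the chain files are named as in the question:
# split_cmdstan_csv samples1.csv samples2.csv samples3.csv samples4.csv
```

Listing the files explicitly (or using a glob such as samples[0-9].csv) avoids picking up the generated -samples.csv files if the conversion is run a second time.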

  2. For smaller outputs you can use rstan::read_stan_csv on the CmdStan output directly: it takes a vector of filenames and does the merging. The only issue with it is speed and memory for very large outputs.
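
Since the goal was to look at only a dozen or so parameters, the files can also be narrowed by column name before they ever reach R, which helps with the speed and memory issue. A rough sketch with awk follows; select_columns is a hypothetical helper, and the column names in the example call (lp__, mu, sigma) are placeholders for whatever the model actually outputs. It assumes the first non-comment line holds the column names and that no field contains a comma.

```shell
#!/bin/bash
# Hypothetical helper: print only the named columns of a CmdStan-style CSV.
# Usage: select_columns "name1,name2,..." file.csv > subset.csv
select_columns() {
  local wanted="$1"
  local infile="$2"
  awk -F',' -v OFS=',' -v wanted="$wanted" '
    /^#/ || /^$/ { next }        # skip comment lines and blank lines
    !seen_header {
      # First data line is the header: map each column name to its position
      n = split(wanted, names, ",")
      for (i = 1; i <= NF; i++) pos[$i] = i
      k = 0
      for (j = 1; j <= n; j++) {
        if (names[j] in pos) idx[++k] = pos[names[j]]
        else print "warning: column not found: " names[j] > "/dev/stderr"
      }
      seen_header = 1
    }
    {
      # Emit the requested fields, in the order they were requested
      line = ""
      for (j = 1; j <= k; j++) line = line (j > 1 ? OFS : "") $(idx[j])
      print line
    }
  ' "$infile"
}

# Example call with placeholder parameter names:
# select_columns "lp__,mu,sigma" samples1.csv > samples1-subset.csv
```

Because comment lines are skipped inside awk, this works on either the raw CmdStan output or the converted -samples.csv files.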

I'll ping you on this issue once I have the package updated. Thanks for the interest but I don't want to suggest something that wastes your time.

tlyim commented 5 years ago

Thanks for the suggestion. I will give it a try and see if I can manage. Do ping me when you have the package updated. Thanks.