stan-dev / rstan

RStan, the R interface to Stan
https://mc-stan.org
1.04k stars 267 forks source link

`rstan` fails to read `cmdstanr` output .csv files #1133

Open santikka opened 4 months ago

santikka commented 4 months ago

Summary:

rstan::read_stan_csv() no longer works for for .csv files produced by cmdstanr

Description:

It seems that rstan expects the save_warmup field to be an integer value, but cmdstanr instead writes this as a logical value (e.g., false) to the conversion to integer fails in rstan:::parse_stancsv_comments() resulting in an NA value for save_warmup. This used to work with earlier versions of cmdstanr/CmdStan.

Reproducible Steps:

library("cmdstanr")
#> This is cmdstanr version 0.8.1
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: C:/Users/Santtu/.cmdstan/cmdstan-2.35.0
#> - CmdStan version: 2.35.0
file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(file)
data_list <- list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1))
samples <- mod$sample(
  data = data_list,
  refresh = 0,
  chains = 1,
  iter_sampling = 10,
  iter_warmup = 10
)
#> Running MCMC with 1 chain...
#> 
#> Chain 1 WARNING: No variance estimation is 
#> Chain 1          performed for num_warmup < 20 
#> Chain 1 finished in 0.0 seconds.
rstan::read_stan_csv(samples$output_files())
#> Warning in parse_stancsv_comments(comments): NAs introduced by coercion
#> Error in if (max(save_warmup) == 0L) {: missing value where TRUE/FALSE needed

Created on 2024-07-10 with reprex v2.1.1

RStan Version:

2.32.6

R Version:

R version 4.4.0 (2024-04-24 ucrt)

Operating System:

Windows 11 Pro 23H2

martinmodrak commented 3 months ago

I can reproduce the issue - outputs from CmdStan 2.34.1 are read fine, outputs from 2.35.0 cause problems.

Although I get a different error message (maybe because of using R 4.3.2)

Error in rstan::read_stan_csv(samples$output_files()) : 
  object 'n_kept' not found

I'll also note that the problem is made worse by the code at: https://github.com/stan-dev/rstan/blob/9b8d8fe715f1d51063621e6ca01d6265b02356fe/rstan/rstan/R/stan_csv.R#L220 where if save_warmup is not all 0 or all 1, n_kept never gets assigned.

jgabry commented 3 months ago

Thanks. I think this might be fixed by https://github.com/stan-dev/rstan/pull/1131. Or at least the save_warmup issue, not sure about the n_kept issue @martinmodrak ran into.

martinmodrak commented 3 months ago

@jgabry Yes, this doesn't resolve the problamatic logic around n_kept. Also #1131 seems likely to break the old .CSVs, though. I didn't do a full checkout, but notice:

as.integer(as.logical("0"))
# [1] NA
as.integer(as.logical("false"))
# [1] 0
jgabry commented 3 months ago

Also #1131 seems likely to break the old .CSVs, though.

Ah good point. Perhaps we should revert #1131 and figure out a more general approach? @bgoodri @andrjohns we probably need to decide whether we want to keep supporting this functionality in RStan. If there continue to be changes in the CSV files created by CmdStan I don't think RStan can keep up with releases in order to make sure read_stan_csv keeps working. Do we even need it anymore given that CmdStanR exists?

andrjohns commented 3 months ago

The approach I took for cmdstanr for compatibility with both was to convert any true/false to 0/1, so that the parsing logic for the rest stayed the same

martinmodrak commented 3 months ago

Note that using cmdstanr with brms relies on read_stan_csv. We probably should add a test with fixed CSVs from s couple CmdStan versions and it should be OK....

Dne st 31. 7. 2024 19:37 uživatel Andrew Johnson @.***> napsal:

The approach I took for cmdstanr for compatibility with both was to convert any true/false to 0/1 https://github.com/stan-dev/cmdstanr/blob/0e3a99a52bc5ad15d9e386e9993c10ad6905818b/R/csv.R#L843, so that the parsing logic for the rest stayed the same

— Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstan/issues/1133#issuecomment-2261026771, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACILKU3PJQLPI4IAEB7ONHDZPEOFFAVCNFSM6AAAAABKVJVGVGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRRGAZDMNZXGE . You are receiving this because you were mentioned.Message ID: @.***>

andrjohns commented 3 months ago

Note that using cmdstanr with brms relies on read_stan_csv.

It doesn't look like that's the case anymore (which makes sense, otherwise there would have been a ton of bug reports in brms once 2.35 was released)

jgabry commented 3 months ago

otherwise there would have been a ton of bug reports in brms once 2.35 was released

Yeah that would have been a mess!

katrinabrock commented 2 weeks ago

I was getting the if (max(save_warmup) == 0L) error and installed the latest version of rstan (from r-universe) and I'm now getting object 'n_kept' not found error.

Is there a magical version I can use where I won't get either of these errors? Or is there a way I need to modify the input data to make it work?

I got the code+data combination I'm attempting from someone else and it was working at one point. I'm not sure what package version were used then. :-/

jgabry commented 2 weeks ago

Sorry for the hassle! I'm honestly not sure. Originally rstan::read_stan_csv only existed because there was no such thing as cmdstanr. So users needed a way to import CmdStan output into R. But now that cmdstanr has been around for years, it hasn't been a priority to keep the CmdStan csv files (which change sometimes) backwards compatible with rstan::read_stan_csv nor has there been enough bandwidth among developers to keep rstan::read_stan_csv up to date with changes to CmdStan.

Is this for a model fit with cmdstanr? Here's some code from @andrjohns to produce a named list of arrays of posterior samples, similar to what rstan would do:

https://discourse.mc-stan.org/t/rstan-read-stan-csv-throwing-error-with-cmdstan-models-versions-2-35/35665/7

That will hopefully allow you to avoid rstan::read_stan_csv entirely.

jgabry commented 2 weeks ago

@bgoodri In the next RStan release read_stan_csv should probably be formally deprecated unless the plan is to keep updating it when CmdStan csv files change, but I don't think that's been the plan so far.

jgabry commented 2 weeks ago

@katrinabrock in addition to my previous comment, another option could be to run an older version of cmdstan. I think it might just be as of 2.35.0 that this is an issue, so you could try installing e.g. 2.34.0. cmdstanr::install_cmdstan() will let you specify version = "2.34.0".

katrinabrock commented 2 weeks ago

@jgabry

Thanks for the quick response! switching cmdstanr to version 2.34.0 did not appear to resolve the issue. I'm a bit puzzled because the code I'm working with does not appear to be using cmdstanr. It appears to be using rstan only and passing an r object in as data. I will dig a bit more an post back if I discover anything useful.

jgabry commented 2 weeks ago

Yeah changing the cmdstanr version would only work if the csv files were being produced by cmdstanr. It’s weird if this is only using rstan because read_stan_csv() shouldn’t be necessary to use at all in that case. So yeah, quite strange!

katrinabrock commented 2 weeks ago

ok! I did some more digging and discovered a couple mistakes I made in my initial debugging: 1) I initially thought the script I was working on was calling rstan::stan when it was in fact calling rethinking::stan (https://github.com/rmcelreath/rethinking/blob/master/R/cmdstan_support.r#L104). 2) I switched my cmdstan versions, but forgot that cmdstan version mismatch will not automatically trigger recompilation of the model. Therefore, I was still running my code with models produced with 2.35.0 so of course this did not do anything to resolve the issue.

With this in mind, here is my advice to others (@santikka ) to workaround this issue:

Workarounds

Workarounds specific to rethinking package

For those hitting this issue from using the rethinking package (traceback includes this line https://github.com/rmcelreath/rethinking/blob/master/R/cmdstan_support.r#L97). This is happening because rethinking is trying to convert the cmdstanr output into a rstan output which is no longer supporter (per @jgabry above).

To resolve this, either

1) add rstan_out=FALSE to your rethinking::stan call which will result in the call returning a cmdstanr output instead of attempting to convert

OR

switch to set_ulam_cmdstan(FALSE) which will take cmdstanr out of the picture, use pure rstan and thus the result format will be unchanged

OR

use the general solution below

General workaround

Use cmdstan 3.34.0 or earlier.

Steps: Step 1) (one time per machine) Install earlier version of cmdstanr. cmdstanr::install_cmdstan(version = "2.34.0").

Step 2) (every time run your legacy code) Activate the older version of cmdstan like cmdstanr::set_cmdstan_path(file.path(dirname(cmdstanr::cmdstan_path()), 'cmdstan-2.34.0')). (Should work cross platform, but only tested on Linux.). You can check that this worked by running cmdstanr::cmdstan_version(). Result should be "2.34.0". It would make sense to put this cmdstanr::set_cmdstan_path(file... at the top of any script where you're having this issue.

Step 3) (one time per model) Recompile any models that you compiled with an incompatible version of cmdstanr. There are a few ways you can do this:

Note: With Option B or C, once you've done all this recompiling once for each model with version 2.34.0, you probably don't need to recompile every single time you run your code. To go back to not recompiling by default, remove force_recompile = TRUE from your arguments or run options("cmdstanr_force_recompile" = FALSE).

Real Solution

Re-write your code to use only cmdstanr since it's better supported than rstan and much better supported than a hybrid.

jgabry commented 2 weeks ago

Thanks so much for the detailed report! Hopefully this is useful to anyone else running into the same problem.