read_stan_csv() can't handle output from cmdstan optimize

flaxter commented 9 years ago

I have a model that I run from CmdStan:

./mymodel optimize data file=tmp.r output file=output.csv

And then in R:

opt = read_stan_csv("output.csv")

Gives:

Error in all_int_eq(warmup) : not all are integers
In addition: Warning message:
In FUN(X[[1L]], ...) : line with "Elapsed Time" not found

maverickg commented 9 years ago

No, it reads only the output of sampling, NOT of optimizing.
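As a stopgap, the optimize output can usually be read with base R, since CmdStan CSV files consist of '#' comment lines, a header row, and rows of values. The sketch below writes a small mock file to show the idea; the parameter names (mu, sigma) and the numbers are made up for illustration, and a real output.csv will have more comment lines.

```r
# Mock of a CmdStan optimize CSV: '#' comment lines, a header row, one row of values.
# The parameter names and numbers here are invented for illustration only.
csv_lines <- c(
  "# method = optimize",
  "#   optimize",
  "#     algorithm = lbfgs (Default)",
  "lp__,mu,sigma",
  "-12.5,0.3,1.7"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# read.csv() skips lines that start with the comment character.
est <- read.csv(path, comment.char = "#")

# Turn the single row into a named numeric vector of point estimates.
point_est <- unlist(est[1, ])
print(point_est)
```

With a real run, replacing `path` with "output.csv" should give the same kind of named vector, although this has only been checked here against the mock file.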

flaxter commented 9 years ago

OK, thanks for the clarification. I guess this is a more general feature request, then: a common format for samples and point estimates. For example, I've also noticed in RStan that the return value from optimizing() has a totally different format from what you get from extract(), but sometimes I'd like to use the same code to make plots of each. And as we're anticipating things like VB and maximum marginal likelihood, I think this will become even more useful!

syclik commented 9 years ago

@flaxter, creating a general format for different methods isn't as easy as you might think. We've given it some thought and couldn't come up with something general enough without either over-engineering it or making it over-constraining. If you have some thoughts and want to be part of that discussion, let's move it to the stan-dev list.

(you'll need permissions to post to stan-dev, so just email me and I'll give them to you if you want to have that discussion)

flaxter commented 9 years ago

OK, sounds good. Marking as closed for now!

syclik commented 9 years ago

Can you reopen?

We should have a sensible message instead of the current behavior.
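One way such a message could be produced is to look at the '#' header comments before parsing. The following is a minimal sketch, not rstan code: check_is_sample_csv() is a hypothetical helper, and it assumes the CSV header contains a line of the form "# method = <name>".

```r
# Hypothetical helper, not part of rstan: inspect the '#' header comments of a
# CmdStan CSV and stop with a clear message if the run was not a sampling run.
check_is_sample_csv <- function(path) {
  lines <- readLines(path)
  comments <- lines[startsWith(lines, "#")]
  # Assumption: the header has a line such as "# method = optimize".
  m <- regmatches(comments,
                  regexpr("method[[:space:]]*=[[:space:]]*[a-z_]+", comments))
  if (length(m) == 0) stop("no 'method = ...' line found in ", path)
  method <- sub("method[[:space:]]*=[[:space:]]*", "", m[1])
  if (method != "sample") {
    stop("'", path, "' was written by method '", method,
         "'; only output from sampling is supported")
  }
  invisible(TRUE)
}

# Usage on a mock optimize file (contents invented for illustration).
path <- tempfile(fileext = ".csv")
writeLines(c("# method = optimize", "lp__,mu", "-3.2,0.1"), path)
msg <- tryCatch(check_is_sample_csv(path),
                error = function(e) conditionMessage(e))
print(msg)
```

Failing early like this would replace the "not all are integers" error with one that names the actual cause.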

bob-carpenter commented 9 years ago

sampling() gives you a fit object with

sample of draws from the posterior, organized into multiple chains and broken into warmup and sampling (storing warmup is optional)
timing info (not sure if that's there in RStan)
link to compiled model
adaptation parameters (mass matrix, step size)

optimizing() gives you a list, with

final optimized value
Hessian at optimized value

The one thing that's common (other than perhaps storing a link to the compiled model) is that both are organized collections of parameter values. In sampling you get K chains of N warmup and M sampling draws. In optimization you get one sequence of intermediate values and a single final estimate.

Any suggestions on how to unify all this in RStan 3?

At the very least, I'd suggest we have a way to get a common data structure for parameters out of both, though typically you want the mean or median or some tail stat for the ones from sampling(), so maybe ways to get that would be helpful (they may already exist, and even if they don't, they're pretty simple to create).

Bob
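A rough sketch of what such a common accessor might look like follows. point_estimates() is a hypothetical helper, not an rstan function. It assumes optimizing()-style output keeps its estimates in a `par` element and that extract()-style output is a named list of draws per parameter; the two inputs below are mock stand-ins, so no Stan run is involved.

```r
# Hypothetical helper, not an rstan function: reduce either kind of result to a
# named list of point estimates.
point_estimates <- function(x, summary = mean) {
  if (!is.null(x[["par"]])) {
    # Looks like optimizing() output: estimates are stored under "par".
    return(as.list(x[["par"]]))
  }
  # Otherwise treat x as extract()-style output: one set of draws per parameter.
  lapply(x, summary)
}

# Mock stand-ins with invented values.
set.seed(1)
opt_like <- list(par = c(mu = 0.3, sigma = 1.7), value = -12.5)
draws_like <- list(mu = rnorm(1000, 0.3, 0.1),
                   sigma = abs(rnorm(1000, 1.7, 0.2)))

str(point_estimates(opt_like))
str(point_estimates(draws_like, summary = median))
```

This only handles scalar parameters, and a model with a parameter literally named `par` would be misclassified, which hints at why a fully general format is harder than it looks.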

On Dec 12, 2014, at 12:11 PM, flaxter reopened #123.

stan-dev / rstan
