Data type inconsistencies

cscherrer commented 10 years ago

I'm finding it awkward to maintain the distinction between parameters in the constrained space and those in the unconstrained space. It had seemed to me that the standard was for constrained parameters to be expressed as a list, and unconstrained parameters to be expressed as a vector. But optimizing seems to return a vector in the constrained space. This forces a list version to be recovered using something like relist instead of constrain_pars, as one might expect.

Some more details:

https://github.com/stan-dev/stan/issues/406

bob-carpenter commented 10 years ago

On 1/21/14, 7:26 PM, Chad Scherrer wrote:

I'm finding it awkward to maintain the distinction between parameters in the constrained space and those in the unconstrained space. It had seemed to me that the standard was for constrained parameters to be expressed as a list, and unconstrained parameters to be expressed as a vector.

I've never understood the ins and outs of R data structures, but I'm pretty sure it was never the intent of RStan to distinguish constrained/unconstrained parameters in terms of R data types. In my mind, that would be confusing R data types and constrained/unconstrained scales.

But optimizing seems to return a vector in the constrained space.

That's always been the intent, as in the other issue we discussed a while ago.

Everything user facing in the basic calls is supposed to be in the constrained space with one exception: random inits, which have to be in the unconstrained space to preserve the constraints. So init=0 means initialize to 0 in the unconstrained space, whereas providing a set of initial values is in the constrained space.
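To make the init distinction concrete, here is a minimal base-R sketch. It assumes a model with a single parameter declared as real<lower=0> sigma, which Stan maps to the unconstrained scale with a log transform. The helper names to_unconstrained and to_constrained are hypothetical and are not part of RStan.

```r
# Assumed model fragment: real<lower=0> sigma;
# A lower bound of zero corresponds to a log transform.
to_unconstrained <- function(sigma) log(sigma)  # hypothetical helper
to_constrained <- function(u) exp(u)            # hypothetical helper

# init = 0 means "0 on the unconstrained scale", i.e. sigma = exp(0).
sigma_at_zero <- to_constrained(0)
print(sigma_at_zero)

# A user-supplied initial value is given on the constrained scale.
user_init <- list(sigma = 2.5)
u <- to_unconstrained(user_init$sigma)
print(u)
```

So the same number can mean different things depending on how the initial value is supplied, which is why the scale needs to be stated explicitly.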

This forces a list version to be recovered using something like relist instead of constrain_pars, as one might expect.

Our intent was not to have users have to fiddle with constrain/unconstrain. Those methods are just there in case you want to use the compiled models for your own optimization or sampling methods and need to deal with I/O.

I'll let Jiqiang or someone else who knows R and what relist does comment on that.

Thanks for the reports, by the way --- I don't mean to be dismissive here, just to explain what we were trying to do. I think RStan and CmdStan both need a lot better doc and more examples to explain what's going on in these cases. As is, our doc's very minimal and "unix-like" as the complaint usually goes.

Bob

cscherrer commented 10 years ago

It's actually really nice in principle to have both available. If some great new optimization routine comes out with an R interface, there's a potential to plug it in pretty directly. The only problem with this is that there's no type safety help from R, so the distinction needs to be made really clear.

cscherrer commented 10 years ago

I think this might be easily fixable. The core of the problem:

constrain_pars takes a vector in the unconstrained space, and returns a list in the constrained space.
unconstrain_pars does the reverse.
optimizing returns a list with a $par slot. But in this slot is a vector... in the constrained space!?

The natural solution, I think, is to change optimizing to return something similar to the output of constrain_pars.

This is completely incompatible with existing code if people are already taking the relist approach. Fortunately, the change from vector to list ought to cause some noisy error messages, instead of silently returning incorrect values. For easy backward support, it could be helpful to have an as_vector=FALSE option.
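The proposed switch could look roughly like the sketch below. It uses only base R. The function format_par, the skeleton, and the parameter values are hypothetical illustrations and not RStan's implementation. Defaulting as_vector to TRUE is an assumption made here so that existing code keeps receiving a vector.

```r
# Hypothetical helper, not part of rstan: return the optimum either as the
# flat vector or reshaped into a list matching a skeleton.
format_par <- function(par_vec, skeleton, as_vector = TRUE) {
  if (as_vector) {
    return(par_vec)
  }
  # Drop the flat names ("beta[1]", ...) and let the skeleton supply structure.
  relist(unname(par_vec), skeleton = skeleton)
}

# Made-up values standing in for the $par slot of an optimizing() result.
par_vec <- c(mu = 1.5, "beta[1]" = -0.3, "beta[2]" = 2.0)
skeleton <- list(mu = 0, beta = c(0, 0))

as_list <- format_par(par_vec, skeleton, as_vector = FALSE)
str(as_list)
```

Code that indexes the result as a numeric vector would fail loudly on the list form, which is the noisy failure mode described above.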

maverickg commented 10 years ago

optimizing returns a list with a $par slot. But in this slot is a vector... in the constrained space!?

It's on the constrained space, that is, the space on which the parameters are defined in the model. As Bob mentioned, users generally do not need to worry about transformation. Returning a vector mimics the optim function in R.

Those functions for transforming between the constrained space and the unconstrained space are for using the log posterior density function.

This is completely incompatible with existing code if people are already taking the relist approach. Fortunately, the change from vector to list ought to cause some noisy error messages, instead of silently returning incorrect values. For easy backward support, it could be helpful to have an as_vector=FALSE option.

I did not know there was a function called relist. I think adding an as_vector argument (or something like that) makes sense. I will add that when I have some time.

cscherrer commented 10 years ago

In case it's helpful, here's how I work around the issue for now:

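library(rstan)
# iter=0 is used only to obtain a stanfit object for the model.
fit <- stan("model.stan", data = my.data, iter = 0)
mvec <- optimizing(fit@stanmodel, data = my.data)
# Map the unconstrained zero vector to a constrained list to use as a skeleton.
zero <- constrain_pars(fit, rep(0, get_num_upars(fit)))
# Reshape the flat constrained vector into the skeleton's list structure.
m <- relist(mvec$par, skeleton = zero)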
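For readers unfamiliar with relist, the following base-R sketch shows what the last step of the workaround does, without needing a compiled model. The skeleton and the flat vector are made-up stand-ins for the outputs of constrain_pars and optimizing. The column-major layout of the matrix entries is what base R's matrix() does, and it is an assumption here that the flat vector follows the same order.

```r
# Stand-in for the list returned by constrain_pars(): only its shape matters.
skeleton <- list(sigma = 0, Omega = matrix(0, nrow = 2, ncol = 2))

# Stand-in for the flat $par vector: sigma first, then Omega's four entries.
flat <- c(0.7, 1, 2, 3, 4)

m <- relist(flat, skeleton = skeleton)
print(m$sigma)
print(m$Omega)  # filled column by column

# Flattening the list again recovers the original values in order.
round_trip <- unlist(m, use.names = FALSE)
print(round_trip)
```

The values of the skeleton are discarded. Only its names and dimensions are used, which is why any point in the constrained space works as the skeleton in the workaround.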

maverickg commented 10 years ago

Thanks. I guess I can use relist to simplify some code in rstan.

maverickg commented 10 years ago

Commit 81edfcde93674f16b3b96ffda9e187d31d57424c added an option.

stan-dev / rstan

Data type inconsistencies #36