Closed michael-chong closed 4 years ago
Thanks for reporting this. I have mixed feelings about this. It's weird for CmdStanR and RStan to behave differently here but if there's an object in the list that is invalid (e.g., an R character vector could never be valid input to Stan) then what's the use case for including it in the list passed to Stan? I can definitely see a use case for including valid objects in the list that aren't used for a particular model (they could be used for other models), but I'm having trouble imagining why I would ever need to include an object in the list that can never be used for any model. So can you share a bit more about your use case for this? I'm happy to be persuaded!
For me it can be convenient if I want to fit the model on different data sets, and I want to embed some meta-information about the data. There are two specific cases I have in mind:
It's usually a bit redundant, since this information exists elsewhere in the workflow, and if not I could always tag this information onto an object I save later, but I'd like it if the data lists could be more self-contained and portable.
Thanks, that makes sense. Keeping the data lists more self-contained and portable seems like a good use case.
Do we strip off any of the attributes of the data object supplied by the user? I've been using attributes as a metadata scheme in my projects elsewhere, so maybe we can recommend that here too.
Oh, one idea is that you could use the attributes of the fit object to store your metadata about the fit
fit = mod$sample(...)
attr(fit,'metadata') = my_metadata
That would seem to be more straightforward in that it keeps data and metadata as clearly separate things, no?
Do we strip off any of the attributes of the data object supplied by the user? I've been using attributes as a metadata scheme in my projects elsewhere, so maybe we can recommend that here too.
I think this could work even if we strip off the attributes because:
data <- list(...) # assume the list or individual elements have attributes
fit <- model$sample(data) # suppose we strip attributes internally
data # still has attributes
The only reason the attributes would get lost is if we removed them, then saved the data object inside the returned CmdStanMCMC object, and then it was that data object that was subsequently used. But we don't actually save the data in the object, so the original data list is what the user still has.
Does that make sense? Or did I misunderstand your idea?
I was actually thinking there was a way to retrieve the data from the fitted object, but now I see that's not the case, so setting an attribute on the data object supplied to sample or the resulting fit would both work.
Yeah, with rstanarm and brms you can get the data from the fitted model object, but we don't do that for rstan and cmdstanr. Occasionally I wish we did, but usually I think it's better to avoid storing another copy of the data.
Thanks @mike-lawrence, I hadn't thought of setting attributes on the data list; I think that's a good idea and alternative. Setting attributes on the fitted object might not achieve the same goal if, for instance, the data needs to be used somewhere else.
I'll certainly make use of attributes on the data list, though, and I also like that it distinguishes metadata.
You can even add the data as an attribute to the fit object if you want to be succinct!
Coming back to this to take stock of where we're at. I think @michael-chong's use case for passing invalid objects in is a good one, but if @mike-lawrence's suggestion of using attributes instead is sufficient then maybe we don't need to change anything here? What does everyone think? Thanks Michael and Mike for bringing this up and suggesting an alternative, respectively.
I think using attributes is sufficient for my needs. Thanks @jgabry and @mike-lawrence for the help!
Thanks!
Describe the bug The
sample()
method gives an error when objects in thedata
list are non-numeric, even if the object is not used in the model.To Reproduce
The above code gives:
Error: Variable 'extra_stuff' is of invalid type.
Replacing
extra_stuff
with1
for example allows the model to run with no issues.Expected behavior As in
rstan
(I believe), only objects used in the model are required to be a certain type.Operating system macOS 10.15.7
CmdStanR version number cmdstanr version 0.1.3 cmdstan version 2.25.0