stan-dev / posterior

The posterior R package
https://mc-stan.org/posterior/

New draws_tensor format? #349

Open paul-buerkner opened 7 months ago

paul-buerkner commented 7 months ago

The discussion about a data.table format reminded me of something I have wanted to bring up for a while. I wonder if we could have a non-rvar format that can handle high-dimensional arrays of posterior draws but otherwise behaves like a draws_matrix, i.e., stores chains only as an attribute. If I remember correctly, this is roughly how we store draws in rvars behind the scenes, without it being its own format.
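For illustration (not part of the original thread), here is a minimal base R sketch of what such an object might look like: a plain array whose first dimension indexes draws, with the number of chains kept only as an attribute. The variable names and the choice of the first dimension for draws are assumptions made for this example.

```r
# Hypothetical layout: draws x 3 x 2 array, chains stored only as an attribute.
set.seed(1)
n_chains <- 4
n_iter   <- 250
n_draws  <- n_chains * n_iter

x <- array(
  rnorm(n_draws * 3 * 2),
  dim = c(n_draws, 3, 2),
  dimnames = list(draw = NULL, row = c("a", "b", "c"), col = c("u", "v"))
)
attr(x, "nchains") <- n_chains

# Posterior mean of each cell: average over the draws dimension only.
post_mean <- apply(x, c(2, 3), mean)
dim(post_mean)
```

The summary keeps the 3 x 2 shape of the underlying quantity, which is the part a draws_matrix cannot express directly.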

The reason for bringing this up is that some brms post-processing functions return multi-dimensional output (plus the draws dimension), and so far we don't really represent this in posterior except in rvar. Using rvar as the default output in brms, however, would be too much of a backwards-compatibility-breaking change (although it will become a non-default option in brms 3.0) and perhaps a bit too adventurous as well. That is why I am looking for a "low-level" alternative that resembles what brms already does, namely just outputting high-dimensional arrays.

What are your thoughts on this? Specifically, what does @mjskay think?

mjskay commented 6 months ago

Cool, yeah. If I understand the intention, it wouldn't use rvar's specialized indexing with [ and [[, but could probably implement many of the other functions of rvars?

This could be implemented as a codification of what draws_of(<rvar>) returns. We would have to modify it a bit, because rvar keeps the nchains and weights attributes on the surrounding object rather than on the array it contains, but in principle I think that could be adjusted.
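As a quick illustration of that point (a sketch, assuming the posterior package is installed), the array returned by draws_of() carries the draws in its first dimension, while the chain information is queried from the rvar itself:

```r
library(posterior)

# An rvar with shape 2 x 3 built from a draws x 2 x 3 array, split into 4 chains.
x <- rvar(array(rnorm(400 * 2 * 3), dim = c(400, 2, 3)), nchains = 4)

d <- draws_of(x)   # the underlying array of draws
dim(d)             # first dimension indexes draws
nchains(x)         # chain information comes from the rvar object
names(attributes(d))  # inspect which attributes the bare array carries
```

The last line is left without an expected result on purpose; it is there to check which attributes travel with the bare array in the installed version.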

I'm not sure what to call such a type. I think it wouldn't be a "draws" type, because the other "draws" types are collections of variables rather than a single multidimensional variable, so you wouldn't be able to convert any "draws" object to one of this new type (just as you can't convert other "draws" types to just a single rvar). Maybe this is a "raw_rvar" or an "rvar_array" or something? Or "variable_array"? Hmm.

This also raises another thought I've been mulling over: I wonder if it would be helpful to have some variations on rvar with different backing formats. E.g. one solution for #234 is to have a variant of rvar that is a list-column format. This would be less efficient for some operations (like matrix multiplication), but would allow storage in data.tables and may be more efficient for some other operations too (like storage in tibbles).
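To make the list-column idea concrete, here is a rough base R sketch (an illustration only, not the proposed design): each cell of the column holds the vector of draws for one scalar variable. The column names are made up for the example.

```r
# Hypothetical list-column layout: one vector of draws per row.
set.seed(1)
n_draws <- 100
df <- data.frame(group = c("a", "b", "c"))
df$theta <- lapply(1:3, function(i) rnorm(n_draws, mean = i))

# Per-row summaries are simple maps over the list column.
df$theta_mean <- vapply(df$theta, mean, numeric(1))

# Array-style operations first need the draws gathered into a matrix,
# which is the kind of overhead mentioned above.
theta_mat <- do.call(cbind, df$theta)   # draws x rows
dim(theta_mat)
```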

This perhaps suggests two families of formats within {posterior}: formats that represent collections of variables (the "draws" formats), and formats that represent single multidimensional variables (currently just "rvar", but maybe this family needs a name if we expand it).

paul-buerkner commented 6 months ago

You are right. It is not a draws_ object but conceptually like rvar, and indeed it is what we already have in draws_of(<rvar>). If you add this new format, I agree it should probably be the same kind of object as is/will be stored in draws_of(<rvar>).

Do you have ideas for next steps we should take in this direction?

mjskay commented 5 months ago

Hmm, I would probably wait on this until the weighted rvars stuff is merged, since that will add another attribute to rvar that will have to be dealt with.

Then, I would try moving any rvar attributes stored outside of the internal array (e.g. nchains and weights) to be attributes of the array itself, probably creating a simple wrapper type (call it var_array for now) around the array in the process. The process of doing that should reveal any hairy corner cases we might expect to encounter.
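A minimal base R sketch of such a wrapper might look like the following. Everything here is hypothetical: the constructor name new_var_array, the validation rules, and the use of the first dimension for draws are assumptions for illustration, not the package's design.

```r
# Hypothetical wrapper: an array of draws carrying its own nchains/weights.
new_var_array <- function(draws, nchains = 1L, weights = NULL) {
  stopifnot(is.array(draws))
  n_draws <- dim(draws)[1]
  stopifnot(n_draws %% nchains == 0)
  if (!is.null(weights)) stopifnot(length(weights) == n_draws)

  attr(draws, "nchains") <- as.integer(nchains)
  attr(draws, "weights") <- weights          # NULL means unweighted
  class(draws) <- c("var_array", "array")
  draws
}

set.seed(1)
va <- new_var_array(array(rnorm(40 * 2 * 3), dim = c(40, 2, 3)), nchains = 4)

# One corner case: base subsetting keeps dim/dimnames but drops other attributes.
sub <- va[1:10, , , drop = FALSE]
attr(va, "nchains")
is.null(attr(sub, "nchains"))
```

Because default subsetting drops the custom attributes, a real implementation would presumably need its own `[` method to carry nchains and weights along.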

The var_array type will probably need to support factors as well, and it might need some subtypes to do that properly.

If all of that goes well, I'd try to figure out what rvar operations can be moved to var_array and turned into simple wrappers at the rvar level.
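The wrapper pattern described here could be sketched roughly as follows, using a toy stand-in for rvar rather than the real class. The names var_array_mean and toy_rvar are invented for the example.

```r
# Array-level operation: mean over the draws (first) dimension.
var_array_mean <- function(draws) colMeans(draws, dims = 1)

# Toy stand-in for an rvar-like type that holds the array internally.
new_toy_rvar <- function(draws, nchains = 1L) {
  structure(list(draws = draws, nchains = nchains), class = "toy_rvar")
}

# The higher-level method is just a thin wrapper around the array-level one.
mean.toy_rvar <- function(x, ...) var_array_mean(x$draws)

set.seed(1)
x <- new_toy_rvar(array(rnorm(200 * 3 * 2), dim = c(200, 3, 2)), nchains = 4L)
m <- mean(x)
dim(m)
```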

Somewhere in all of this we might also want to come up with a parent type for rvar and var_array, similar to how "draws" works.

paul-buerkner commented 5 months ago

I fully agree with your thoughts. Thank you for looking into this!

(Sent by email in reply to Matthew Kay's comment above, dated Sat, 6 Apr 2024, 23:49.)