pymc-devs / pymc-experimental

https://pymc-experimental.readthedocs.io

Statespace: Don't automatically save statespace matrices as `Deterministic` variables #302

Closed jessegrabowski closed 4 months ago

jessegrabowski commented 5 months ago

I originally did this to facilitate out-of-sample sampling tasks, so I could do e.g. `pm.Flat('T', T)`. The result was that all matrices were saved to the idata, and the `pm.model_to_graphviz` outputs became pretty horrible, like this:

[image: `pm.model_to_graphviz` output before the refactor]

This was also extremely wasteful of memory. Many of the matrices are not random at all, yet a copy of each was being saved for every (chain, draw) pair.
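To put rough numbers on that cost, here is a back-of-the-envelope sketch. The sampler settings and matrix size are made-up illustrative values, not taken from any model in this PR.

```python
# Illustrative values only: 4 chains, 1000 draws, one 10x10 float64 matrix.
chains, draws = 4, 1000
rows, cols = 10, 10
bytes_per_float64 = 8

# Size of a single copy of the matrix.
single_copy_bytes = rows * cols * bytes_per_float64

# A Deterministic is stored once per (chain, draw) pair.
per_draw_bytes = chains * draws * single_copy_bytes

print(f"one copy:        {single_copy_bytes:,} bytes")
print(f"per-draw copies: {per_draw_bytes:,} bytes")
print(f"overhead factor: {per_draw_bytes // single_copy_bytes:,}x")
```

The overhead factor is just chains times draws, so it applies to every static matrix in the model regardless of its size.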

After this refactor, the matrices are dynamically rebuilt as needed from the parameter samples. The new graphs look like this:

[image: `pm.model_to_graphviz` output after the refactor]
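The idea of rebuilding a matrix on demand from saved parameter draws can be sketched in plain Python as below. This is not the PyMCStateSpace implementation: `build_transition_matrix`, `iter_transition_matrices`, and the AR(2) companion form are hypothetical stand-ins chosen for illustration, and the draws are hard-coded rather than sampled.

```python
from typing import Iterator

Matrix = list[list[float]]


def build_transition_matrix(rho_1: float, rho_2: float) -> Matrix:
    """Hypothetical builder: AR(2) companion-form transition matrix."""
    return [
        [rho_1, rho_2],
        [1.0, 0.0],
    ]


def iter_transition_matrices(draws: list[tuple[float, float]]) -> Iterator[Matrix]:
    """Rebuild the matrix lazily, one parameter draw at a time."""
    for rho_1, rho_2 in draws:
        yield build_transition_matrix(rho_1, rho_2)


# Stand-in for saved parameter draws; only these need to be stored.
posterior_draws = [(0.5, 0.2), (0.6, 0.1), (0.4, 0.3)]

# Matrices are materialized only when someone asks for them.
matrices = list(iter_transition_matrices(posterior_draws))
print(matrices[0])
```

Only the two parameters per draw are kept; the constant second row of the matrix never needs to be stored at all.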

There are also some en-passant changes to how exogenous data are handled that break the Structural example notebook. I will open a new PR after this one to address that, because I think I finally have it set up to handle forecasting with exogenous data. Basically, I was previously treating exogenous data like a type of "parameter". It's been upgraded to a first-class object, and custom models that subclass `PyMCStateSpace` and use exogenous data will now need to implement `data_names` and `data_info` properties.
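For custom models, the new requirement might look roughly like the sketch below. To keep it self-contained, `StateSpaceBase` is a hypothetical stand-in for `PyMCStateSpace`; the name `exog_data`, the dict layout returned by `data_info`, and the shape and dims values are all assumptions for illustration, so check the merged code for the actual contract.

```python
from typing import Any


class StateSpaceBase:
    """Hypothetical stand-in for PyMCStateSpace (not the real class)."""

    @property
    def data_names(self) -> list[str]:
        raise NotImplementedError("models using exogenous data must define data_names")

    @property
    def data_info(self) -> dict[str, dict[str, Any]]:
        raise NotImplementedError("models using exogenous data must define data_info")


class RegressionModel(StateSpaceBase):
    """Toy custom model with one block of exogenous data."""

    def __init__(self, k_exog: int) -> None:
        self.k_exog = k_exog

    @property
    def data_names(self) -> list[str]:
        # Names of the exogenous data variables the model expects.
        return ["exog_data"]

    @property
    def data_info(self) -> dict[str, dict[str, Any]]:
        # Assumed layout: one metadata dict per data name.
        # None marks the time dimension, whose length is not known in advance.
        return {
            "exog_data": {"shape": (None, self.k_exog), "dims": ("time", "exog_state")},
        }


model = RegressionModel(k_exog=3)
print(model.data_names)
print(model.data_info["exog_data"]["shape"])
```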

ricardoV94 commented 5 months ago

For the memory question, matrices that don't change could be defined as ConstantData or MutableData.

jessegrabowski commented 5 months ago

They're stored as TensorVariables in the statespace representation. This new way of doing this will directly use those once instead of copying them over and over into the idata.

ricardoV94 commented 5 months ago

> They're stored as TensorVariables in the statespace representation. This new way of doing this will directly use those once instead of copying them over and over into the idata.

ConstantData and MutableData are also stored only once in the idata, as opposed to Deterministics, which are stored per draw. Seems like that's what you wanted?

jessegrabowski commented 5 months ago

Something like that, but I don't want to have a bunch of logic to decide if a matrix is static or contains parameters. If the user really wants to inspect matrices, he can ask for them manually and save them however he wants. In general, I think the important outputs for most users are going to be the parameters and the states. The rest can be more hidden away.

jessegrabowski commented 4 months ago

What's up with these jax test failures on the Ubuntu CI?

ricardoV94 commented 4 months ago

> What's up with these jax test failures on the Ubuntu CI?

https://github.com/pymc-devs/pymc-experimental/issues/305 ?