An argument for collapsing is that these are treated as vector distributions by other distribution factories like pm.Mixture, which would expect the logp to be collapsed along the support dimension (the time dimension in this case).
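For illustration, a minimal sketch of that expectation (assuming a recent pymc version; the exact `init_dist`/`steps` kwarg names have changed slightly between releases):

```python
import numpy as np
import pymc as pm

# The collapsed logp of a single random walk of length 10 is a scalar:
# the 10 per-time-step terms have already been summed over the time axis,
# which is the shape pm.Mixture expects when combining component logps.
grw = pm.GaussianRandomWalk.dist(
    init_dist=pm.Normal.dist(0, 1), sigma=1.0, steps=9
)
value = np.random.randn(10)
print(pm.logp(grw, value).eval().shape)  # ()
```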
Would it be possible to allow both options? One used by mixtures and the other used in https://github.com/pymc-devs/pymc/blob/main/pymc/backends/arviz.py#L241?
It isn't available in ArviZ yet, but without the pointwise log likelihood, leave-future-out cross-validation can't be done/accelerated with PSIS (and leave-one-out won't be doing what most users expect from it).
We would need to think about a special API for that
What would leave-future-out need in order to work?
From a consistency point of view, I think the consistent thing to do is to collapse the logp on the time axis. The same basic model comparison problem will pop up with any other multivariate vs. scalar distribution. Imagine that in one model you use a Normal with a size argument, and in another you use an MvNormal. The latter model will have a logp with one dimension less than the former, and model comparison would be a mess.
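For concreteness, a minimal sketch of that mismatch:

```python
import numpy as np
import pymc as pm

data = np.random.randn(5)

# Five independent scalar normals vs one 5-dimensional multivariate normal
normal = pm.Normal.dist(mu=0, sigma=1, size=5)
mvnormal = pm.MvNormal.dist(mu=np.zeros(5), cov=np.eye(5))

print(pm.logp(normal, data).eval().shape)    # (5,) -- one logp per element
print(pm.logp(mvnormal, data).eval().shape)  # ()   -- a single joint logp
```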
From a practical point of view, maybe time series model comparisons are worth the effort of bending the consistency of logps. I don't remember at the moment whether pymc recomputes the pointwise log likelihoods after sampling is finished or does so during sampling. If it were the former, maybe we could make aeppl accept an argument to reduce the logp over a given axis, effectively treating it as a support dimension.
I'll try to be a bit more clear.
The first clarification is that I am asking for pointwise log likelihood data in order to be able to apply PSIS-LFO/LOO and WAIC. These approximations are the ones that require having the pointwise log likelihood data; brute-force cross validation (be it leave-one-out or leave-future-out) does not. I think that brute-force cross validation is rarely useful in practice, and we should make it easy for pymc users to use these much less computationally intensive approximations.
The second clarification is that there are many cases in which the data can be partitioned in multiple ways when doing cross validation (see for example this notebook based on the rugby example). However, I think the issues between a multivariate normal and a time series have nothing to do with one another.
The normal vs mvnormal case you mention is conceptually the same as what I have labeled in that notebook as "Predicting the goals scored per match and per team" (normal) vs "predicting the outcome of a match" (mvnormal). When you use the normal, the definition of "observation" is ambiguous: pymc will return the "most pointwise" option, and you'll need to sum over one dimension to get the right (right for this specific case, not always) pointwise log likelihood. When you use the mvnormal there is no ambiguity about the pointwise log likelihood anymore, as you can't compute the log likelihood of a single component of the mvnormal alone.
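As a rough sketch of that summing step (the variable and dimension names here are made up for illustration, not the ones from the notebook):

```python
import numpy as np
import arviz as az

# Fake pointwise log likelihood with both a match and a team dimension:
# 4 chains, 500 draws, 60 matches, 2 teams.
log_lik = np.random.randn(4, 500, 60, 2)
idata = az.from_dict(
    log_likelihood={"goals": log_lik},
    dims={"goals": ["match", "team"]},
)

# Summing over the team dimension turns the "most pointwise" values into the
# per-match pointwise log likelihood, i.e. the "outcome of a match" partition.
per_match = idata.log_likelihood["goals"].sum(dim="team")
print(per_match.dims)  # ('chain', 'draw', 'match')
```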
In the majority of the time series examples the prediction task is forecasting the future. In the mvnormal case we don't generally want to make predictions on a new component of the mvnormal but on new multivariate observations (multivariate because they'll be multidimensional even without any batch dimension). Ignoring batch dimensions, and assuming we have 50 time steps and want to predict 4 time steps into the future, the brute-force cross validation would look like one of these two:
(the minimum window of 40 is one parameter of LFO that needs to be fixed/decided by the user, like the number of steps into the future we want to use to evaluate the forecasting)
Both options can be approximated with PSIS if we have 1) the fit on the whole 50 time steps and 2) the pointwise log likelihood at each time step.
If instead we want to use PSIS-LOO or WAIC on the time dimension (i.e. excluding one time step but keeping future ones in the fits when doing CV) we would also need these two things.
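To make the requirement concrete, here is a rough numpy-only sketch of the quantities involved (names and shapes are made up):

```python
import numpy as np

# Pointwise log likelihood of each of the 50 time steps, evaluated under
# posterior draws from the single fit on all 50 steps.
rng = np.random.default_rng(0)
log_lik = rng.normal(size=(2000, 50))  # (n_posterior_draws, n_timesteps)

i = 40  # approximate the fit that only saw the first 40 time steps

# Log importance ratios p(theta | y_1:40) / p(theta | y_1:50) are, up to a
# constant, minus the summed pointwise log likelihood of the left-out steps.
# They would be Pareto-smoothed (e.g. with arviz's psislw) before reweighting
# the 4-step-ahead predictive terms log_lik[:, 40:44].
log_ratios = -log_lik[:, i:].sum(axis=1)
```

None of this is possible if the stored logp has already been summed over the time axis.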
From a practical point of view, maybe time series model comparisons are worth the effort of bending the consistency of logps. I don't remember at the moment whether pymc recomputes the pointwise log likelihoods after sampling is finished or does so during sampling.
This happens when converting to InferenceData, in the link I shared in my first comment.
It seems that timeseries warrant a special logp method that does not collapse across time.
Perhaps the simplest would be to pass a do_not_collapse=True kwarg when computing the elemwise loglike, but keep collapsing by default. Our logps already accept arbitrary kwargs, so it shouldn't be too hard.
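Roughly something like this (a sketch only; `do_not_collapse` is just a placeholder name and the init term is simplified):

```python
import aesara.tensor as at  # pytensor.tensor in newer pymc versions
import pymc as pm

# Sketch of a GaussianRandomWalk-style logp with an opt-out of the collapsing;
# the real implementation lives in pymc/distributions/timeseries.py.
def grw_logp(value, mu, sigma, *, do_not_collapse=False):
    # One logp term per time step: the initial point plus each innovation.
    init_logp = pm.logp(pm.Normal.dist(mu, sigma), value[..., 0])
    innov_logp = pm.logp(pm.Normal.dist(mu, sigma), value[..., 1:] - value[..., :-1])
    pointwise = at.concatenate([init_logp[..., None], innov_logp], axis=-1)
    if do_not_collapse:
        return pointwise           # shape (..., n_timesteps), one term per step
    return pointwise.sum(axis=-1)  # collapsed over time, as pm.Mixture expects
```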
The Mixture might have to override this kwarg, though; otherwise I think it might error, but we should add a test to be sure.
I understand that PSIS-LFO/LOO is preferable, but there are models where the forecast into the future doesn't give the same number of pointwise logp values. For example, you might want to compare a naive linear regression, an ARIMA, and a Gaussian process. The linear regression and the ARIMA will be able to compute elementwise logps, but the Gaussian process won't, because at its core there's an MvNormal.
Closing this in favor of #6090
In discussion with @OriolAbril, it was brought up that, for model comparison, it is useful to have access to the probability of each observation/time step. Currently, the GaussianRandomWalk collapses these probabilities, as does the AR being refactored in #5734.
https://github.com/pymc-devs/pymc/blob/6ae76286109f936845046bf77f716f88082bf6fe/pymc/distributions/timeseries.py#L240