you can fit a parametric model using SVI and then use the resulting density as a new prior. however, this is generally expected to work pretty poorly, at least if you're looking for high-fidelity posterior approximations à la long runs of MCMC. in other words, you're going through a parametric bottleneck, so it's not clear that running MCMC downstream is really worth it; it certainly can't "rescue" you from any misfit in the parametric approximation. it might be more sensible to just stick with SVI throughout.
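A minimal sketch of that first step in NumPyro, assuming a toy model with a single latent `theta` and made-up data; `SVI`, `AutoDiagonalNormal`, and `get_posterior` are real NumPyro APIs, but the model and data here are purely illustrative:

```python
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoDiagonalNormal

# toy parametric model: a single latent location parameter
def model(data):
    theta = numpyro.sample("theta", dist.Normal(0.0, 1.0))
    numpyro.sample("obs", dist.Normal(theta, 1.0), obs=data)

# stand-in for the real dataset
data_big = random.normal(random.PRNGKey(42), (1000,)) + 2.0

guide = AutoDiagonalNormal(model)
svi = SVI(model, guide, numpyro.optim.Adam(1e-2), Trace_ELBO())
svi_result = svi.run(random.PRNGKey(0), 2000, data_big)

# the fitted variational density over the (flattened, unconstrained)
# latents -- the "resulting density" that would serve as the new prior
q = guide.get_posterior(svi_result.params)
```

Note that `get_posterior` lives in unconstrained space, so turning it back into a prior over constrained latents takes some extra care.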
We really do want high-quality posteriors, since this is exploratory research.
Are you referring to using the `svi_state` object when calling the `update` function manually?
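For context, a rough sketch of that manual loop, reusing the `svi` object and `data_big` from the sketch above; `init`, `update`, and `get_params` are the actual methods on NumPyro's `SVI`. Note that carrying `svi_state` across datasets only warm-starts the optimizer and variational parameters; it does not by itself turn the old posterior into a new prior:

```python
# initialize optimizer state + variational parameters on the first dataset
svi_state = svi.init(random.PRNGKey(1), data_big)
for _ in range(2000):
    svi_state, loss = svi.update(svi_state, data_big)

# later: keep optimizing the same state on a new batch, without
# re-initializing the variational parameters from scratch
data_small = random.normal(random.PRNGKey(43), (50,)) + 2.0
for _ in range(200):
    svi_state, loss = svi.update(svi_state, data_small)

params = svi.get_params(svi_state)  # constrained parameter values
```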
i'm not referring to any code; i'm referring to algorithms. my point is just that MCMC gets its nice asymptotic guarantees from its non-parametric nature. if there's a parametric bottleneck in there, you lose those asymptotic guarantees unless you do something (like importance sampling) to correct for mismatch in the parametric approximation. of course you can use a flexible estimator like a normalizing flow or whatnot and hope for the best, but it's not clear a priori that all that effort will outperform an approach based purely on variational inference from the get-go.
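A generic sketch of the importance-sampling correction mentioned above, assuming user-supplied log densities `log_p` (target) and `log_q` (parametric approximation); this is illustrative JAX, not a NumPyro API:

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def snis_estimate(f, samples, log_p, log_q):
    """Self-normalized importance sampling: estimate E_p[f(x)] using
    samples drawn from the approximation q."""
    log_w = jax.vmap(log_p)(samples) - jax.vmap(log_q)(samples)
    w = jnp.exp(log_w - logsumexp(log_w))  # normalized weights
    return jnp.sum(w * jax.vmap(f)(samples))
```

Degenerate (near one-hot) weights are the telltale sign that the parametric fit is too far from the target for this correction to help.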
I think maybe we are using different definitions of "parametric"? I mean that the number of parameters in the model is fixed irrespective of how much data I train on, whereas with a GP the number of parameters grows with the number of rows of data.
@EdwardRaff can you please ask a concrete question on our forum? github issues are intended for bug reports, feature requests, etc.
in particular, i'm unclear whether you're asking for generic algorithmic advice or about particular implementation details. if the latter, you would need to be more specific about which algorithm you want to implement
I have some (larger) initial amount of data `X_big` that I want to train my model with. Afterward, I have many rounds of a smaller amount of new data `X_small_i`. I'd like to sequentially use the posterior from each round as the prior for the next round, updating the model more quickly each time (re-training from scratch isn't viable). Is there a way to do this in NumPyro today? I don't need to use a GP, which would be problematic because of the gram matrix, so the model would be parametric.