janfb opened 2 months ago
> MAP
> MAP is using the score directly for doing gradient ascent on the posterior to find the MAP. This is currently not working accurately.
Is it the case that it does not work accurately, or that this doesn't run at all? Trying to find the MAP with gradient ascent requires differentiating through the backward method of zuko CNFs, which causes autograd errors for me.
Regardless, even if we can backprop through `log_prob` as constructed with CNFs, this would be incredibly slow, as evaluating the log prob in this way requires an ODE solve. I am wondering if we could instead find an approximate MAP by using a variational lower bound of the log prob, e.g., Eq. 11 of "Maximum Likelihood Training of Score-Based Diffusion Models". This way we don't need to compute a lot of ODE solves. @manuelgloeckler do you have any thoughts on this?
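For reference, the naive approach discussed above would look roughly like the sketch below (hypothetical: it assumes a `posterior` object whose `log_prob` is differentiable w.r.t. `theta` and a `theta_init` starting point); note that every optimization step triggers a full ODE solve:

```python
import torch

# Naive MAP search: gradient ascent on log_prob via autograd.
# `posterior` and `theta_init` are illustrative stand-ins, not the actual API.
theta = theta_init.clone().requires_grad_(True)
optimizer = torch.optim.Adam([theta], lr=1e-2)
for _ in range(1000):
    optimizer.zero_grad()
    loss = -posterior.log_prob(theta)  # each call requires a full ODE solve
    loss.backward()  # differentiates through the CNF backward; slow / error-prone
    optimizer.step()
```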
Update after talking to @manuelgloeckler: the easiest way to calculate the MAP here would be to use the score directly at a time t = epsilon, instead of calculating and taking gradients through the exact `log_prob`, which, as stated above, would be really inefficient. I will implement this soon.
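A minimal sketch of that idea (the `score_estimator(theta, x, t)` signature is illustrative, not the actual API): since the score at small t is itself (approximately) the gradient of the posterior log prob, we can follow it uphill with no autograd and no ODE solves.

```python
import torch

def map_via_score(score_estimator, x_o, theta_init, t_eps=1e-2, lr=1e-2, num_steps=1000):
    """Gradient ascent on the posterior using the score at a small time t.

    `score_estimator(theta, x, t)` is assumed to approximate
    grad_theta log p_t(theta | x); signature and hyperparameters are
    illustrative placeholders.
    """
    theta = theta_init.clone()
    t = torch.as_tensor(t_eps)
    for _ in range(num_steps):
        # The score *is* the gradient of the log prob, so no backprop is needed.
        grad = score_estimator(theta, x_o, t)
        theta = theta + lr * grad
    return theta
```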
I think that's actually what @michaeldeistler had implemented already. It's in the backup branch, here:
and then, in the case of the score-based potential, it would just use the gradient directly from here:
Or are you referring to yet a different approach?
There are a couple of unsolved problems and enhancements for NPSE:
MAP
MAP is using the score directly for doing gradient ascent on the posterior to find the MAP. This is currently not working accurately.
IID sampling
Log prob and sampling via CNF
Once trained, we can use the `score_estimator` to define a probability-flow ODE, e.g., a CNF via zuko, and directly call `log_prob` and `sample` on it. At the moment, this is already happening when constructing the `ScorePosterior` with `sample_with="ode"`. However, it is a bit all over the place: e.g., `log_prob` is coming from the potential via zuko anyway, and for the sampling we construct a flow with each call. A possible solution to make things clearer is creating an `ODEPosterior` that could be used by flow matching as well; a rough sketch is below.
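As a hedged sketch of what such an `ODEPosterior` could wrap once instead of per call (the `ode_drift` accessor is a hypothetical name for the probability-flow ODE drift derived from the score estimator; time conventions would need care in a real implementation):

```python
import torch
import zuko

def build_ode_flow(score_estimator, x_o, base):
    """Wrap the probability-flow ODE as a zuko NormalizingFlow.

    `score_estimator.ode_drift(theta, x, t)` is a hypothetical accessor
    for the drift of the probability-flow ODE given the score.
    """
    transform = zuko.transforms.FreeFormJacobianTransform(
        f=lambda t, theta: score_estimator.ode_drift(theta, x_o, t),
        t0=torch.tensor(0.0),
        t1=torch.tensor(1.0),
        exact=False,  # Hutchinson trace estimator for the log-det term
    )
    # log_prob and sample then come for free from the flow interface.
    return zuko.distributions.NormalizingFlow(transform, base)
```

Building the flow once in the posterior's constructor, rather than on every `sample` call, would also remove the repeated construction cost mentioned above.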
Allow transforms for potential
See `score_estimator_based_potential`, which currently asserts `enable_transform=False`, i.e., transforms of `theta` are not supported yet.
Better convergence checks
Unlike the `._converged` method in base.py, this method does not reset to the best model. We noticed that this improves performance. Deleting this method will make C2ST tests fail. This is because the loss is very stochastic, so resetting might restore an underfitted model. Ideally, we would write a custom `._converged()` method which checks whether the loss is still going down for all t; a sketch follows.
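One possible shape for such a check, as a hedged sketch: the `_val_loss_at_times` helper and the keep-training-while-any-time-bucket-improves criterion are assumptions, not the existing sbi API. Evaluating the validation loss on a fixed grid of times smooths the otherwise very stochastic (uniformly sampled t) loss.

```python
import torch

def _converged(self, epoch: int, stop_after_epochs: int) -> bool:
    """Declare convergence only once the validation loss has stopped
    decreasing for all diffusion times t.

    `self._val_loss_at_times(t_grid)` is a hypothetical helper returning
    the per-time validation losses as a tensor of shape (len(t_grid),).
    """
    t_grid = torch.linspace(0.01, 1.0, 10)  # illustrative fixed time grid
    val_losses = self._val_loss_at_times(t_grid)

    if epoch == 0:
        self._best_val_losses = val_losses
        self._epochs_since_last_improvement = 0
    elif torch.any(val_losses < self._best_val_losses):
        # At least one time bucket is still improving: keep training.
        self._best_val_losses = torch.minimum(val_losses, self._best_val_losses)
        self._epochs_since_last_improvement = 0
    else:
        self._epochs_since_last_improvement += 1

    return self._epochs_since_last_improvement > stop_after_epochs - 1
```

Because this check never restores an earlier snapshot, it avoids the reset-to-underfitted-model problem described above while still stopping once no part of the time range improves.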