janfb opened 2 months ago
> MAP
> MAP is using the score directly for doing gradient ascent on the posterior to find the MAP. This is currently not working accurately.
Is it the case that it does not work accurately, or that this doesn't run at all? Trying to find the MAP with gradient ascent requires differentiating through the backward method of zuko CNFs, which causes autograd errors for me.
Regardless, even if we can backprop through `log_prob` as constructed with CNFs, this would be incredibly slow, as evaluating the log prob in this way requires an ODE solve. I am wondering if we could instead find an approximate MAP by using a variational lower bound of the log prob, e.g., Eq. 11 of "Maximum Likelihood Training of Score-Based Diffusion Models". This way we don't need to compute a lot of ODE solves. @manuelgloeckler do you have any thoughts on this?
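For reference, the naive approach discussed above would look roughly like the sketch below (hypothetical: it assumes a `posterior` object whose `log_prob` is differentiable w.r.t. `theta` and a `theta_init` starting point); note that every optimization step triggers a full ODE solve:

```python
import torch

# Naive MAP search: gradient ascent on log_prob via autograd.
# `posterior` and `theta_init` are illustrative stand-ins, not the actual API.
theta = theta_init.clone().requires_grad_(True)
optimizer = torch.optim.Adam([theta], lr=1e-2)
for _ in range(1000):
    optimizer.zero_grad()
    loss = -posterior.log_prob(theta)  # each call requires a full ODE solve
    loss.backward()  # differentiates through the CNF backward; slow / error-prone
    optimizer.step()
```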
Update after talking to @manuelgloeckler: the easiest way to calculate the MAP here would be to use the score directly at a time t = epsilon, instead of calculating and taking gradients through the exact `log_prob`, which, as stated above, would be really inefficient. I will implement this soon.
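A minimal sketch of that idea (the `score_estimator(theta, x, t)` signature is illustrative, not the actual API): since the score at small t is itself (approximately) the gradient of the posterior log prob, we can follow it uphill with no autograd and no ODE solves.

```python
import torch

def map_via_score(score_estimator, x_o, theta_init, t_eps=1e-2, lr=1e-2, num_steps=1000):
    """Gradient ascent on the posterior using the score at a small time t.

    `score_estimator(theta, x, t)` is assumed to approximate
    grad_theta log p_t(theta | x); signature and hyperparameters are
    illustrative placeholders.
    """
    theta = theta_init.clone()
    t = torch.as_tensor(t_eps)
    for _ in range(num_steps):
        # The score *is* the gradient of the log prob, so no backprop is needed.
        grad = score_estimator(theta, x_o, t)
        theta = theta + lr * grad
    return theta
```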
I think that's actually what @michaeldeistler had implemented already. It's in the backup branch, here:
and then, in the case of the score-based potential, it would just use the gradient directly from here:
Or are you referring to yet a different approach?
There are a couple of unsolved problems and enhancements for NPSE:
MAP
MAP is using the score directly for doing gradient ascent on the posterior to find the MAP. This is currently not working accurately.
IID sampling
Log prob and sampling via CNF
Once trained, we can use the `score_estimator` to define a probability-flow ODE, e.g., a CNF via zuko, and directly call `log_prob` and `sample` on it. At the moment, this is already happening when constructing the `ScorePosterior` with `sample_with="ode"`. However, it is a bit all over the place: e.g., `log_prob` is coming from the potential via zuko anyway, and for the sampling we construct a flow with each call. A possible solution to make things clearer is creating an `ODEPosterior` that could be used by flow matching as well; a rough sketch is below.
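As a hedged sketch of what such an `ODEPosterior` could wrap once instead of per call (the `ode_drift` accessor is a hypothetical name for the probability-flow ODE drift derived from the score estimator; time conventions would need care in a real implementation):

```python
import torch
import zuko

def build_ode_flow(score_estimator, x_o, base):
    """Wrap the probability-flow ODE as a zuko NormalizingFlow.

    `score_estimator.ode_drift(theta, x, t)` is a hypothetical accessor
    for the drift of the probability-flow ODE given the score.
    """
    transform = zuko.transforms.FreeFormJacobianTransform(
        f=lambda t, theta: score_estimator.ode_drift(theta, x_o, t),
        t0=torch.tensor(0.0),
        t1=torch.tensor(1.0),
        exact=False,  # Hutchinson trace estimator for the log-det term
    )
    # log_prob and sample then come for free from the flow interface.
    return zuko.distributions.NormalizingFlow(transform, base)
```

Building the flow once in the posterior's constructor, rather than on every `sample` call, would also remove the repeated construction cost mentioned above.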
Allow transforms for potential
See `score_estimator_based_potential`, which currently asserts `enable_transform=False`, i.e., transforms of `theta` are not supported yet.
Better convergence checks
Unlike the `._converged` method in base.py, this method does not reset to the best model. We noticed that this improves performance. Deleting this method will make C2ST tests fail. This is because the loss is very stochastic, so resetting might restore an underfitted model. Ideally, we would write a custom `._converged()` method which checks whether the loss is still going down for all t; a sketch follows.
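One possible shape for such a check, as a hedged sketch: the `_val_loss_at_times` helper and the keep-training-while-any-time-bucket-improves criterion are assumptions, not the existing sbi API. Evaluating the validation loss on a fixed grid of times smooths the otherwise very stochastic (uniformly sampled t) loss.

```python
import torch

def _converged(self, epoch: int, stop_after_epochs: int) -> bool:
    """Declare convergence only once the validation loss has stopped
    decreasing for all diffusion times t.

    `self._val_loss_at_times(t_grid)` is a hypothetical helper returning
    the per-time validation losses as a tensor of shape (len(t_grid),).
    """
    t_grid = torch.linspace(0.01, 1.0, 10)  # illustrative fixed time grid
    val_losses = self._val_loss_at_times(t_grid)

    if epoch == 0:
        self._best_val_losses = val_losses
        self._epochs_since_last_improvement = 0
    elif torch.any(val_losses < self._best_val_losses):
        # At least one time bucket is still improving: keep training.
        self._best_val_losses = torch.minimum(val_losses, self._best_val_losses)
        self._epochs_since_last_improvement = 0
    else:
        self._epochs_since_last_improvement += 1

    return self._epochs_since_last_improvement > stop_after_epochs - 1
```

Because this check never restores an earlier snapshot, it avoids the reset-to-underfitted-model problem described above while still stopping once no part of the time range improves.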