Closed maho3 closed 1 year ago
Comparison! On the same Quijote TPCF data point, we have:
SBI SNPE_C with one MAF:
Pydelfi with one MAF:
Not a rigorous comparison yet, but it's cool that they both work!
And here are the rank statistics for the PyDELFI+MAF model. Quite biased at the edges of the prior, but at least it looks like it's learning something! To be improved...
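For reference, a minimal sketch of how rank statistics like these are typically computed (in the style of simulation-based calibration): for each test point, count how many posterior samples fall below the true parameter. The function name and the toy data here are hypothetical, not from the actual test pipeline.

```python
import numpy as np

def rank_statistic(posterior_samples, theta_true):
    """Rank of the true parameter among posterior samples, per dimension.

    Over many test points, a well-calibrated posterior yields uniformly
    distributed ranks; pile-ups at 0 or n_samples indicate the bias at
    the prior edges seen in the plot above.
    """
    # posterior_samples: (n_samples, n_dims), theta_true: (n_dims,)
    return np.sum(posterior_samples < theta_true, axis=0)

# Toy check: samples centered on the truth give mid-range ranks.
rng = np.random.default_rng(0)
truth = np.array([0.3, 0.8])
samples = truth + 0.05 * rng.standard_normal((1000, 2))
ranks = rank_statistic(samples, truth)
print(ranks)  # one rank per parameter dimension, each in [0, 1000]
```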
Okay so... PyDELFI does indeed work! The issues we were seeing in the previous comment were caused by the sampling chains not converging at inference time. The solution was to increase the burn-in to 1000 samples per chain, which gives unbiased posteriors! Below are some plots derived from our toy simulator example:
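To illustrate why the burn-in length matters, here is a toy sketch with a bare-bones random-walk Metropolis sampler (a stand-in for emcee, not PyDELFI's actual implementation): a chain started far from the posterior mode is badly biased until the early samples are discarded. All names and numbers here are illustrative.

```python
import numpy as np

def metropolis_chain(log_prob, x0, n_steps, step=0.5, seed=0):
    """Minimal random-walk Metropolis sampler (toy stand-in for emcee)."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_prob(x0)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        prop = x + step * rng.standard_normal()
        lp_prop = log_prob(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Target: standard normal. Start far from the mode so burn-in matters.
log_prob = lambda x: -0.5 * x**2
chain = metropolis_chain(log_prob, x0=10.0, n_steps=3000)
burn_in = 1000
post = chain[burn_in:]  # keep only post-burn-in samples
print(post.mean())  # should sit near the true mean of 0
```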
Here's a single constraint. Notice that there are a few chains that still hadn't converged when burn-in ended.
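A quick way to flag those unconverged chains quantitatively is the Gelman-Rubin diagnostic. This is a hedged sketch in plain numpy (not what PyDELFI runs); the threshold values are the conventional rules of thumb.

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R-hat for chains of shape (n_chains, n_samples).

    R-hat near 1 suggests the chains have mixed; values well above
    ~1.1 flag chains that had not converged when burn-in ended.
    """
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
mixed = rng.standard_normal((4, 500))               # converged chains
stuck = mixed + np.array([[0.], [0.], [0.], [5.]])  # one stray chain
print(gelman_rubin(mixed), gelman_rubin(stuck))
```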
And here's the ensemble of constraints averaged over the test set.
Now, the bad part. The emcee sampling currently implemented in PyDELFI is really slow. The above plots were created using only 10 samples (after burn-in) for each of 200 test points, and inference still took ~45 minutes. If we were to evaluate this on all of Quijote, it could take days...
It is currently built to use MPI, but it's hard to get that working. It takes advantage of neither batched evaluations on a GPU nor CPU multiprocessing. These are all changes that we could make in pydelfi_wrappers.py, but they are best reserved for a future PR. I have made issue #47 for this.
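The batching change amounts to evaluating the log-probability for all walkers in one array operation instead of a Python loop over walkers (emcee 3 exposes this via its `vectorize=True` option). A toy sketch with a Gaussian stand-in for the neural likelihood, just to show the two call shapes are equivalent:

```python
import numpy as np

def log_prob_single(theta):
    """Per-point log-probability, as a walker-by-walker loop calls it."""
    return -0.5 * np.sum(theta**2)

def log_prob_batched(thetas):
    """Same density evaluated for all walkers in one vectorized call;
    this is the shape of change proposed in issue #47 (batch the
    network's log-prob over walkers instead of looping in Python)."""
    return -0.5 * np.sum(thetas**2, axis=-1)

walkers = np.random.default_rng(2).standard_normal((64, 5))
looped = np.array([log_prob_single(w) for w in walkers])
batched = log_prob_batched(walkers)
print(np.allclose(looped, batched))  # identical results, one call
```

For a neural density estimator, the batched version replaces 64 separate network forward passes with a single one, which is where the GPU win comes from.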
One more note: I've realized that, if you leave out the sequential training procedure, the likelihood-estimation framework in PyDELFI is exactly that of SNLE_A in the sbi package. In fact, the PyDELFI paper points directly to the SNLE_A paper for its likelihood-estimation fitting.
As a result, it may almost always be better to just use the sbi package instead of PyDELFI (given its regular maintenance), but we should include both for completeness.
This probably needs some stress-testing. I hope to make a comparison plot of Quijote TPCF with sbi and pydelfi before making this a full PR.