sbi-dev / sbi

Simulation-based inference toolkit
https://sbi-dev.github.io/sbi/
Apache License 2.0

Questions about flow parameters and quality assurance #304

Closed · tvwenger closed this issue 4 years ago

tvwenger commented 4 years ago

Thanks for the great tool! I am new to likelihood-free inference and normalizing flows, so I apologize in advance if these questions are naive.

I have a model (~25 free parameters) that generates data in 3 dimensions. I am experimenting with the tunable parameters for the density estimator and inference:

  1. Masked autoregressive flow (MAF) vs neural spline flow (NSF)
  2. Number of hidden features
  3. Number of transform layers
  4. Simulation and training batch size
  5. Number of simulations for inference

I have no sense of what good values for these parameters are, and I'm not sure of the easiest way to "test" the neural network to check that it's giving sensible results. I suspect that more simulations are always better, but I fear that too many hidden features or transform layers will lead to over-fitting, since using more hidden features and layers results in faster convergence (fewer epochs).

I've looked into the output of "show_round_summary", which reports "Best validation performance" and "Acceptance rate" (the latter is always 1.0 for SNLE), but I don't know how to interpret these numbers. I've also tried generating new simulated data and checking how the likelihood log_prob changes as I increase the number of features/layers. Based on these simple tests with SNLE, I've found that NSF gives fairly constant likelihood log-probs when going from the default 50 features / 5 layers to 200/10, whereas MAF seems to improve as features/layers increase.
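
Concretely, that check looks roughly like this (just a sketch of what I'm doing; density_estimator, simulator, and prior stand in for the trained SNLE flow and my own model and prior, and I'm assuming the trained estimator exposes nflows' log_prob(inputs, context)):

import torch

# Draw fresh parameters and simulate held-out data (placeholders for my own model).
theta_test = prior.sample((1000,))
x_test = simulator(theta_test)  # shape (1000, 3) in my case

with torch.no_grad():
    # For SNLE the flow models p(x | theta), so theta is passed as the context.
    log_liks = density_estimator.log_prob(x_test, context=theta_test)

print("mean held-out log-likelihood:", log_liks.mean().item())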

Could you give me some advice or point me to some references so that I can get an idea of how to set these parameters properly for my model and data? Perhaps NSF is always preferred over MAF, or the number of transform layers should always be N times the parameter dimensionality, or you should always run M times as many simulations as you have observations, etc. Any guidelines or suggestions would be useful (and I'd recommend including them on the website for other newbies like me!).

Thanks for your help!

michaeldeistler commented 4 years ago

Hi, thanks for using sbi.

Unfortunately, I cannot help you much. We are working towards automatically setting these parameters and giving guidelines to users, but we do not yet have principled ways to do so. Generally, "Best validation performance" is a very good indicator of how well your neural net did (higher is better). "Acceptance rate" indeed only applies to SNPE, not to SNLE.

Regarding overfitting: we use early stopping. If the validation loss does not decrease for stop_after_epochs epochs (default 20), we stop training and return the model that had the best validation loss.

Anyway: depending on which algorithm you use (SNPE, SNLE, SNRE), the optimal hyperparameters will differ.

Here, we are applying SNPE to canonical problems in neuroscience, so you can have a look at the hyperparameters we used there.

I'll try to give some recommendations below, but please treat them carefully as they are purely based on empirical experience. All below comments refer to SNLE (which you seem to be using).

SNLE learns the likelihood. In your case, the likelihood is a three dimensional density, which is rather low dimensional. So, I would expect that you do not need a huge neural network for this.

  1. I think that nsf will outperform maf in most cases, but training time is higher.
  2. I think 50-100 hidden features will be sufficient in your case.
  3. 5-10 transform layers is good. I have seen improvements with 10 transform layers, but my model was a bit bigger than yours.
  4. simulation_batch_size will not influence inference. It is used purely for the process of simulating (i.e. it is the size of the batch of parameters passed to the simulator). Training batch size: if you have fewer than around 5000 simulations (overall), leave it at 50. Beyond 5000 simulations, you can experiment with 100 or even 200.
  5. The more the better. How many you need depends very much on how nonlinear the mapping between parameters and data is (see the sketch just after this list for how these settings map onto the sbi calls).
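
To make this concrete, here is a rough sketch of how the points above enter the sbi calls (parameter names as in likelihood_nn and the SNLE interface; exact signatures differ a bit between sbi versions, and simulator/prior stand in for your own model and prior):

from sbi.inference import SNLE, prepare_for_sbi, simulate_for_sbi
from sbi.utils.get_nn_models import likelihood_nn

simulator, prior = prepare_for_sbi(simulator, prior)

# Points 1-3: density estimator family, hidden features, transform layers.
density_estimator_build = likelihood_nn(
    model="nsf",
    hidden_features=50,
    num_transforms=5,
)
inference = SNLE(prior=prior, density_estimator=density_estimator_build)

# Point 4 (simulation side) and point 5: batch of parameters per simulator
# call, and the overall simulation budget.
theta, x = simulate_for_sbi(
    simulator, prior, num_simulations=10_000, simulation_batch_size=100
)

# Point 4 (training side): training batch size; early stopping uses
# stop_after_epochs as described above.
density_estimator = inference.append_simulations(theta, x).train(
    training_batch_size=100, stop_after_epochs=20
)
posterior = inference.build_posterior(density_estimator)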

Hope this helps! Michael

tvwenger commented 4 years ago

Dear Michael,

Thank you very much for the quick response! It is good to hear that what I'm doing is not crazy. I've been tackling a problem for over a year and haven't been making any headway - I think sbi is exactly what I needed!

Regarding 5 - The relationship between my model parameters and the data is highly non-linear, and there is a hidden random variable that spreads out the data for any given set of model parameters (i.e., there is no one-to-one mapping between a single set of parameters and a single point in data space). I have found that I need ~1 million simulations to accurately learn the likelihood, and it works extremely well!

Regarding 4 - Since I have so many simulations, I am using training batch size >> 1 (typically 1000-10000). Does a larger training batch size affect the learning process at all, or are there any downsides to using a large training batch size?

Thanks again. I'll be sure to send along the paper if my project goes anywhere! Trey

michaeldeistler commented 4 years ago

Hi Trey,

Regarding 5: Glad to hear that it works well :)

Regarding 4: the training batch size is the 'classical' batch size you have in any neural network. The effects are described e.g. here. 10000 seems excessive to me; maybe rather stick to 500 or so.

One more quick thing: your parameter space is higher-dimensional than your data space and the mapping is highly non-linear. I could imagine that your results would improve if you 'encode' the parameters before passing them to the density estimator. sbi offers this functionality through an embedding_net: https://www.mackelab.org/sbi/reference/#sbi.utils.get_nn_models.likelihood_nn

You would do something like:

import torch.nn as nn

num_free_parameters = 25  # dimensionality of your parameter space (~25 in your case)
embedding_hiddens = 25
encoded_output_dim = 10

# Small MLP that encodes the parameters into a lower-dimensional representation.
embedding_net = nn.Sequential(
    nn.Linear(num_free_parameters, embedding_hiddens),
    nn.ReLU(),
    nn.Linear(embedding_hiddens, embedding_hiddens),
    nn.ReLU(),
    nn.Linear(embedding_hiddens, encoded_output_dim),
)

and then pass it to likelihood_nn(). Just try it out and see if it helps :)
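
For concreteness, passing it on would look roughly like this (again just a sketch; keyword names can differ slightly between sbi versions, and prior stands in for your own prior):

from sbi.inference import SNLE
from sbi.utils.get_nn_models import likelihood_nn

# The embedding net encodes the context of the likelihood estimator,
# which for SNLE is the parameter vector theta.
density_estimator_build = likelihood_nn(
    model="nsf",
    embedding_net=embedding_net,
)
inference = SNLE(prior=prior, density_estimator=density_estimator_build)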

Michael

tvwenger commented 4 years ago

Great, thanks for the tips! I will give it a try.

tvwenger commented 4 years ago

@michaeldeistler I gave the embedding net a try and I seem to have encountered a bug. See: https://github.com/mackelab/sbi/issues/310

michaeldeistler commented 4 years ago

You are probably using sbi version 0.11.x. This was a bug that got fixed in 0.12.1. Please update sbi ;)
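
For example: pip install --upgrade sbi (or the equivalent in whatever environment manager you use).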

Sorry, should have mentioned this above already...