Hi, thanks for using sbi.
Unfortunately, I cannot help you much. We are working towards automatically setting these parameters and giving guidelines to users, but we do not yet have principled ways to do so. Generally, the "Best validation performance" is a very good indicator of how well your neural net did (higher is better). "Acceptance rate" indeed only applies to SNPE, not to SNLE.
Regarding overfitting: we use early stopping. If the validation loss does not decrease for stop_after_epochs epochs (default 20), we stop training and return the model that had the best validation loss.
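In code, the early-stopping logic is roughly the following (a minimal sketch of the logic only, not sbi's actual implementation; train_one_epoch and validation_loss are placeholders you would supply):

import copy

def train_with_early_stopping(model, train_one_epoch, validation_loss, stop_after_epochs=20):
    best_val_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    epochs_without_improvement = 0
    while epochs_without_improvement < stop_after_epochs:
        train_one_epoch(model)
        val_loss = validation_loss(model)
        if val_loss < best_val_loss:
            # new best validation loss: remember this model
            best_val_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
    # return the model that had the best validation loss
    model.load_state_dict(best_state)
    return model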
Anyway: depending on which algorithm you use (SNPE, SNLE, SNRE), the optimal hyperparameters will be different.
Here, we applied SNPE to canonical problems in neuroscience, so you can have a look at the hyperparameters we used there.
I'll try to give some recommendations below, but please treat them with caution, as they are purely based on empirical experience. All comments below refer to SNLE (which you seem to be using).
SNLE learns the likelihood. In your case, the likelihood is a three-dimensional density, which is rather low-dimensional. So I would expect that you do not need a huge neural network for this.
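For example, you could build a smaller density estimator through sbi.utils.get_nn_models.likelihood_nn (a sketch only; exact argument names may differ between sbi versions):

# Sketch: a smaller flow for a low-dimensional (3D) likelihood.
# hidden_features / num_transforms below the defaults (50 / 5) may already suffice.
from sbi.utils.get_nn_models import likelihood_nn

density_estimator_build_fun = likelihood_nn(
    model="maf",        # or "nsf"; see the note on nsf vs. maf below
    hidden_features=30,
    num_transforms=3,
)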
nsf will outperform maf in most cases, but training time is higher.
Hope this helps!
Michael
Dear Michael,
Thank you very much for the quick response! It is good to hear that what I'm doing is not crazy. I've been tackling a problem for over a year and haven't been making any headway - I think sbi is exactly what I needed!
Regarding 5 - The relationship between my model parameters and the data is very non-linear, and there is a hidden random variable that spreads out the data for any given set of model parameters (i.e., there is no one-to-one mapping between a single set of parameters and a single point in data space). I have found that I need ~1 million simulations to accurately learn the likelihood, and it works extremely well!
Regarding 4 - Since I have so many simulations, I am using training batch size >> 1 (typically 1000-10000). Does a larger training batch size affect the learning process at all, or are there any downsides to using a large training batch size?
Thanks again. I'll be sure to send along the paper if my project goes anywhere!
Trey
Hi Trey,
Regarding 5: Glad to hear that it works well :)
Regarding 4: the training batch size is the 'classical' batch size you have in any neural network. Effects are described e.g. here. 10000 seems excessive to me; maybe rather stick to 500 or so.
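If you want to set it explicitly, here is a minimal sketch using the current sbi API (which may differ slightly from the version you are running; the random x below is just a placeholder for your simulator output):

import torch
from sbi.inference import SNLE
from sbi.utils import BoxUniform

# toy setup matching this thread: 25 parameters, 3-dimensional data
prior = BoxUniform(low=-torch.ones(25), high=torch.ones(25))
theta = prior.sample((10_000,))
x = torch.randn(10_000, 3)  # placeholder for your simulator output

inference = SNLE(prior=prior)
inference.append_simulations(theta, x)
# a moderate batch size such as 500 is usually sufficient
likelihood_estimator = inference.train(training_batch_size=500)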
One more quick thing: your parameter space is higher-dimensional than your data space and the mapping is highly non-linear. I could imagine that your results improve when you 'encode' the parameters before passing them to the density estimator. sbi offers this functionality through an embedding_net: https://www.mackelab.org/sbi/reference/#sbi.utils.get_nn_models.likelihood_nn
You would do something like:
import torch.nn as nn

num_free_parameters = 25  # dimensionality of your parameter space
embedding_hiddens = 25
encoded_output_dim = 10

embedding_net = nn.Sequential(
    nn.Linear(num_free_parameters, embedding_hiddens),
    nn.ReLU(),
    nn.Linear(embedding_hiddens, embedding_hiddens),
    nn.ReLU(),
    nn.Linear(embedding_hiddens, encoded_output_dim),
)
and then pass it to likelihood_nn(). Just try it out and see if it helps :)
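For instance (a sketch, assuming the embedding_net argument of likelihood_nn and the embedding_net defined above; see the link above for the exact signature):

from sbi.utils.get_nn_models import likelihood_nn

# the embedding net encodes the 25 parameters into 10 features before the flow sees them
density_estimator_build_fun = likelihood_nn(
    model="nsf",
    embedding_net=embedding_net,
)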
Michael
Great, thanks for the tips! I will give it a try.
@michaeldeistler I gave the embedding net a try and I seem to have encountered a bug. See: https://github.com/mackelab/sbi/issues/310
You are probably using sbi version 0.11.x. This was a bug that got fixed in 0.12.1. Please update sbi ;)
Sorry, should have mentioned this above already...
Thanks for the great tool! I am new to likelihood-free inference and normalizing flows, so I apologize in advance if these questions are naive.
I have a model (~25 free parameters) that generates data in 3 dimensions. I am experimenting with the tunable parameters for the density estimator and inference:
I have no sense of what good values for these parameters are, and I'm not sure what the easiest way is to "test" the neural network to see that it's giving sensible results. I suspect that more simulations are always better, but I fear that too many hidden features or transform layers will result in "over-fitting", since using more hidden features and layers results in faster convergence (fewer epochs).
I've looked into the output of "show_round_summary", which gives values for "Best validation performance" and "Acceptance rate" (the latter is always 1.0 for SNLE), but I don't know how to interpret these numbers. I've also tried generating some new simulated data and seeing how the likelihood log_prob changes with increasing features/layers. Based on these simple tests with SNLE, I've found that NSF gives pretty constant likelihood probs as the number of features/layers increases from the default 50/5 to 200/10, whereas MAF seems to perform better with more features/layers.
Could you give me some advice or point me to some references so that I can get an idea of how to set these parameters properly for my model and data? Perhaps NSF is always preferred over MAF, or the number of transform layers should always be N times the parameter dimensionality, or you should always run M times as many simulations as you have observations, etc. Any guidelines or suggestions would be useful (and I'd recommend including them on the website for other newbies like me!).
Thanks for your help!