zanklanecek opened this issue 3 years ago
We have developed a way to tackle hyperparameter tuning, which is discussed in the last paragraph of Sec. 3 of our paper (https://arxiv.org/pdf/1906.02506.pdf). The key trick is to first tune OGN (no variational part) in the same way you would tune Adam, and then move to the variational version (VOGN) afterwards. This often works. Tuning the prior precision, the tempering parameter, and the damping factor will then help you get better uncertainty than deterministic methods such as Adam or OGN.
Setting the tempering parameter tau by starting with very small values and then slowly moving it up to 1 is also very important for finding good hyperparameter settings. Hope this helps.
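As a concrete version of that warm-up, here is a minimal sketch. The argument name `kl_weighting` for the tempering parameter comes from torchsso/optim/vi.py (it is also confirmed later in this thread); the staged loop, the schedule values, and the helpers `model`, `train_loader`, and `train_for_a_few_epochs` are assumptions for illustration, since `kl_weighting` is fixed when the optimizer is constructed.

```python
import torchsso

# Hypothetical warm-up for the tempering parameter tau: start very small,
# then slowly move up to 1, as suggested above. The schedule values are
# placeholders, not tuned settings.
tau_schedule = [0.01, 0.1, 0.5, 1.0]

for tau in tau_schedule:
    # Re-creating the optimizer per stage is an assumption here, since
    # kl_weighting is set at construction time in torchsso.
    optimizer = torchsso.optim.VOGN(model,
                                    dataset_size=len(train_loader.dataset),
                                    kl_weighting=tau)
    # User-defined training routine for this stage of the schedule.
    train_for_a_few_epochs(model, optimizer, train_loader)
```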
Is there a PyTorch implementation of OGN? I haven't been able to find one in the pytorch-sso library. Appendix C in the paper explains that OGN is a deterministic version of VOGN, where one just replaces u_t with w_t. Is there an easy way to do this with the VIOptimizer class?
It should be somewhere in the code. Let me ask @kazukiosawa, but if you find it in the meantime, please let us know. Thanks!
@zanklanecek Thanks for your question! You can train a model with OGN by using the SecondOrderOptimizer class, which is the parent class of VIOptimizer. See here for the correspondence of the arguments between these two classes.
Thanks @emtiyaz and @kazukiosawa for the fast response 🥇. So to sum up:

If you want OGN as the optimizer:

```python
curv_shapes = {"Conv2d": "Kron", "Linear": "Diag"}
curv_kwargs = {"damping": 1e-3, "ema_decay": 0.999}
optimizer = torchsso.optim.SecondOrderOptimizer(model, "Cov", curv_shapes, curv_kwargs)
```

If you want VOGN as the optimizer:

```python
optimizer = torchsso.optim.VOGN(model, dataset_size=len(train_loader.dataset))
```
No dataset_size is needed for OGN?
Actually, for OGN:

```python
curv_shapes = {"Conv2d": "Diag", "Linear": "Diag"}
```

(see the definition of the VOGN class).
right. OGN doesn't need dataset_size
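Putting the corrected pieces together, here is a minimal sketch of an OGN training step. The closure pattern that returns `(loss, output)` mirrors the usage in the library's classification example, but the exact `step` signature is an assumption; `model`, `data`, and `target` are assumed to be defined.

```python
import torch.nn.functional as F
import torchsso

# OGN setup with the corrected curvature shapes from this thread.
curv_shapes = {"Conv2d": "Diag", "Linear": "Diag"}
curv_kwargs = {"damping": 1e-3, "ema_decay": 0.999}
optimizer = torchsso.optim.SecondOrderOptimizer(model, "Cov", curv_shapes, curv_kwargs)

# One training step: the closure computes the loss and gradients,
# and the optimizer applies the (deterministic) OGN update.
def closure():
    optimizer.zero_grad()
    output = model(data)
    loss = F.cross_entropy(output, target)
    loss.backward()
    return loss, output

loss, output = optimizer.step(closure=closure)
```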
@kazukiosawa @emtiyaz just to make clear what the corresponding parameter names are in the code: is the prior precision `init_precision`? Thanks for your help!
@zanklanecek `init_precision` is for the posterior of the weights. You can set the prior variance (1/precision) with `prior_variance` (please check https://github.com/cybertronai/pytorch-sso/blob/master/torchsso/optim/vi.py#L46-L47). `kl_weighting` is the tempering parameter. I hope this helps!
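For reference, a sketch of a VOGN constructor call with those hyperparameters spelled out. The argument names `init_precision`, `prior_variance`, and `kl_weighting` come from the linked vi.py; whether VOGN accepts all of them directly, and the values used, are assumptions for illustration.

```python
import torchsso

# VOGN with the hyperparameters named above; the values are placeholders,
# not tuned settings.
optimizer = torchsso.optim.VOGN(model,
                                dataset_size=len(train_loader.dataset),
                                init_precision=1e-3,  # initial posterior precision
                                prior_variance=1.0,   # = 1 / prior precision (delta)
                                kl_weighting=1.0)     # tempering parameter (tau)
```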
@kazukiosawa I made a typo there. What I want to know is: what is the prior precision (delta) in your article? So tuning the prior precision (delta) is the same as tuning 1/(prior variance)?
I'm sorry, I haven't dug deep into the theory and equations; I just want to try VOGN on my classification task and see where it gets me.
Thanks!
@zanklanecek No problem! Yes, tuning the prior precision (delta) is the same as tuning 1/(prior_variance)!
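In code, that conversion is a one-liner; the delta value below is purely illustrative, not a recommended setting.

```python
# If you have a value for the paper's prior precision (delta), convert it
# before passing it to VOGN, since torchsso takes the variance:
delta = 100.0                 # example prior precision, purely illustrative
prior_variance = 1.0 / delta  # 0.01
```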
Thanks for now, I will try it out and see where I can get. Adam and OGN are both easy to use for training. Hopefully I can achieve similar results with VOGN, which would be wonderful for my research.
I am trying out VOGN with ResNet-18 for binary classification. With the Adam optimizer I am able to find a set of hyperparameters that yields excellent training and test results (AUC > 0.98). When swapping Adam for VOGN, training fails completely (AUC ~ 0.5, training loss not changing). After trying numerous hyperparameter ranges I am still having trouble training the model. Should I keep trying to train the model with VOGN, or is it normal behavior that VOGN doesn't always work?
Thanks!