
Practical Deep Learning with Bayesian Principles

This repository contains code that demonstrates practical applications of Bayesian principles to deep learning. Our implementation provides an Adam-like optimizer, called VOGN (Variational Online Gauss-Newton), for obtaining uncertainty estimates in deep learning.

Setup

This repository uses PyTorch-SSO, a PyTorch extension for second-order optimization, variational inference, and distributed training.

$ git clone git@github.com:cybertronai/pytorch-sso.git
$ cd pytorch-sso
$ python setup.py install

Please follow the Installation section of PyTorch-SSO for CUDA/MPI support.

Bayesian Uncertainty Estimation

Decision boundary and entropy plots for 2D binary classification by MLPs trained with Adam and VOGN. VOGN optimizes the posterior distribution of each weight (i.e., the mean and variance of a Gaussian). A model with the mean weights draws the red boundary, and models with MC samples from the posterior distribution draw the light-red boundaries. VOGN converges to a solution similar to Adam's while keeping uncertainty in its predictions.
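As an illustration of how such an entropy map can be computed, here is a minimal sketch (not the repository's plotting code; the function and variable names are hypothetical). It assumes the logits from S forward passes, each with a different weight sample from the learned posterior, have been stacked into one tensor:

import torch

def predictive_entropy(mc_logits):
    # mc_logits: (S, N) logits from S posterior weight samples on N inputs
    # (e.g., the points of a 2D grid used for the entropy plot).
    probs = torch.sigmoid(mc_logits)   # per-sample P(y=1 | x, w_s)
    p = probs.mean(dim=0)              # MC-averaged predictive probability
    eps = 1e-8                         # avoid log(0)
    return -(p * (p + eps).log() + (1 - p) * (1 - p + eps).log())

High entropy (MC-averaged probability near 0.5) marks regions where the posterior samples disagree, which is exactly what the light-red boundaries visualize.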

With PyTorch-SSO (torchsso), you can switch to VOGN training by changing a few lines in your training script:

import torch
import torch.nn.functional as F  # needed for binary_cross_entropy_with_logits
+import torchsso

train_loader = torch.utils.data.DataLoader(train_dataset)
model = MLP()

-optimizer = torch.optim.Adam(model.parameters())
+optimizer = torchsso.optim.VOGN(model, dataset_size=len(train_loader.dataset))

for data, target in train_loader:

    # VOGN's step() takes a closure that runs the forward/backward pass
    # and returns both the loss and the model output.
    def closure():
        optimizer.zero_grad()
        output = model(data)
        loss = F.binary_cross_entropy_with_logits(output, target)
        loss.backward()
        return loss, output

    loss, output = optimizer.step(closure)
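The closure follows the same pattern as torch.optim.LBFGS in PyTorch: step() receives a callable that re-runs the forward/backward pass, so the optimizer itself can re-evaluate the model, in VOGN's case for weights sampled from the current posterior. Note that, unlike the standard PyTorch convention, this closure returns the model output along with the loss, and step() hands both back to the training loop.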

To train MLPs with VOGN and Adam and create the GIF:

$ cd toy_example
$ python main.py

For details, please see the VOGN implementation in PyTorch-SSO.

Bayes for Image Classification

This repository contains code for the NeurIPS 2019 paper "Practical Deep Learning with Bayesian Principles" [poster], including the results of large-scale variational inference on ImageNet classification.

VOGN achieves performance similar to Adam and SGD in about the same number of epochs. Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated (rightmost figure), uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted (please refer to the paper for the latter two; a continual-learning example is in preparation).
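"Well-calibrated" means that predicted confidence matches empirical accuracy. A common summary metric is the expected calibration error (ECE); below is a minimal, self-contained sketch (not the repository's evaluation code) that computes it from softmax probabilities and integer labels:

import torch

def expected_calibration_error(probs, labels, n_bins=20):
    # probs: (N, C) softmax probabilities; labels: (N,) integer targets
    conf, pred = probs.max(dim=1)            # confidence and predicted class
    correct = pred.eq(labels).float()
    edges = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)  # equal-width confidence bin
        if in_bin.any():
            gap = (correct[in_bin].mean() - conf[in_bin].mean()).abs()
            ece = ece + in_bin.float().mean() * gap  # weight gap by bin mass
    return ece

A perfectly calibrated model has an ECE of 0.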

See classification (single CPU/GPU) or distributed/classification (multiple GPUs) for example scripts.

Citation

NeurIPS 2019 paper

@article{osawa2019practical,
  title = {Practical Deep Learning with Bayesian Principles},
  author = {Osawa, Kazuki and Swaroop, Siddharth and Jain, Anirudh and Eschenhagen, Runa and Turner, Richard E. and Yokota, Rio and Khan, Mohammad Emtiyaz},
  journal = {arXiv preprint arXiv:1906.02506},
  year = {2019}
}