
TuringBnpBenchmarks

Benchmarks of Bayesian Nonparametric models in Turing and other PPLs.

This work is funded by GSoC 2020.

My mentors for this project are Hong Ge, Martin Trapp, and Cameron Pfiffer.

Abstract

Probabilistic models, which quantify uncertainty more naturally than their deterministic counterparts, are often difficult and tedious to implement. Probabilistic programming languages (PPLs) have greatly increased the productivity of probabilistic modelers, allowing practitioners to focus on modeling rather than on implementing algorithms for probabilistic (e.g. Bayesian) inference. Turing is a PPL developed entirely in Julia; it is both expressive and fast, due in part to Julia's just-in-time (JIT) compiler being built on LLVM. Consequently, Turing has a more manageable code base and the potential to be more extensible than more established PPLs like STAN. One thing that may encourage the adoption of Turing is having more benchmarks and feature comparisons of Turing against other mainstream PPLs. The aim of this project is to provide a more systematic approach to comparing execution times and features among several PPLs, including STAN, Pyro, Nimble, and TensorFlow Probability, for a variety of Bayesian nonparametric (BNP) models, a class of models that provide a great deal of modeling flexibility and often allow model complexity to grow with the size of the data.

This project addresses the need for a more systematic approach to comparing the performance of Turing and various other PPLs (STAN, Pyro, Nimble, TensorFlow Probability) on common Bayesian nonparametric (BNP) models, a class of models that provide a great deal of modeling flexibility and allow the number of model parameters, and thus model complexity, to increase with the size of the data. The following models will be implemented (where possible) and timed (both compile times and execution times) in the various PPLs, with links to minimum working examples provided:

In addition, the effective sample size and inference speed for a standardized setup (e.g., HMC in truncated stick-breaking DP mixture models) will be measured for each PPL.
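
To make that standardized setup concrete, here is a minimal sketch of a truncated stick-breaking DP Gaussian mixture model in Turing, fit with NUTS. This is an illustrative sketch, not the exact benchmark code in this repository: the priors, truncation level `K`, toy data, and sampler settings are assumptions, and it requires a reasonably recent Turing version (for `filldist` and `@addlogprob!`).

```julia
using Turing
using Distributions
using MCMCChains
using StatsFuns: logsumexp

# Truncated stick-breaking: map K-1 Beta draws to K mixture weights.
function stickbreak(v)
    cumprod_one_minus_v = cumprod(1 .- v)
    return vcat(v, one(eltype(v))) .* vcat(one(eltype(v)), cumprod_one_minus_v)
end

# Truncated DP Gaussian mixture model (illustrative priors).
@model function dp_gmm_sb(y, K)
    alpha ~ Gamma(1, 1 / 10)             # DP concentration parameter
    v ~ filldist(Beta(1, alpha), K - 1)  # stick-breaking fractions
    w = stickbreak(v)                    # mixture weights

    mu ~ filldist(Normal(0, 3), K)       # component means
    sig ~ filldist(Gamma(1, 1), K)       # component standard deviations

    # Marginalized mixture likelihood, so gradient-based samplers (HMC/NUTS) apply.
    logw = log.(w)
    for i in eachindex(y)
        Turing.@addlogprob! logsumexp(logw .+ logpdf.(Normal.(mu, sig), y[i]))
    end
end

# Toy data and a short NUTS run; the summary includes effective sample sizes.
y = rand(MixtureModel([Normal(-2, 0.5), Normal(3, 1)], [0.3, 0.7]), 200)
chain = sample(dp_gmm_sb(y, 10), NUTS(500, 0.8), 1000)
display(summarystats(chain))   # per-parameter ESS via MCMCChains
```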

What this repo contains

This repository includes (or will include) tables and other visualizations comparing the (compile and execution) speed and features of various PPLs (Turing, STAN, Pyro, Nimble, TFP), along with the minimum working examples (MWEs) for each implementation. Blog posts describing the benchmarks will also be included.
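
As a rough illustration of how compile and execution times can be separated in Julia (an assumed approach, not necessarily the one used to produce the published tables): the first call to `sample` includes JIT compilation, so timing a first run and an identical second run gives a crude split.

```julia
using Turing

# Assumes the `dp_gmm_sb` model and data `y` from the sketch above.
model = dp_gmm_sb(y, 10)

t_first  = @elapsed sample(model, NUTS(500, 0.8), 1000)  # includes JIT compilation
t_second = @elapsed sample(model, NUTS(500, 0.8), 1000)  # mostly pure execution

compile_estimate = t_first - t_second    # rough estimate of compile time
println((compile = compile_estimate, run = t_second))
```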

Software / Hardware

All experiments for this project were done on a c5.xlarge AWS Spot Instance. As of this writing, the specs for this instance are:

The following software was used: