pyro-ppl / pyro

Deep universal probabilistic programming with Python and PyTorch
http://pyro.ai
Apache License 2.0

Test infrastructure overhaul #120

Closed eb8680 closed 6 years ago

eb8680 commented 7 years ago

(This is a meta-issue for discussing the overall architecture and goals of the new test infrastructure, making a roadmap, and tracking progress on its constituent issues. I'll keep filling it in and adding other issues, but I'm posting it now so we can start discussing sooner. I'm also collecting all related issues in this GitHub Project.)

Architecture

After some reading and several discussions, especially with @ngoodman and @neerajprad, we settled on something like this (note that many of these exist in some form in the current tests but will need refactoring):

Deterministic tests

  1. Poutines: each individual poutine and each poutine composition appearing in other Pyro code should be tested directly for expected behavior.
  2. Parameters and parameter store: we should check that parameters are registered, retrieved, and flushed correctly.
  3. Serialization: we should check that the parameter store can be serialized and deserialized correctly and without loss of information.
  4. Other utilities: Pyro has many other small deterministic utilities that play critical roles (e.g. the histogram builders in #61) and should be tested individually.
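
As a rough illustration of items 2 and 3, here is a minimal sketch of what deterministic register/retrieve/flush and serialization round-trip tests could look like. `ToyParamStore` and all of its methods are hypothetical stand-ins written for this example; they are not Pyro's parameter store API, and plain Python values stand in for tensors.

```python
import io
import pickle


class ToyParamStore:
    """Hypothetical stand-in for a parameter store; NOT Pyro's actual class."""

    def __init__(self):
        self._params = {}

    def register(self, name, value):
        # Return the existing value if the name is already registered.
        return self._params.setdefault(name, value)

    def get(self, name):
        return self._params[name]

    def clear(self):
        # "Flush" the store by dropping every registered parameter.
        self._params.clear()

    def names(self):
        return sorted(self._params)

    def save(self, fileobj):
        pickle.dump(self._params, fileobj)

    def load(self, fileobj):
        self._params = pickle.load(fileobj)


def test_register_retrieve_clear():
    store = ToyParamStore()
    store.register("mu", [0.0, 1.0])
    assert store.get("mu") == [0.0, 1.0]
    # Re-registering must not overwrite the stored value.
    assert store.register("mu", [9.0]) == [0.0, 1.0]
    store.clear()
    assert store.names() == []


def test_serialization_round_trip():
    store = ToyParamStore()
    store.register("mu", [0.0, 1.0])
    store.register("sigma", 2.5)
    buffer = io.BytesIO()
    store.save(buffer)
    buffer.seek(0)
    restored = ToyParamStore()
    restored.load(buffer)
    # Nothing may be lost or altered by the round trip.
    assert restored.names() == store.names()
    assert all(restored.get(n) == store.get(n) for n in store.names())
```

Because nothing here is random, these checks can use exact equality and should never be flaky, which is the main reason to keep them separate from the stochastic tests.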

Pyro instrumentation

The Pyro-specific tests should sit on top of and reuse tools for monitoring, evaluating, criticizing, and visualizing Pyro models. We can discuss those tools and their interfaces in separate issues:

  1. Visualization: #20
  2. Evaluating and criticizing:
  3. Logging/storage:
  4. Profiling:

Stochastic test platform/library

See #101 and also this blog post. Basically, although we're particularly interested in testing our inference algorithms, down the line there will be many other stochastic programs whose behavior we want to test hypotheses about. It seems like we should be able to build a lightweight platform that handles this more general use case.

This platform should be able to:

Stochastic unit tests

Pyro has many small stochastic components that should be tested individually:

  1. Distributions: #16
  2. Gradient estimators: #84 and others
  3. map_data: #93
  4. Marginal likelihood estimators and other (partially) stochastic functions, e.g. analytic KL divergences, CUBO: #91 #41 and others

Each of these requires a different set of bespoke tests implemented on top of the test platform.
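
To make the idea concrete, here is a minimal sketch of one such bespoke test: checking a sampler's empirical mean against its analytic mean with a z-test. The helper `check_sample_mean`, its parameters, and the 4-standard-error threshold are assumptions made for this example, and the standard library's Gaussian sampler stands in for a Pyro distribution.

```python
import math
import random
import statistics


def check_sample_mean(sampler, expected_mean, expected_std, n=20000, z_max=4.0):
    """Return (passed, z) for a z-test of the sample mean against expected_mean.

    Hypothetical helper: `sampler` is any zero-argument callable returning a float.
    """
    samples = [sampler() for _ in range(n)]
    # Standard error of the mean under the claimed distribution.
    standard_error = expected_std / math.sqrt(n)
    z = (statistics.fmean(samples) - expected_mean) / standard_error
    return abs(z) <= z_max, z


def test_gaussian_sampler_mean():
    rng = random.Random(0)  # fixed seed so any failure is reproducible
    passed, z = check_sample_mean(lambda: rng.gauss(1.5, 2.0), 1.5, 2.0)
    assert passed, f"sample mean is {z:.2f} standard errors from 1.5"
```

Stating the tolerance in standard errors rather than as a fixed absolute difference makes the false-failure rate explicit and independent of the sample size, which is one way to make stochastic tests less noisy.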

Pyro stochastic integration tests

Most of the current Pyro tests are actually integration tests: that is, they run inference algorithms with a particular model and guide and compare empirical and ground-truth posterior statistics to decide whether the test passes or fails. We want to make this more systematic and make the results less noisy and more useful. We also want to compare the runtime of some examples against previous versions to catch performance regressions.

To that end, integration tests should be generated automatically from the following components:

  1. Model
  2. Guide
  3. Data
  4. Algorithm
  5. Test/hypothesis/experiment
  6. Setup/configuration
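
One way the automatic generation could work is to take the Cartesian product of the six components and filter out incompatible combinations. The sketch below assumes this approach; `IntegrationCase`, `generate_cases`, and every component name in it are hypothetical placeholders, and strings stand in for what would really be model functions, guides, datasets, and so on.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationCase:
    """One generated integration test; all fields are placeholder names."""
    model: str
    guide: str
    data: str
    algorithm: str
    hypothesis: str
    config: str

    @property
    def test_id(self):
        # Human-readable identifier for reporting and for tracking results over time.
        return "-".join([self.model, self.guide, self.data,
                         self.algorithm, self.hypothesis, self.config])


def generate_cases(models, guides, datasets, algorithms, hypotheses, configs,
                   compatible=lambda case: True):
    """Yield every compatible combination of the six components."""
    for combo in itertools.product(models, guides, datasets,
                                   algorithms, hypotheses, configs):
        case = IntegrationCase(*combo)
        if compatible(case):
            yield case


cases = list(generate_cases(
    models=["normal_normal", "beta_bernoulli"],
    guides=["mean_field"],
    datasets=["small"],
    algorithms=["svi", "importance"],
    hypotheses=["posterior_mean_close"],
    configs=["cpu"],
))
```

The `compatible` predicate is where pairings that make no sense (for example, a guide written for a different model) would be excluded, so adding a new algorithm or model extends the test matrix without writing new test functions by hand.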

V0

There's a ton of work to do here, but fortunately we're not that far away from a minimal working prototype of the whole thing that should solve a lot of our immediate problems.

Basically, I think we need to:

eb8680 commented 6 years ago

@neerajprad we should probably revisit this now that we're adding more algorithms. Should we keep this issue open as a tracker or close it and open more specific ones? E.g. in #622 @ngoodman requested that we port some MCMC tests from webPPL.

neerajprad commented 6 years ago

> Should we keep this issue open as a tracker or close it and open more specific ones?

Most of the action items from V0 are done. We can close this and track the remaining task of refactoring integration tests / putting in new tests for MCMC algorithms in #634.