rethinkpriorities / squigglepy

Squiggle programming language for intuitive probabilistic estimation features in Python
MIT License

Numeric methods #61

Open michaeldickens opened 10 months ago

michaeldickens commented 10 months ago

Changes

This PR adds support for representing distributions numerically. The documentation in doc/source/numeric_distributions.rst explains exactly how it works, so I will just summarize the changes here.

When people do cost-effectiveness analyses or other sorts of Fermi estimates, if they incorporate uncertainty at all, they almost always use Monte Carlo sampling. Numerically representing probability distributions as histograms is usually much more accurate at a given level of speed (or, equivalently, much faster at a given level of accuracy), so it kind of bugs me that people hardly ever use numeric methods. But as far as I know, there aren't any good tools for numeric methods, and it's a bit tricky to implement from scratch: not difficult in the grand scheme of things, but more involved than writing a Monte Carlo simulation, which you can do in only a few lines of code. So I decided to extend Squigglepy to support numeric methods.
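To make the contrast concrete, here is a toy sketch of the histogram idea using only the Python standard library. It is not the implementation in this PR (the helper names `discretize`, `add`, `mean`, and `sd` are made up for illustration, and the equal-width binning is an assumption); it just approximates the sum of two independent normal distributions by combining every pair of bins, then compares the result with a Monte Carlo estimate that uses the same number of samples.

```python
import math
import random
from statistics import NormalDist


def discretize(dist, num_bins=200, width_sds=6.0):
    """Approximate a normal distribution by equal-width bins.

    Returns a list of (bin center, probability mass) pairs.
    """
    lo = dist.mean - width_sds * dist.stdev
    step = 2 * width_sds * dist.stdev / num_bins
    bins = []
    for i in range(num_bins):
        left = lo + i * step
        mass = dist.cdf(left + step) - dist.cdf(left)
        bins.append((left + step / 2, mass))
    total = sum(m for _, m in bins)  # renormalize for the truncated tails
    return [(v, m / total) for v, m in bins]


def add(xs, ys):
    """Histogram of X + Y for independent X and Y: combine every pair of bins."""
    return [(vx + vy, mx * my) for vx, mx in xs for vy, my in ys]


def mean(bins):
    return sum(v * m for v, m in bins)


def sd(bins):
    mu = mean(bins)
    return math.sqrt(sum(m * (v - mu) ** 2 for v, m in bins))


x, y = NormalDist(1, 3), NormalDist(2, 4)  # the exact sum is Normal(3, 5)
numeric_sum = add(discretize(x), discretize(y))
numeric_mean, numeric_sd = mean(numeric_sum), sd(numeric_sum)

# Monte Carlo estimate using as many samples as the histogram has bin pairs
rng = random.Random(0)
samples = [rng.gauss(1, 3) + rng.gauss(2, 4) for _ in range(len(numeric_sum))]
mc_mean = sum(samples) / len(samples)

print(f"numeric mean error:     {abs(numeric_mean - 3):.2e}")
print(f"Monte Carlo mean error: {abs(mc_mean - 3):.2e}")
```

Two caveats on this sketch: the normal distribution is an easy case (its symmetry helps the binned mean), and a practical implementation would re-bin the result after each operation rather than keep every bin pair, since otherwise the histogram grows quadratically.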

I hope to reproduce this PR in Squiggle since Squiggle is actively supported and more widely used, but I did it in Squigglepy first because I thought it would be simpler (especially because Python has good numeric libraries).

An overview of supported features:

  1. Numeric representation of any of these distribution types: normal, log-normal, uniform, constant, chi-square, exponential, PERT, beta, gamma, Pareto
  2. Support for the special types MixtureDistribution and ComplexDistribution
  3. Mathematical operations over numeric distributions: addition, subtraction, multiplication, division, negation, reciprocal, exp, log, power
  4. Statistical operations over numeric distributions: mean, standard deviation, cdf, ppf, drawing a random sample, clip, conditioning on the random variable satisfying a condition, and the probability that the random variable satisfies a condition
  5. Distributions with non-infinitesimal probability mass at zero (e.g., for representing interventions that have some chance of no effect)
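
As a rough illustration of items 4 and 5, the following standard-library sketch stores a distribution as (value, mass) pairs, mixes in a point mass at zero, and then computes a mean, a probability, and a conditional distribution. The function names (`with_zero_mass`, `prob`, `condition_on`) and the data layout are hypothetical and chosen for this example only; they are not the API added by this PR.

```python
def with_zero_mass(bins, p_zero):
    """Mix a point mass at zero into a histogram of (value, mass) pairs."""
    return [(0.0, p_zero)] + [(v, m * (1 - p_zero)) for v, m in bins]


def mean(bins):
    return sum(v * m for v, m in bins)


def prob(bins, condition):
    """Probability that the random variable satisfies `condition`."""
    return sum(m for v, m in bins if condition(v))


def condition_on(bins, condition):
    """Keep only the bins satisfying `condition` and renormalize their mass."""
    kept = [(v, m) for v, m in bins if condition(v)]
    total = sum(m for _, m in kept)
    return [(v, m / total) for v, m in kept]


# Effect size if the intervention works: uniform on [0, 10], 100 equal bins.
num_bins = 100
effect = [((i + 0.5) * 10 / num_bins, 1 / num_bins) for i in range(num_bins)]

# Assume a 30% chance that the intervention has no effect at all.
outcome = with_zero_mass(effect, p_zero=0.3)

print(mean(outcome))                                 # overall expected effect
print(prob(outcome, lambda v: v > 0))                # chance of any effect
print(mean(condition_on(outcome, lambda v: v > 0)))  # expected effect if it works
```

Keeping the zero as its own entry, instead of spreading it into a neighbouring bin, means the "no effect" probability survives later operations exactly.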

QA

  1. There are a lot of possible edge cases. I wrote a lot of tests, which catch many edge cases that will most likely never happen in practice (e.g., what if you construct a two-sided distribution where only 0.00000000001% of the probability mass is below 0? what if a distribution has a standard deviation of 1 quintillion?), but there could well be broken edge cases that I didn't find.
  2. mypy is raising a bunch of errors because the code sometimes dynamically dispatches based on a variable's type, and mypy doesn't like that. It looks like mypy also has failures in other files, so I don't know if valid static typing is a requirement for Squigglepy. If it is, I can go back and try to fix the type errors.

michaeldickens commented 10 months ago

Oh, I should also mention: the function sq.numeric works basically as a drop-in replacement for sq.sample. I tested this PR on two of Laura Duffy's cost-effectiveness models (https://github.com/rethinkpriorities/risk_model) and they were pretty easy to port over. I just had to find-and-replace sq.sample -> sq.numeric and make a couple of other changes.