Closed: zmbc closed this 10 months ago
Note: I'm currently hoping to get review from RT on this math, both here and in https://github.com/ihmeuw/vivarium_research/pull/1382. Only after that do I think it makes sense for engineering folks to review this for code quality.
Fuzzy checking: Bayesian version
Description
Changes and notes
Terminology:
Statistically, this uses Bayesian hypothesis testing. Each check compares two models/hypotheses of the underlying rate: one representing "there is a bug" and one representing "there is not a bug." For "there is a bug" we use a simple Jeffreys beta prior. For "there is not a bug" we fit a beta distribution to a 95% interval provided by the user. In both cases, the beta prior is then combined with a binomial likelihood (yielding a beta-binomial marginal) for the actual count data. We fail the tests if any check is "decisive" (Bayes factor > 100) in favor of "there is a bug."
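To make the math above concrete, here is a minimal sketch of the Bayes factor calculation. All names (`fit_beta_to_ci`, `bayes_factor`, etc.) are illustrative, not the actual API in this PR, and the beta fit here is a simple moment-matching approximation rather than an exact quantile fit:

```python
# Sketch of the beta-binomial Bayes factor described above.
# Hypothetical names; not the PR's actual implementation.
import numpy as np
from scipy.special import betaln, gammaln


def fit_beta_to_ci(lower, upper):
    """Approximate a Beta(a, b) from a 95% interval by moment matching,
    treating the interval as mean +/- 1.96 standard deviations."""
    mean = (lower + upper) / 2
    sd = (upper - lower) / (2 * 1.96)
    common = mean * (1 - mean) / sd**2 - 1
    return mean * common, (1 - mean) * common


def log_beta_binomial(k, n, a, b):
    """Log marginal likelihood of k successes in n trials under a
    Beta(a, b) prior on the rate (the beta-binomial pmf)."""
    return (
        gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
        + betaln(k + a, n - k + b) - betaln(a, b)
    )


def bayes_factor(k, n, ci_lower, ci_upper):
    """Bayes factor in favor of 'there is a bug'."""
    a, b = fit_beta_to_ci(ci_lower, ci_upper)
    log_bug = log_beta_binomial(k, n, 0.5, 0.5)  # Jeffreys Beta(1/2, 1/2)
    log_no_bug = log_beta_binomial(k, n, a, b)   # fitted to the user's CI
    return np.exp(log_bug - log_no_bug)
```

For example, with a user-provided interval of (0.04, 0.06), an observed count of 500/10,000 is consistent with "no bug" (Bayes factor well below 1), while 2,000/10,000 is decisively in favor of "bug" (Bayes factor far above 100).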
Technically, this uses a pytest fixture that is kept active for the entire test session and allows testing these hypotheses while maintaining a log of information about them. This diagnostic information is output at the end of a test run for optional human inspection. I have gitignored the diagnostic output -- is this the right call?
Verification and Testing
I have verified that these tests pass. I have also verified that they fail, even with a relatively small population (20k), when I intentionally introduce bugs into the simulation. The two bugs I tried were: