marcoct opened 6 years ago
The label Research is indeed an appropriate one, and very interesting! The statistical problem this ticket touches on is "goodness-of-fit" (GOF) testing: are the samples X drawn from the distribution F? The majority of frequentist GOF tests in the continuous setting (Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling) are (i) univariate, and (ii) based on the CDF representation of the distribution. While the literature on non-parametric techniques for GOF testing is vast (I can provide references if interested, including recent nonparametric Bayesian techniques), I do not know of any other methods that are routinely used as statistical tests in related fields.
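For concreteness, here is what one of those classical univariate CDF-based tests looks like in practice (this example is my own, using `scipy.stats.kstest`; the ticket does not prescribe a library):

```python
import numpy as np
from scipy import stats

# Draw samples from a known distribution...
rng = np.random.default_rng(0)
samples = rng.normal(size=10_000)

# ...and run a one-sample Kolmogorov-Smirnov test against the
# standard normal CDF. The statistic is the sup-distance between
# the empirical CDF and the hypothesized CDF.
result = stats.kstest(samples, "norm")
```

A small p-value would lead us to reject the hypothesis that the samples came from N(0, 1); here, with a correct sampler, the statistic should be small and the p-value unremarkable.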
Here is one idea for the univariate setting, using the differential entropy as a test statistic:
It should be possible to compute large-sample error bounds for the MC estimator, and obtain confidence intervals.
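To illustrate the entropy idea (this sketch is mine, not from the ticket): the MC estimate of -E[log f(X)] computed from the sampler's output should agree with the analytic differential entropy of F, up to CLT-based error bounds. For N(0, 1) the entropy has the closed form ½ log(2πe):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)  # output of the sampler under test

# Negative log-density of the target N(0, 1) at each sample.
neg_logpdf = 0.5 * np.log(2.0 * np.pi) + 0.5 * x**2

# Monte Carlo estimate of the differential entropy -E[log f(X)],
# with a CLT-based standard error.
entropy_est = neg_logpdf.mean()
std_err = neg_logpdf.std(ddof=1) / np.sqrt(n)

# Closed-form entropy of N(0, 1).
entropy_true = 0.5 * np.log(2.0 * np.pi * np.e)
```

For a correct sampler, `entropy_est` should fall within a few standard errors of `entropy_true`; a systematic gap indicates a mismatch between the sampler and the density it is supposed to implement.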
Currently, most of the built-in probabilistic primitives only have their density functions tested (at a test point). The samplers are untested. It should be possible to develop a standard testing scheme that gives confidence that each sampler conforms to its corresponding density function, possibly by estimating KL divergences. At a high level, the design could continue sampling (gaining more confidence in the fidelity of the sampler) until either a sufficient level of confidence is reached, or a timeout is reached (where exceeding the timeout causes a test failure).
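A minimal sketch of such a sample-until-confident-or-timeout loop, combining it with the entropy statistic above (the function name, thresholds, and interfaces here are all hypothetical, chosen only for illustration):

```python
import time
import numpy as np

def check_sampler(sampler, logpdf, analytic_entropy,
                  z=4.0, tol=0.02, batch=10_000, timeout_s=10.0):
    """Keep drawing batches from `sampler` until the running entropy
    estimate -mean(logpdf(X)) confidently agrees (or disagrees) with
    `analytic_entropy`, or until the timeout, which counts as failure."""
    vals = np.empty(0)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        vals = np.concatenate([vals, -logpdf(sampler(batch))])
        se = vals.std(ddof=1) / np.sqrt(len(vals))
        gap = abs(vals.mean() - analytic_entropy)
        if gap > z * se:
            return False  # estimate confidently excludes the target
        if z * se < tol and gap < tol:
            return True   # estimate confidently brackets the target
    return False          # timed out without reaching confidence

# Example: test numpy's normal sampler against the N(0, 1) density.
rng = np.random.default_rng(2)
ok = check_sampler(
    sampler=lambda n: rng.normal(size=n),
    logpdf=lambda x: -0.5 * np.log(2.0 * np.pi) - 0.5 * x**2,
    analytic_entropy=0.5 * np.log(2.0 * np.pi * np.e),
)
```

The sequential design means a well-behaved sampler terminates quickly, while a subtly wrong one either trips the confident-mismatch branch or burns through the timeout and fails, which matches the failure semantics proposed above.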