Closed joshuaehill closed 1 year ago
It seems like my collapsing of some commits got a little out of hand. In any case, this is ready for merging.
OK, I think that I fixed the unsightly Git problems with prodigious use of Git's reflog and force-pushing, so the backend Git history should at least make sense. The code itself hasn't changed in a week (I haven't made any actual changes to the code since 2023-08-10 20:22:50 UTC).
Sorry for any confusion resulting from my Git ineptitude! :-)
I was interested in being able to do larger-scale simulations for establishing the restart cutoff (`X_cutoff`), and when I implemented this functionality, it became clear that the existing codebase could be much faster. I restructured the code and simplified the logic to get faster simulations.

Here, we simulate a "worst case" for the restart sanity test. This is "worst case" in the sense that the adopted distribution results in the largest acceptable collision bound for a given assessed entropy level, so if a data sample fails this test, it is likely to indicate an underlying problem. This "worst case" uses the "inverted near-uniform" family; see Hagerty-Draper "Entropy Bounds and Statistical Tests" for a full definition of this distribution and justification for its use here. This distribution has as many maximal probability symbols as possible (each occurring with probability p), and possibly one additional symbol that contains all the residual probability.
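For readers who want to see the shape of that distribution, here is a minimal Python sketch. It is not the code in this PR: the function names, the floating-point tolerance, and the choice to draw a single run of symbols are all assumptions made for illustration, and it does not reproduce how `X_cutoff` is actually computed. It assumes the assessed min-entropy is given in bits per symbol, so that p = 2^(-h_min).

```python
import math
import random
from collections import Counter


def inverted_near_uniform(h_min):
    """Symbol probabilities for a min-entropy of h_min bits per symbol (h_min > 0).

    Hypothetical helper, not taken from the PR.
    """
    p = 2.0 ** (-h_min)              # probability of each most-likely symbol
    k = math.floor(1.0 / p + 1e-12)  # as many probability-p symbols as fit
    probs = [p] * k
    residual = 1.0 - k * p           # leftover probability mass, if any
    if residual > 1e-12:             # tolerance is an arbitrary assumption
        probs.append(residual)       # one extra symbol takes the remainder
    return probs


def max_symbol_count(probs, n, rng):
    """Draw n symbols from probs and return the count of the most frequent one."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    return max(Counter(draws).values())


# Example: 0.5 bits of min-entropy per symbol gives one symbol with
# p = 2**-0.5 and one residual symbol.
probs = inverted_near_uniform(0.5)
rng = random.Random(12345)           # fixed seed so the run is repeatable
count = max_symbol_count(probs, 1000, rng)
```

Repeating a draw like this many times and looking at the upper tail of the most-frequent-symbol counts is the general idea behind a simulated cutoff; the details in the PR may differ.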
The results should be broadly equivalent to those of the prior code, but the new code runs much faster. (Note that this is a stochastic process, so very small variations in the cutoff value are expected.)