Closed buzden closed 11 months ago
I feel a bit intrusive, but (as in some of your other projects) I'd like to suggest that I be added as a collaborator on this project, so that I can merge fixes like this without interrupting you, @stefan-hoeck.
I've added another small fix around coverage checking to this PR in a separate commit, since the two are somewhat related.
The last two bits are not as trivial as the first two.
Currently, coverage checks with early termination are run using the default settings, with the minimal test count being `defaultMinTests`, i.e. 100, and with the usual size settings, i.e. size is set to 0 at the beginning and increases up to 100 with each test. This leads to a strange result: for half of the check, confidence checks are performed at minimal test sizes, so the wrong statistics are collected. Most of the testing needed to achieve confidence in coverage is performed at maximal test sizes, but the very first decision is taken based on values generated using small sizes. Moreover, this is not tunable at all, which makes it impossible to check distribution correctness, because it effectively requires a good distribution at every given size parameter, which is not always possible.
That's why I propose to start with the normal `maxSize` when we run in confidence check mode (i.e. when it is set with `verifiedTermination`). This is the third commit in this PR.
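To illustrate the size problem described above, here is a minimal Python sketch (not the library's code). The function `size_schedule` and its parameters are hypothetical, and it assumes the size grows by exactly one per test, which is one reading of "increasing till 100 on each test".

```python
def size_schedule(num_tests, max_size=100, start_at_max=False):
    """Hypothetical size schedule for a run of `num_tests` tests.

    Assumption: by default the size starts at 0 and grows by one per test,
    capped at `max_size`. With `start_at_max=True` every test uses `max_size`,
    which models the proposal for confidence check mode.
    """
    if start_at_max:
        return [max_size] * num_tests
    return [min(i, max_size) for i in range(num_tests)]


# Sizes seen by the first 100 tests, i.e. before the first confidence decision.
default_sizes = size_schedule(100)
proposed_sizes = size_schedule(100, start_at_max=True)

mean_default = sum(default_sizes) / len(default_sizes)
mean_proposed = sum(proposed_sizes) / len(proposed_sizes)
```

Under these assumptions, the first decision in the default schedule is based on values whose average size is about half of the maximum, while all later tests run at the maximum size. The proposed schedule removes that mismatch.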
Also, when the generated space is really huge (as I have in one of my examples), the default setting of starting the confidence interval check at the 100th test is too limiting; sometimes we just need more. The fourth commit adds the ability to tune this setting (while still preserving the old default).
> while still preserving the old default
To be honest, all my experience with the Wilson bounds (mostly with checking the distribution of derived generators in DepTyCheck, but now also with generating data for testing the normalised compression distance) tells me that they work nicely only after ~300 samples, not 100, if the actual distribution is rather close to the expected bounds (even when the bounds are eventually satisfied). So maybe it would be good to set the default to 300 instead of the current 100.
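For reference, the effect of sample count on the Wilson bounds can be sketched in Python. This is the textbook Wilson score interval, not the library's implementation; the function name `wilson_bounds` and the value `z = 1.96` (roughly 95% confidence) are assumptions, and the confidence level actually used by the library may differ.

```python
import math


def wilson_bounds(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    `z` is the normal quantile for the desired confidence level;
    1.96 is an assumed value used only for illustration.
    """
    if trials == 0:
        # No data yet: the proportion could be anything.
        return (0.0, 1.0)
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p_hat + z2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    ) / denom
    return (centre - half_width, centre + half_width)


# Same observed share (33%), different sample counts.
lo100, hi100 = wilson_bounds(33, 100)
lo300, hi300 = wilson_bounds(99, 300)
width100 = hi100 - lo100
width300 = hi300 - lo300
```

With the same observed share, the interval at 300 samples is noticeably narrower than at 100, so a check against a required coverage close to the true share is less likely to be decided on noise.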
Sorry for the late response. This looks very good to me. I'll merge it as it is and we can tune the default number of tests in another PR. For me, it would be fine to use a default of 300 or even 500.
Export a function that turns on coverage check mode and was obviously forgotten to be exported