Closed daryadedik closed 6 years ago
Note: I am still a bit concerned about the correction we need to perform for multiple testing again (since chi-square is also a statistical test). If we log and peek into the results again, we need to correct alpha again. Maybe we can avoid this by not conducting the statistical test, but rather by introducing the threshold. But I will investigate which threshold will be meaningful.
We could consider an alternative approach entirely. We could simply arrange to store the raw data (the number of assignments) and then do the statistical analysis in some external pipeline connected to our monitoring tool. We might want to do a lot of debugging and experimenting with different analysis methods for this data.
Then, in the longer term, when we understand this issue better and we understand the multiple testing issue, we can consider how much of the analysis can be added into expan
.
In general, I think we should "gather good data first, ask questions later"
In decided to stick with the chi-square test goodness of fit test, because it's very straightforward (you just need a variant column and expected variant weights) and the other: if you test multiple times it's easy to correct with our correction methods: just collect p values and call method from out correction module and get the new alpha to make a decision.
@shansfolder @gbordyugov @aaron-mcdaid-zalando please review or approve, I need it for further development.
@daryadedik it looks good from my side, but I would suggest to wait for a second reviewer for such new functionality. ;)
@shansfolder sure.
Is this true?:
This adds two new function, chi_square_test_result_and_statistics
and chi_square
, but those tests are not called anywhere (except in the tests).
If that is true, then this can be safely merged. I'm not in a position to fully review the maths, but I'll give a thumbs up if the above is true
@aaron-mcdaid-zalando chi_square is called inside of chi_square_test_result_and_statistics. The letter is not called anywhere in expan, except tests
:+1:
(feel free to make more changes, and ping me again for a thumbs up)
@aaron-mcdaid-zalando let me know if you are ok with the changes.
:+1:
I have removed the previous chi-square test of homogeneity, since this is not exactly what we need.
A test of homogeneity tests whether two (or more) sets of multinomial counts come from different sets of population proportions. Or it tries to find out whether two different unknown populations have the same distribution as each other. A goodness of fit test is for testing whether a set of multinomial counts is distributed according to a pre-specified set of population proportions. The second is what we need: we try to answer the question whether our data or, more specifically, our variants splits consistent with the expected proportions we claim.
If you have comments or doubts feel free to approach and discuss.