maccman / abba

A/B testing framework
MIT License

reported confidence can be incorrect/misleading #3

Open showard opened 11 years ago

showard commented 11 years ago

Full disclosure: I authored the original ABBA library.

This looks like a great library for people who want to run A/B tests themselves; I haven't seen any other package that takes care of the whole stack and makes it this easy.

However, I'm wary of the statistics here, for a few reasons:

As an example of the potential for numerical accuracy issues, consider an experiment with 6/500 conversions in baseline and 20/500 conversions in one variation. The package reports 99.9% confidence, or a one-tailed p-value <= 0.001. Fisher's Exact Test gives a one-tailed p-value of 0.0042. The original ABBA gives a two-tailed p-value of 0.0063, corresponding to a one-tailed p-value of roughly 0.0033. So we're underestimating the one-tailed p-value by a factor of 3-4 and the two-tailed p-value (which is probably more appropriate) by a factor of 6-8. This is pretty substantial -- our long-run false-positive rate will be 6-8x higher than we expect (ignoring multiple testing issues).
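For anyone who wants to check the Fisher's Exact Test figure independently, here is a minimal sketch in plain Python. It is not code from this package or from the original ABBA; the function name `fisher_one_tailed` and its parameters are made up for illustration. It sums the hypergeometric tail directly, asking how likely it is that the baseline would get 6 or fewer of the 26 pooled conversions if both groups shared the same true rate.

```python
from math import comb


def fisher_one_tailed(succ_a, n_a, succ_b, n_b):
    """One-tailed Fisher's exact p-value for group A doing no better than observed.

    Conditions on the pooled number of successes and sums the hypergeometric
    probabilities of group A receiving succ_a or fewer of them.
    (Hypothetical helper for illustration only.)
    """
    total_succ = succ_a + succ_b
    total_n = n_a + n_b
    # Number of ways to place the pooled successes among all trials.
    denom = comb(total_n, total_succ)
    # Group A cannot have fewer successes than this, since group B
    # can hold at most n_b of them.
    lowest = max(0, total_succ - n_b)
    tail = sum(
        comb(n_a, k) * comb(n_b, total_succ - k)
        for k in range(lowest, succ_a + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Baseline: 6/500 conversions; variation: 20/500 conversions.
    p = fisher_one_tailed(6, 500, 20, 500)
    print(f"one-tailed p = {p:.4f}")
    # Ratio against the 0.001 implied by a reported 99.9% confidence.
    print(f"ratio vs 0.001 = {p / 0.001:.1f}x")
```

The result should land close to the 0.0042 quoted above. Because the two groups are the same size, the conditional distribution is symmetric, so doubling this value gives the two-tailed Fisher p-value for this particular example.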

With some more robust statistics, I think this would be a really awesome contribution to the world of A/B testing. I'd suggest using the (original) ABBA JS library (or perhaps a port of the Python version to Ruby), which also gets you some nice confidence intervals on proportions and improvements. Together the two would make a pretty sweet solution to do-it-yourself A/B testing.