zalando / expan

Open-source Python library for statistical analysis of randomised control trials (A/B tests)
MIT License
334 stars 50 forks source link

If the number of *finite* samples is too small, then the data isn't valid for analysis #236

Closed aaron-mcdaid-zalando closed 6 years ago

aaron-mcdaid-zalando commented 6 years ago

This is to fix a bug which the number of finite samples (non-NaN, non-inf) is too small.

(we should add units tests for this too)

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.2%) to 92.283% when pulling ba2b168d83ce297c01afd6b3f9e9f5946bf159d1 on fix-for-when-zero-nonfinite-samples into 20dc15ff1891fb22ee2142817ba5c6a415595469 on master.

aaron-mcdaid-zalando commented 6 years ago

Fixing one bug somehow exposed other bugs :-)

In order to be able to count the number of isfinite entries, I had to coerce the data to np.arrays of float64. On live, I think they are already like that, but the unit tests have some np.arrays of object; so I have to change them.

aaron-mcdaid-zalando commented 6 years ago

... on second thoughts, seems like I can just adjust the test data slightly

aaron-mcdaid-zalando commented 6 years ago

I guess we should add a test for this, and also perhaps move this logic into _is_valid_for_analysis?

aaron-mcdaid-zalando commented 6 years ago

@gbordyugov , this is related to the ZeroDivisionError ticket

aaron-mcdaid-zalando commented 6 years ago

... to clarify my last comment, I should clarify that the TypeError happens in a different context and is not resolved by this PR. But that error did make me think that we need to be clearer about our types in our softwar

There are other approaches to the type system, I don't like this solution exactly, but this is one possible solution

aaron-mcdaid-zalando commented 6 years ago

We should add more unit tests for this issue. But we'll release now