Structure for the tests

maximtrp / scikit-posthocs

Multiple Pairwise Comparisons (Post Hoc) Tests in Python

https://scikit-posthocs.rtfd.io

MIT License

351 stars 40 forks source link

Structure for the tests #8

Open raamana opened 6 years ago

raamana commented 6 years ago

Hi Maksim!

Nice package, thanks for sharing! I am trying to use it and hoping to contribute.

Before I use it, I'd like to understand how it is tested. For example, if you look at https://github.com/maximtrp/scikit-posthocs/blob/4709cf2821ce98dffef542b9b916f7c4a5f00ff4/tests/test_posthocs.py#L27

or any other tests, you seem have to pre-selected the outputs/results to compare.. Where do they come from? From the PMCMR R-package? I tried to look into the PMCMR package to look at their tests - I can't see them in the package distributed, do you know if it is tested?

What are the other principles behind the tests you wrote?

maximtrp commented 6 years ago

Hey! I am using PMCMRplus as a reference package and two datasets (seaborn's exercise, a custom one for block design data, and one more for Tukey HSD) that are initialized in lines #11, #12, and #139. As can be seen in the code, pulse and kind columns of exercise dataset are used as a value and group column, respectively. Block dataset is converted to a DataFrame (rows are blocks, and cols are treatments or groups). The results, you have noticed to be pre-defined, were obtained with PMCMRplus.

You are welcome to make pull requests!

raamana commented 6 years ago

Thanks - I tried to check into the source of PMCMRplus and PMCMR, and they do not seem to contain any tests at all. Testing against non-tested package doesn't give me comfort.

I am trying to think of developing some ground truth tests.. Let me know if you have thought about it already, and have some ideas.

maximtrp commented 6 years ago

PMCMRplus contains references to scientific articles as well as sample datasets from literature (such as Sachs, 1997). It would be nice to move scikit-posthocs tests to these data. I do not have much time to do it now, though. If you can help, it would be great.

raamana commented 6 years ago

I see.. Happy to help, but its not clear if the R package is tested thoroughly. will keep you posted.

maximtrp commented 6 years ago

We can look through the articles and take the results provided by researchers as references for tests. It is a huge work, though

raamana commented 6 years ago

I know.. I guess we can start with some sanity checks, and work our way towards more complicated tests. I will open another PR soon.

maximtrp commented 6 years ago

Thank you for your contribution

raamana commented 6 years ago

Take a look at some super-basic likely-wrong initial attempt: https://github.com/raamana/scikit-posthocs/blob/cb06d223b88a8d4d8fec949498a89f9141131009/tests/test_properties_posthocs.py#L50

maximtrp commented 6 years ago

Thank you. Actually, the problem we should deal with first is correctness of implementation. Your code tries to check a test by playing with type I errors (false positives), which are related to statistical value of a test, but not its implementation quality. Though, this is not absolutely wrong, and it may be kept as a fallback unit test if no valid examples of a post-hoc test usage are found. Also, we should use an acceptable and established way to create random samples.

Here are some steps I have summarized:

1) Check if a test is implemented correctly (we have to find a validated example, probably provided by authors, or verify code by comparing it to the mathematics given in a scientific source). 2) Check code performance and optimize it.

raamana commented 6 years ago

Hi! I agree with you on the motivation. That’s definitely the long term plan - what I provided is a first pass of a rough draft attempting to get there.. at the end, we must have multiple of types of tests, both unit and integration tests (against various expected behaviours), as well as for specific properties. what you propose takes lot of effort, and I’d be happy to contribute when I can.

Speed isn’t a concern for me typically, unless it’s too slow to be a dealbreaker. I try to get the implementation correctly first, and then if the speed is not satisfactory, we can certainly try to profile the code and speed it up. I wouldn't worry about it all unless we are forced to.