Add bootstrap statistical tests

JosephLalli commented 3 years ago

Hi Florian,

I hope I'm doing this right! This first pull request adds bootstrapping statistical tests. The user passes 'n_bootstraps' to the add_stat_annotation function and can specify either 'bootstrap' or 'paired_bootstrap' as the test.

In tests.py, I have added these two tests to the if/elif switch and written custom functions that implement them. If you know of a third-party package that implements them better than I have, I'm happy to use that package instead.

'bootstrap' draws n_bootstraps samples with replacement from box_data_a and from box_data_b, each the same size as the originating dataset, and calculates the mean of each sample. The comparison samples_a > samples_b returns 1 where true and 0 where false, so its average is the fraction of bootstrap samples for which the mean of a exceeds the mean of b. I do the same for a < b, take the smaller of the two fractions, and double it, because this is a two-sided test.

'paired_bootstrap' does the same thing, except that it calculates sample_a - sample_b and counts how often that difference is greater than or less than 0.
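A minimal sketch of the two tests as described above, written with the Python standard library rather than NumPy. The function names bootstrap_pvalue and paired_bootstrap_pvalue, the seed parameter, and the choice to resample the paired differences (so each pair stays together) are assumptions for illustration, not the code in this PR.

```python
import random
import statistics


def bootstrap_pvalue(data_a, data_b, n_bootstraps=1000, seed=None):
    """Two-sided bootstrap comparison of the means of two independent samples.

    Hypothetical name; follows the procedure described for 'bootstrap'.
    """
    rng = random.Random(seed)
    greater = less = 0
    for _ in range(n_bootstraps):
        # Resample each group with replacement, keeping its original size.
        mean_a = statistics.fmean(rng.choices(data_a, k=len(data_a)))
        mean_b = statistics.fmean(rng.choices(data_b, k=len(data_b)))
        greater += mean_a > mean_b  # a bool adds as 1 or 0
        less += mean_a < mean_b
    # Fraction of resamples in each direction; double the smaller (two-sided).
    return 2 * min(greater, less) / n_bootstraps


def paired_bootstrap_pvalue(data_a, data_b, n_bootstraps=1000, seed=None):
    """Two-sided bootstrap test on the mean of paired differences.

    Hypothetical name; assumes pairs are resampled jointly via their differences.
    """
    if len(data_a) != len(data_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(data_a, data_b)]
    rng = random.Random(seed)
    greater = less = 0
    for _ in range(n_bootstraps):
        # Resampling the differences keeps each a/b pair together.
        mean_diff = statistics.fmean(rng.choices(diffs, k=len(diffs)))
        greater += mean_diff > 0
        less += mean_diff < 0
    return 2 * min(greater, less) / n_bootstraps
```

One consequence of the counting rule as described: resamples in which the two means tie count toward neither direction, so with heavily tied data the p-value can be underestimated.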

I have tried to add appropriate documentation. I have also supplied a dataset from a project I am about to publish, which uses statannot throughout, and added examples to the example.ipynb file showing that these functions work and are more powerful than the Mann-Whitney nonparametric test. In doing so I regenerated your example PNGs, so those changes are only regenerated images, not real changes.

You can expect several more pull requests like this in the upcoming days.

Thanks, Joe Lalli

JosephLalli commented 3 years ago

Also, I nearly forgot: one of my examples uses a hue value and dodges the hues. This is because I like to create a single kwargs variable, 'fig_args', and pass it to both seaborn and statannotations. This works well most of the time, but I was getting an error because 'dodge' is not a kwarg of add_stat_annotation.

Statannotations assumes 'dodge=True' for a figure with a hue, so I have simply added dodge as an optional kwarg with a default of True.
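The error described above is ordinary Python keyword-argument checking: a function with no 'dodge' parameter (and no **kwargs) raises TypeError when a shared dict containing 'dodge' is unpacked into it. A minimal sketch using hypothetical stand-in functions, not the real seaborn or statannotations signatures:

```python
def draw_boxplot(data=None, x=None, y=None, hue=None, dodge=True):
    """Hypothetical stand-in for a plotting call that accepts 'dodge'."""
    return {"x": x, "y": y, "hue": hue, "dodge": dodge}


def annotate_before(data=None, x=None, y=None, hue=None):
    """Hypothetical annotation function with no 'dodge' parameter."""
    return {"x": x, "y": y, "hue": hue}


def annotate_after(data=None, x=None, y=None, hue=None, dodge=True):
    """Hypothetical annotation function that accepts 'dodge' (default True)."""
    return {"x": x, "y": y, "hue": hue, "dodge": dodge}


# One shared kwargs dict for both calls, as in the 'fig_args' pattern.
fig_args = {"x": "group", "y": "value", "hue": "condition", "dodge": True}

draw_boxplot(**fig_args)    # accepted: the plotting call knows 'dodge'
annotate_after(**fig_args)  # accepted once 'dodge' is in the signature

try:
    annotate_before(**fig_args)
except TypeError as err:
    print(err)  # message reports the unexpected keyword argument 'dodge'
```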

trevismd commented 3 years ago

Hi Joe, as you can see, I had some time available and integrated your contribution on axes coordinates.

I see the value of the PR, of course, but also two problems:

1. It adds complexity to add_stat_annotation and its interface, which are already overloaded, and I am working on simplifying them.
2. Implementing statistical tests requires specific validation, testing, and maintenance that go beyond the current scope of this package and how it is used. Users and contributors of statannot shared this opinion too.
(Also note that this only covers bootstrapping of the mean; it could be interesting to prepare support for bootstrapping other functions as well.)

It could, however, be a good fit for the option of using test functions other than those already supported.
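Following the note about supporting functions other than the mean, one way to generalize the bootstrap comparison is to pass the statistic in as a parameter. This is a hypothetical sketch, not code from the PR or from statannotations: the name bootstrap_compare is made up, and the (statistic, p-value) return pair mirrors what scipy.stats tests return, which is assumed to be the shape a custom test function needs.

```python
import random
import statistics


def bootstrap_compare(data_a, data_b, statistic=statistics.fmean,
                      n_bootstraps=1000, seed=None):
    """Two-sided bootstrap comparison of any statistic between two groups.

    Hypothetical helper. Returns (observed difference, p-value).
    """
    n_bootstraps = int(n_bootstraps)  # accept values written as 1e3
    rng = random.Random(seed)
    observed = statistic(data_a) - statistic(data_b)
    greater = less = 0
    for _ in range(n_bootstraps):
        # Resample each group with replacement at its original size.
        stat_a = statistic(rng.choices(data_a, k=len(data_a)))
        stat_b = statistic(rng.choices(data_b, k=len(data_b)))
        greater += stat_a > stat_b
        less += stat_a < stat_b
    # Double the smaller one-sided fraction for a two-sided p-value.
    return observed, 2 * min(greater, less) / n_bootstraps
```

For example, bootstrap_compare(a, b, statistic=statistics.median) compares medians with the same resampling scheme used for the mean.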

Your functions could be made available in several ways:

1. We could include them in statannotations as examples in the documentation notebook;
2. You could make gists; or, more conveniently,
3. You could make a new repository with the functions, say joe-stats, including a function (similar to StatTest.from_library) like:

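from statannotations.stats.StatTest import StatTest


def for_statannotations(func_name, **func_kwargs):
    """Wrap one of this package's functions in a statannotations StatTest."""
    if func_name == "bootstrap_mean":
        n_bootstraps = func_kwargs.pop("n_bootstraps", None)
        if n_bootstraps is None:
            raise ValueError("Missing `n_bootstraps` parameter")

        # bootstrap_mean is defined elsewhere in joe-stats
        return StatTest(bootstrap_mean,
                        test_long_name="Non-parametric bootstrapped two-sided comparison",
                        test_short_name="bootstrap",
                        n_bootstraps=n_bootstraps)

    raise ValueError(f"Unknown function: {func_name}")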

Users could then do:

import seaborn as sns
from joestats import joestats

bootstrap_mean = joestats.for_statannotations("bootstrap_mean", n_bootstraps=1000)
ax = sns.boxplot(...)
add_stat_annotation(..., test=bootstrap_mean, ...)

What do you think?

Also, so that your contribution is tracked, you are welcome to submit a separate PR for the dodge parameter and get the credit for it. Otherwise, I'll do it myself and refer to this PR.

edit: a typo

trevismd commented 3 years ago

Hey Joe, I hope you understand why I'm closing this PR.
Please do offer this material in another form, as I suggested above.
As an example of one of those ideas, I also made a package for permutation-based statistics (permutations-stats, https://github.com/trevismd/permutations-stats) and a gist showing how to use it with statannotations (https://gist.github.com/trevismd/f556d83f6efdad249f995eb65daeb1d9).
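For comparison with the bootstrap approach, a generic two-sided Monte Carlo permutation test on the difference in means can be sketched with the standard library. This only illustrates the technique; it is not the API of permutations-stats, and the name permutation_pvalue is hypothetical.

```python
import random
import statistics


def permutation_pvalue(data_a, data_b, n_permutations=1000, seed=None):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Hypothetical helper, not part of permutations-stats.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(data_a) - statistics.fmean(data_b))
    pooled = list(data_a) + list(data_b)
    n_a = len(data_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        extreme += diff >= observed
    # The +1 terms count the observed labelling itself, so p is never 0.
    return (extreme + 1) / (n_permutations + 1)
```

Unlike the bootstrap sketch, which resamples each group separately, this shuffles labels across the pooled data, so it tests the null hypothesis that group membership does not matter.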

trevismd / statannotations

Add bootstrap statistical tests #4