trevismd / statannotations

add statistical significance annotations on seaborn plots. Further development of statannot, with bugfixes, new features, and a different API.

How to plot facet grid with hue argument? #135

Open JohannesWiesner opened 9 months ago

JohannesWiesner commented 9 months ago

Hi! I am not sure whether I am just doing something wrong, which is why I am opening another issue here. However, this is related to #120.

I would like to use a FacetGrid plot to give my figure more structure. Specifically, I have a data frame where I would like to test for receptor expression differences between brain regions of interest and non-regions of interest (using the hue argument). I would like to do this for n different receptors (r1, r2, ... rn). On top of that, receptors can be assigned to broader receptor groups, which should be visualized as different subplots within my facet grid (one column for each receptor group, but a maximum of 2 columns). See this beautiful hand-made preview:

[Image: hand-made preview of the desired figure]

And here's my dataframe:

expression.csv

Is it possible to achieve this with statannotations? I am not sure, because the example does not include the hue argument, and I do not know whether this creates problems. I tried:

# pairs must exist before the Annotator is created
pairs = [((receptor,'roi'),(receptor,'non-roi')) for receptor in expression_long['receptor'].unique()]

annot = Annotator(None, pairs)

g = sns.FacetGrid(expression_long, col='receptor_group', height=12, sharey=False)

plot_params = {'x':'expression',
               'y':'receptor',
               'hue':'roi',
               'hue_order':['roi','non-roi'],
               'orient':'h'}

g.map_dataframe(annot.plot_and_annotate_facets,
                plot='boxplot',
                plot_params=plot_params,
                configuration={"test": "Mann-Whitney"},
                annotation_func="apply_test")
plt.show()

but this gives me:

ValueError: Missing group value CHRM1 in receptor (specified in pairs)

trevismd commented 9 months ago

Yes, statannotations works well with the hue argument in FacetGrid too, but pairs are defined at plot level, so the x and hue values should be the same across plots, which is not the case for you here. The "today" solution for you would perhaps be to define subplots to create your desired layout and then use the "regular" plot + statannotations on each subplot, as you'll have different pairs to compare in each one (see this post: https://www.statology.org/seaborn-subplots/). Something like this:

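import matplotlib.pyplot as plt
import seaborn as sns
from statannotations.Annotator import Annotator

# expression_long is the long-format DataFrame from the question (already loaded)
annot = Annotator.get_empty_annotator()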
plot_params = {
    'x':'receptor',
    'y':'expression',
    'hue':'roi',
    'hue_order':['roi','non-roi'],
}
receptor_groups = expression_long['receptor_group'].unique()
# set_palette applies the palette to later plots; color_palette alone only returns it
sns.set_palette("Paired")
with sns.plotting_context("paper"):
    fig, axes = plt.subplots(4, 2, figsize=(20,  30))
    for ax_row_idx, ax_row in enumerate(axes):
        for ax_col_idx, ax in enumerate(ax_row):
            ax_idx = ax_row_idx * 2 + ax_col_idx
            if ax_idx >= len(receptor_groups):
                ax.set_axis_off()
                continue
            ax_group = receptor_groups[ax_idx]
            expression_long_group = expression_long.loc[expression_long.receptor_group==ax_group, :]
            group_receptors = expression_long_group['receptor'].unique()

            sns.boxplot(ax=ax, data=expression_long_group, **plot_params)
            annot.new_plot(
                ax,
                data=expression_long_group,
                pairs=[((receptor,'roi'),(receptor,'non-roi')) for receptor in group_receptors],
                plot='boxplot',
                **plot_params
            ).configure(test="Mann-Whitney").apply_and_annotate()

            ax.set_title(ax_group)
            if len(group_receptors) > 10:
                ax.set_xticklabels(labels=ax.get_xticklabels(), rotation=45)
plt.show()

Which results in this approximation of your diagram :) Tweaking the spacing, legends, and group ordering, and maybe using the last row for your larger group (look for add_subplot), should enable you to get there though.

[Image: resulting figure, one annotated boxplot per receptor group]

JohannesWiesner commented 9 months ago

Perfect, thanks so much for the code! The only issue that I see right now is that the multiple comparisons correction is now done within each group and not over all receptors, right?

trevismd commented 9 months ago

Of course!

This is correct, but it is also the case with plot_and_annotate_facets (I should make that clearer).

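Depending on the correction method, you can fix this by either

  • passing a num_comparisons option (like for Bonferroni) or
  • running the stats beforehand and then using set_pvalues instead on each subplot. In that case, you'll have to

    1. Compute all the pairs you use in the plots
    2. Plot a chart with receptors of all groups, but using the pairs described above
    3. Collect the pvalues for each pair
    4. Use these when you're making the "real" plot as drafted above.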

JohannesWiesner commented 9 months ago

> This is correct, but it is also the case with plot_and_annotate_facets (I should make that clearer).

Ah, interesting! Yes, I think making that clearer would help a lot :)

> passing a num_comparisons option (like for Bonferroni)

That sounds like a good idea, but it would only work for methods that do not need to know all the p-values beforehand, right?
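To illustrate the distinction raised here, the following is a minimal, library-independent sketch (not statannotations code). The p-values are made up, and `bonferroni` and `holm` are hypothetical helper names written for this example. Bonferroni only multiplies each p-value by the total number of comparisons, so a subplot can be corrected on its own once that total is known. A step-down method such as Holm ranks all p-values first, so correcting within one subplot generally gives a different result than correcting over all of them.

```python
def bonferroni(p_value, num_comparisons):
    """Bonferroni needs only the total number of comparisons."""
    return min(1.0, p_value * num_comparisons)


def holm(p_values):
    """Holm step-down: each adjustment depends on the rank among ALL p-values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (n - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Made-up raw p-values from two subplots (illustration only)
subplot_a = [0.01, 0.04]
subplot_b = [0.03]
total = len(subplot_a) + len(subplot_b)

# Bonferroni: subplot A can be corrected alone, given the overall count
bonf_a = [bonferroni(p, total) for p in subplot_a]

# Holm: correcting within subplot A vs. over all subplots
holm_within = holm(subplot_a)
holm_global = holm(subplot_a + subplot_b)[:len(subplot_a)]

print(bonf_a, holm_within, holm_global)
```

So a per-subplot count override is enough for Bonferroni-style corrections, while rank-based methods need the full set of p-values before any subplot is annotated.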

JohannesWiesner commented 8 months ago

> Of course!
>
> This is correct, but it is also the case with plot_and_annotate_facets (I should make that clearer).
>
> Depending on the correction method, you can fix this by either
>
>   • passing a num_comparisons option (like for Bonferroni) or
>   • running the stats beforehand and then using set_pvalues instead on each subplot. In that case, you'll have to
>
>     1. Compute all the pairs you use in the plots
>     2. Plot a chart with receptors of all groups, but using the pairs described above
>     3. Collect the pvalues for each pair
>     4. Use these when you're making the "real" plot as drafted above.

I would love it if this worked out of the box! The general idea here is that you often want to plot things using FacetGrid for better readability, but you don't want the multiple comparisons correction to be done within each subplot.
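Until then, the "run the stats beforehand" route quoted above could be sketched roughly as follows. This is a hedged outline, not the maintainer's implementation. It swaps in scipy's `mannwhitneyu` and statsmodels' `multipletests` directly instead of collecting p-values from a draft plot, so results may differ slightly from what statannotations computes. `corrected_pvalues` and `annotate_group` are hypothetical helper names, and the data frame is a synthetic stand-in for `expression_long`. The sketch also assumes that `set_pvalues` takes one p-value per pair in the same order as `pairs`, and that no test or correction should be configured on the annotator when p-values are supplied this way.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests


def corrected_pvalues(df, method="fdr_bh"):
    """Mann-Whitney U per receptor (roi vs. non-roi), corrected over ALL receptors."""
    receptors = list(df["receptor"].unique())
    raw = []
    for receptor in receptors:
        sub = df[df["receptor"] == receptor]
        roi = sub.loc[sub["roi"] == "roi", "expression"]
        non_roi = sub.loc[sub["roi"] == "non-roi", "expression"]
        raw.append(mannwhitneyu(roi, non_roi, alternative="two-sided").pvalue)
    corrected = multipletests(raw, method=method)[1]
    return dict(zip(receptors, corrected))


def annotate_group(annot, ax, group_df, pvalue_by_receptor, **plot_params):
    """Annotate one subplot with precomputed p-values.

    `annot` is expected to be a statannotations Annotator.
    """
    receptors = list(group_df["receptor"].unique())
    pairs = [((r, "roi"), (r, "non-roi")) for r in receptors]
    annot.new_plot(ax, data=group_df, pairs=pairs, plot="boxplot", **plot_params)
    # Assumption: no test may be configured when p-values are passed in directly
    annot.configure(test=None)
    # Assumption: p-values are matched to `pairs` by position
    annot.set_pvalues([pvalue_by_receptor[r] for r in receptors])
    annot.annotate()


# Synthetic stand-in for expression_long (made-up values, illustration only)
rng = np.random.default_rng(0)
rows = []
for group, receptor, shift in [("g1", "r1", 3.0), ("g1", "r2", 0.0), ("g2", "r3", 0.0)]:
    for label, offset in [("roi", shift), ("non-roi", 0.0)]:
        for value in rng.normal(offset, 1.0, size=30):
            rows.append({"receptor_group": group, "receptor": receptor,
                         "roi": label, "expression": value})
expression_long = pd.DataFrame(rows)

# One correction over every receptor, independent of the subplot layout
pvalue_by_receptor = corrected_pvalues(expression_long)
print(pvalue_by_receptor)
```

In the subplot loop from the earlier answer, `annotate_group(annot, ax, expression_long_group, pvalue_by_receptor, **plot_params)` would then take the place of the `new_plot(...).configure(...).apply_and_annotate()` chain.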