kassambara / rstatix

Pipe-friendly Framework for Basic Statistical Tests in R
https://rpkgs.datanovia.com/rstatix/

Filtering stats results breaks add_xy_position #197

Open s-andrews opened 8 months ago

s-andrews commented 8 months ago

If I want to plot only the significant results from a stats test, I can filter the output tibble before running add_xy_position. However, this leaves the y positions spaced inconsistently instead of separating all the brackets evenly.

For example from the following data:

library(tidyverse)
library(rstatix)
library(ggpubr)

set.seed(1)

tibble(
  A = rnorm(10,mean=1),
  B = rnorm(10,mean=1),
  c = rnorm(10,mean=3),
  D = rnorm(10,mean=5),
  E = rnorm(10,mean=1)
) %>%
  pivot_longer(
    cols=everything(),
    names_to="group",
    values_to="value"
  ) %>%
  filter(!is.na(group)) -> data

If I run a test on all groups and calculate the y positions, all values are equally spaced:

data %>%
  tukey_hsd(value~group) %>%
  add_xy_position() -> stats_all

# All the differences are the same
diff(stats_all$y.position, lag=1)

# Gives [1] 0.4788 0.4788 0.4788 0.4788 0.4788 0.4788 0.4788 0.4788 0.4788

...and the plot shows equal spacing:

data %>%
  ggplot(aes(x=group, y=value)) +
  geom_boxplot() +
  stat_pvalue_manual(stats_all)

However, if I filter for only the significant results, I get unequal spacing:

data %>%
  tukey_hsd(value~group) %>%
  filter(p.adj<0.05) %>%
  add_xy_position() -> stats_significant

# Spacings are now different between different comparisons
# Space is left for other comparisons even though they're
# not there
diff(stats_significant$y.position, lag=1)

# Gives [1] 0.4788 0.9576 0.4788 0.9576 0.4788 0.4788

...and the brackets in the plot are therefore also unevenly spaced:

data %>%
  ggplot(aes(x=group, y=value)) +
  geom_boxplot() +
  stat_pvalue_manual(stats_significant)
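
As a possible stopgap, the bracket heights could be re-spaced by hand after filtering. The sketch below is base R only and does not call rstatix; `respace_y()` is a hypothetical helper (not part of any package), and the baseline of 8 and the surviving row indices are made up to mimic the uneven differences shown above.

```r
# Hypothetical helper (not part of rstatix): re-space bracket heights evenly.
# The lowest bracket stays where it is and the original ordering is preserved.
respace_y <- function(y, step = NULL) {
  if (length(y) < 2) return(y)              # nothing to re-space
  if (is.null(step)) step <- min(diff(sort(y)))
  min(y) + (rank(y, ties.method = "first") - 1) * step
}

step <- 0.4788                        # spacing reported in the output above
kept <- c(2, 3, 5, 6, 8, 9, 10)       # made-up indices of rows surviving a filter
y_filtered <- 8 + (kept - 1) * step   # made-up baseline height of 8

diff(y_filtered)                      # uneven: some gaps are twice the step

y_even <- respace_y(y_filtered)
diff(y_even)                          # every gap is now a single step
```

With a real result this would presumably be applied as `stats_significant$y.position <- respace_y(stats_significant$y.position)` before calling stat_pvalue_manual. It assumes a single panel; faceted or grouped plots would likely need the re-spacing done per panel.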

I should also note that the same effect occurs if you use the hide.ns option of stat_pvalue_manual, but this is likely a different issue with a different fix.

SamGG commented 6 months ago

Hi, I ran into the same issue, which is due to the fact that filtering occurs after the positions have been computed. My current workaround is to set the comparisons explicitly:

data %>%
  tukey_hsd(value~group) %>%
  filter(p.adj<0.05) %>%
  add_xy_position(comparisons = with(., Map(c, group1, group2))) -> stats_significant
diff(stats_significant$y.position)

There is probably a neater way to write this; my knowledge of tidyr is limited.
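
To make the `with(., Map(c, group1, group2))` expression easier to follow, here is a small base R sketch of what it builds. The data frame is typed in by hand as a stand-in for the filtered test result (an assumption for illustration, not actual output), so it runs without rstatix.

```r
# Hand-typed stand-in for the filtered test result (illustrative only).
stats_significant <- data.frame(
  group1 = c("A", "A", "B", "B", "c", "c", "D"),
  group2 = c("c", "D", "c", "D", "D", "E", "E"),
  stringsAsFactors = FALSE
)

# Map() calls c() on the i-th elements of group1 and group2 in turn,
# giving one length-2 character vector per remaining comparison.
comparisons <- with(stats_significant, Map(c, group1, group2))

# Map() names the list after group1; dropping the names is optional
# and only done here to leave a plain list of pairs.
comparisons <- unname(comparisons)

length(comparisons)   # one entry per row of the filtered result
comparisons[[1]]      # the first pair of groups
```

Inside the pipe, `.` stands for the filtered tibble, so the list contains only the comparisons that survived the filter.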

@kassambara In the add_y_position function, I think the comparisons should be set before https://github.com/kassambara/rstatix/blob/360cda40bd22e80bce19ed63fbadfc4a9e52ce23/R/get_pvalue_position.R#L190, which should avoid the join/merge call.