kassambara / ggpubr

'ggplot2' Based Publication Ready Plots
https://rpkgs.datanovia.com/ggpubr/
1.13k stars 165 forks source link

How do I 'automate' ggpubr functions and also how do I get comparisons between subgroup levels of groups on the x-axis using rstatix? #297

Open jaydoc opened 4 years ago

jaydoc commented 4 years ago

Not really an issue, but a bunch of questions/requests.

  1. Suppose I had a number of variables for which I would require the same analysis and similar looking plots, is there a pipe-friendly way using ggpubr and rstatix to do it all using one command/script rather than having to change the name of the y variable, plot title, and y-axis label every time?

  2. Using the toothgrowth dataframe,

Create a box plot bxp <- ggboxplot( df, x = "dose", y = "len", color = "supp", palette = c("#00AFBB", "#E7B800") )

Add p-values onto the box plots stat.test <- stat.test %>% add_xy_position(x = "dose", dodge = 0.8) bxp + stat_pvalue_manual( stat.test, label = "p", tip.length = 0 )

Additional statistical test stat.test2 <- df %>% t_test(len ~ dose, p.adjust.method = "bonferroni")

Add p-values of stat.test and stat.test2

  1. Add stat.test stat.test <- stat.test %>% add_xy_position(x = "dose", dodge = 0.8)

bxp.complex <- bxp + stat_pvalue_manual( stat.test, label = "p", tip.length = 0 )

  1. Add stat.test2 Add more space between brackets using step.increase stat.test2 <- stat.test2 %>% add_xy_position(x = "dose")

bxp.complex <- bxp.complex + stat_pvalue_manual( stat.test2, label = "p", tip.length = 0.02, step.increase = 0.05 ) + scale_y_continuous(expand = expansion(mult = c(0.05, 0.1)))

This creates a plot with comparisons between the supp subgroups of each dose level as well as between the entire dose groups themselves?

What would I need to do if I wanted comparisons between say, the OJ supp subgroup of the 0.5 dose level and the OJ subgroup of the 1 dose level?

kassambara commented 4 years ago

Do the following ressources help?

jaydoc commented 4 years ago

this link answers some of what I want. But not sure if it will work when there are two grouping variables, say species and another variable that is used for the fill function.

I will try it out and post a reproducible example if I can’t find a solution!

Once again, this package is a life saver for someone like me in biomedical research who knows enough R to realize hope much easier typing commands is vs. clicking menus, but not enough to program something that makes life so much easier! Thanks a lot.

jaydoc commented 4 years ago

Could you look at this example?

data

example.df <- data.frame( genotype = sample(c("A", "B","C","D"), 50, replace = TRUE), oxygen = sample(c("Yes", "No"), 50, replace = TRUE), value1 = rnorm(50, 100, 5), value2=rnorm(50, 10,5), value3 = rnorm (50, 25, 5))

example plot

ggboxplot(example.df, x = "genotype", y = "value1", color = "oxygen", bxp.errorbar = TRUE, palette = "jco")

example stat test

stat.test <- example.df %>% group_by(oxygen) %>% dunn_test(value1~genotype) %>% adjust_pvalue(method = "bonferroni") %>% add_significance("p.adj")

What I want to accomplish is to write a script that will

  1. Get separate boxplots for each numeric variable with genotype on x-axis grouped by oxygen

  2. Plot the significance values calculated using dunn test for comparisons between certain genotype/oxygen subgroups. For example,

I don't have much knowledge of R to do this using functions such as pivot_long and would greatly appreciate any help.

kassambara commented 4 years ago

Is something like below that you want to achieve for each variable?

suppressPackageStartupMessages(library(rstatix))
suppressPackageStartupMessages(library(ggpubr))

# Data preparation
set.seed(1234)
df <- data.frame(
  genotype = as.factor(sample(c("A", "B","C","D"), 50, replace = TRUE)),
  oxygen = as.factor(sample(c("Yes", "No"), 50, replace = TRUE)),
  value = rnorm(50, 100, 5), value2=rnorm(50, 10,5), value3 = rnorm (50, 25, 5))

# Data visualization
bxp <- ggboxplot(
  df, x = "genotype", y = "value", color = "oxygen", palette = "jco",
  bxp.errorbar = TRUE
  )

# Comparing groups at each x-position
# Group by genotype and then compare oxygen groups yes vs no
stat.test1 <- df %>%
  df_group_by(genotype) %>%
  t_test(value ~ oxygen) %>%
  adjust_pvalue(method = "bonferroni") %>%
  add_significance("p.adj")
stat.test1
#> # A tibble: 4 x 11
#>   genotype .y.   group1 group2    n1    n2 statistic    df     p p.adj
#>   <fct>    <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <dbl>
#> 1 A        value No     Yes        2     5    -0.105  1.09 0.932     1
#> 2 B        value No     Yes        8     7    -0.779 12.9  0.45      1
#> 3 C        value No     Yes        5     6    -0.343  8.63 0.74      1
#> 4 D        value No     Yes       11     6     0.923 13.3  0.372     1
#> # … with 1 more variable: p.adj.signif <chr>
# Add p-values to the plot
stat.test1 <- stat.test1 %>% add_xy_position(x = "genotype", dodge = 0.8)
bxp <- bxp + stat_pvalue_manual(stat.test1,  label = "p.adj", tip.length = 0)
bxp


# Parwise comparisons of groups on x-axis within
#  another grouping variables
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Group by oxygen and then compare genotypes
# Pairwise p-values are automatically adjusted by groups
stat.test2 <- df %>%
  group_by(oxygen) %>%
  t_test(value ~ genotype, p.adjust.method = "bonferroni")
stat.test2
#> # A tibble: 12 x 11
#>    oxygen .y.   group1 group2    n1    n2 statistic    df     p p.adj
#>  * <fct>  <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <dbl>
#>  1 No     value A      B          2     8   0.153    1.14 0.901     1
#>  2 No     value A      C          2     5   0.154    1.13 0.901     1
#>  3 No     value A      D          2    11  -0.209    1.12 0.866     1
#>  4 No     value B      C          8     5   0.00188 10.5  0.999     1
#>  5 No     value B      D          8    11  -1.05    15.9  0.311     1
#>  6 No     value C      D          5    11  -1.08    11.2  0.303     1
#>  7 Yes    value A      B          5     7  -0.0624  10.0  0.951     1
#>  8 Yes    value A      C          5     6   0.298    8.11 0.773     1
#>  9 Yes    value A      D          5     6   0.642    9.00 0.537     1
#> 10 Yes    value B      C          7     6   0.330    9.66 0.749     1
#> 11 Yes    value B      D          7     6   0.646   11.0  0.531     1
#> 12 Yes    value C      D          6     6   0.206    8.96 0.842     1
#> # … with 1 more variable: p.adj.signif <chr>
# Show box plots with p-values
stat.test2 <- stat.test2 %>% add_xy_position(x = "genotype")
bxp +
  stat_pvalue_manual(
    stat.test2, label = "p.adj",
    color = "oxygen", step.group.by = "oxygen",
    tip.length = 0, step.increase = 0.5, bracket.nudge.y = 10
    )


# Keep only comparisons of interest
# between (oxygen - yes) of genotype A and B,
# and between (oxygen - yes) of genotype C and D
stat.test2.filtered <- stat.test2 %>%
  add_xy_position(x = "genotype", step.increase = 0) %>%
  mutate(comparisons = paste0(group1, "-", group2)) %>%
  filter(oxygen == "Yes" & comparisons %in% c("A-B", "C-D"))
bxp +
  stat_pvalue_manual(
    stat.test2.filtered, label = "p.adj",
    color = "oxygen",
    tip.length = 0, step.increase = 0,
    bracket.nudge.y = 8
  )

Created on 2020-06-28 by the reprex package (v0.3.0.9001)