Open jaydoc opened 4 years ago
Do the following ressources help?
this link answers some of what I want. But not sure if it will work when there are two grouping variables, say species and another variable that is used for the fill function.
I will try it out and post a reproducible example if I can’t find a solution!
Once again, this package is a life saver for someone like me in biomedical research who knows enough R to realize hope much easier typing commands is vs. clicking menus, but not enough to program something that makes life so much easier! Thanks a lot.
Could you look at this example?
example.df <- data.frame( genotype = sample(c("A", "B","C","D"), 50, replace = TRUE), oxygen = sample(c("Yes", "No"), 50, replace = TRUE), value1 = rnorm(50, 100, 5), value2=rnorm(50, 10,5), value3 = rnorm (50, 25, 5))
ggboxplot(example.df, x = "genotype", y = "value1", color = "oxygen", bxp.errorbar = TRUE, palette = "jco")
stat.test <- example.df %>% group_by(oxygen) %>% dunn_test(value1~genotype) %>% adjust_pvalue(method = "bonferroni") %>% add_significance("p.adj")
Get separate boxplots for each numeric variable with genotype on x-axis grouped by oxygen
Plot the significance values calculated using dunn test for comparisons between certain genotype/oxygen subgroups. For example,
I don't have much knowledge of R to do this using functions such as pivot_long and would greatly appreciate any help.
Is something like below that you want to achieve for each variable?
suppressPackageStartupMessages(library(rstatix))
suppressPackageStartupMessages(library(ggpubr))
# Data preparation
set.seed(1234)
df <- data.frame(
genotype = as.factor(sample(c("A", "B","C","D"), 50, replace = TRUE)),
oxygen = as.factor(sample(c("Yes", "No"), 50, replace = TRUE)),
value = rnorm(50, 100, 5), value2=rnorm(50, 10,5), value3 = rnorm (50, 25, 5))
# Data visualization
bxp <- ggboxplot(
df, x = "genotype", y = "value", color = "oxygen", palette = "jco",
bxp.errorbar = TRUE
)
# Comparing groups at each x-position
# Group by genotype and then compare oxygen groups yes vs no
stat.test1 <- df %>%
df_group_by(genotype) %>%
t_test(value ~ oxygen) %>%
adjust_pvalue(method = "bonferroni") %>%
add_significance("p.adj")
stat.test1
#> # A tibble: 4 x 11
#> genotype .y. group1 group2 n1 n2 statistic df p p.adj
#> <fct> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 A value No Yes 2 5 -0.105 1.09 0.932 1
#> 2 B value No Yes 8 7 -0.779 12.9 0.45 1
#> 3 C value No Yes 5 6 -0.343 8.63 0.74 1
#> 4 D value No Yes 11 6 0.923 13.3 0.372 1
#> # … with 1 more variable: p.adj.signif <chr>
# Add p-values to the plot
stat.test1 <- stat.test1 %>% add_xy_position(x = "genotype", dodge = 0.8)
bxp <- bxp + stat_pvalue_manual(stat.test1, label = "p.adj", tip.length = 0)
bxp
# Parwise comparisons of groups on x-axis within
# another grouping variables
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Group by oxygen and then compare genotypes
# Pairwise p-values are automatically adjusted by groups
stat.test2 <- df %>%
group_by(oxygen) %>%
t_test(value ~ genotype, p.adjust.method = "bonferroni")
stat.test2
#> # A tibble: 12 x 11
#> oxygen .y. group1 group2 n1 n2 statistic df p p.adj
#> * <fct> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 No value A B 2 8 0.153 1.14 0.901 1
#> 2 No value A C 2 5 0.154 1.13 0.901 1
#> 3 No value A D 2 11 -0.209 1.12 0.866 1
#> 4 No value B C 8 5 0.00188 10.5 0.999 1
#> 5 No value B D 8 11 -1.05 15.9 0.311 1
#> 6 No value C D 5 11 -1.08 11.2 0.303 1
#> 7 Yes value A B 5 7 -0.0624 10.0 0.951 1
#> 8 Yes value A C 5 6 0.298 8.11 0.773 1
#> 9 Yes value A D 5 6 0.642 9.00 0.537 1
#> 10 Yes value B C 7 6 0.330 9.66 0.749 1
#> 11 Yes value B D 7 6 0.646 11.0 0.531 1
#> 12 Yes value C D 6 6 0.206 8.96 0.842 1
#> # … with 1 more variable: p.adj.signif <chr>
# Show box plots with p-values
stat.test2 <- stat.test2 %>% add_xy_position(x = "genotype")
bxp +
stat_pvalue_manual(
stat.test2, label = "p.adj",
color = "oxygen", step.group.by = "oxygen",
tip.length = 0, step.increase = 0.5, bracket.nudge.y = 10
)
# Keep only comparisons of interest
# between (oxygen - yes) of genotype A and B,
# and between (oxygen - yes) of genotype C and D
stat.test2.filtered <- stat.test2 %>%
add_xy_position(x = "genotype", step.increase = 0) %>%
mutate(comparisons = paste0(group1, "-", group2)) %>%
filter(oxygen == "Yes" & comparisons %in% c("A-B", "C-D"))
bxp +
stat_pvalue_manual(
stat.test2.filtered, label = "p.adj",
color = "oxygen",
tip.length = 0, step.increase = 0,
bracket.nudge.y = 8
)
Created on 2020-06-28 by the reprex package (v0.3.0.9001)
Not really an issue, but a bunch of questions/requests.
Suppose I had a number of variables for which I would require the same analysis and similar looking plots, is there a pipe-friendly way using ggpubr and rstatix to do it all using one command/script rather than having to change the name of the
y
variable, plot title, andy
-axis label every time?Using the toothgrowth dataframe,
Create a box plot
bxp <- ggboxplot( df, x = "dose", y = "len", color = "supp", palette = c("#00AFBB", "#E7B800") )
Add p-values onto the box plots
stat.test <- stat.test %>% add_xy_position(x = "dose", dodge = 0.8) bxp + stat_pvalue_manual( stat.test, label = "p", tip.length = 0 )
Additional statistical test
stat.test2 <- df %>% t_test(len ~ dose, p.adjust.method = "bonferroni")
Add p-values of
stat.test
andstat.test2
stat.test <- stat.test %>% add_xy_position(x = "dose", dodge = 0.8)
bxp.complex <- bxp + stat_pvalue_manual( stat.test, label = "p", tip.length = 0 )
step.increase
stat.test2 <- stat.test2 %>% add_xy_position(x = "dose")
bxp.complex <- bxp.complex + stat_pvalue_manual( stat.test2, label = "p", tip.length = 0.02, step.increase = 0.05 ) + scale_y_continuous(expand = expansion(mult = c(0.05, 0.1)))
This creates a plot with comparisons between the
supp
subgroups of eachdose
level as well as between the entiredose
groups themselves?What would I need to do if I wanted comparisons between say, the OJ
supp
subgroup of the 0.5dose
level and the OJ subgroup of the 1dose
level?