kassambara / ggpubr

'ggplot2' Based Publication Ready Plots
https://rpkgs.datanovia.com/ggpubr/

Filtering facet.by plots by significant pvalues #122

Closed: tiagochst closed this issue 6 years ago.

tiagochst commented 6 years ago

Is it possible to have a function that keeps only the ggboxplot panels with a significant comparison when using facet.by?

For example, say I have a list of genes from a given pathway and want to compare the expression of all those genes between some groups. Since the number of genes might be too large, is it possible to add an option to keep only the genes for which a comparison gives a significant p-value?

For example, I wanted to show only CACNA1E in the plot below:

[Screenshot from 2018-10-19 16-42-54]

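  library(ggpubr)
  # exp: the user's long-format expression data (columns oncogene, value,
  # external_gene_name); my_comparisons: list of group pairs to test
  p <- ggboxplot(as.data.frame(exp),
                 x = "oncogene",
                 y = "value",
                 facet.by = "external_gene_name",  # one panel per gene
                 color = "oncogene",
                 add = "jitter") +
    # one-sided test for each pair in my_comparisons
    stat_compare_means(comparisons = my_comparisons,
                       method.args = list(alternative = "greater"))
  p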
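A minimal sketch of the requested filtering, done before plotting rather than inside ggboxplot(). It runs the pairwise tests once per facet with compare_means(..., group.by = ...), then keeps only the facets with at least one adjusted p-value below a cutoff. The helper name keep_significant_facets(), the toy data, and the 0.05 cutoff are assumptions for illustration, not part of ggpubr.

```r
library(ggpubr)

# Hypothetical helper (not part of ggpubr): keep only the rows of `data`
# whose facet has at least one pairwise comparison with p.adj < alpha.
# Extra arguments in ... are forwarded to compare_means().
keep_significant_facets <- function(data, x, y, facet, alpha = 0.05, ...) {
  # Pairwise tests (compare_means() defaults to wilcox.test), one set per facet
  res <- compare_means(as.formula(paste(y, "~", x)),
                       data = data, group.by = facet, ...)
  keep <- unique(res[[facet]][res$p.adj < alpha])
  data[data[[facet]] %in% keep, , drop = FALSE]
}

# Toy data: gene "diff" separates the conditions, gene "flat" does not
toy <- data.frame(
  gene      = rep(c("diff", "flat"), each = 20),
  condition = rep(rep(c("A", "B"), each = 10), times = 2),
  value     = c(1:10, 101:110, 1:10, 1:10)
)
sig <- keep_significant_facets(toy, x = "condition", y = "value", facet = "gene")

# Plot only the facets that passed the filter
p_sig <- ggboxplot(sig, x = "condition", y = "value", facet.by = "gene",
                   color = "condition", add = "jitter")
```

With the question's objects, the call would look something like keep_significant_facets(as.data.frame(exp), "oncogene", "value", "external_gene_name"), followed by the original ggboxplot() call on the filtered data.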
kassambara commented 6 years ago

Hi,

I would suggest the following procedure:

  1. Perform a differential expression analysis between groups to keep only the significant genes (using limma).
  2. Visualize some of the key differentially expressed genes (using ggpubr).
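Step 1 names limma, whose usual workflow is lmFit(), eBayes() and topTable(). To keep the example dependency-free, the sketch below swaps in a plain per-gene Welch t-test (t.test()) with Benjamini-Hochberg adjustment. It does not reproduce limma's moderated statistics. The function name significant_genes() and the two-group design are assumptions.

```r
# Stand-in for step 1 (NOT limma): per-gene Welch t-test + BH adjustment.
# expr: numeric matrix, genes in rows (rownames = gene IDs), samples in columns
# group: two-level grouping, one entry per column of expr
significant_genes <- function(expr, group, alpha = 0.05) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2, ncol(expr) == length(group))
  # t.test() defaults to Welch's unequal-variance test
  p <- apply(expr, 1, function(v) t.test(v ~ group)$p.value)
  p.adj <- p.adjust(p, method = "BH")
  names(p.adj)[p.adj < alpha]
}

# Toy matrix: g1 differs between the groups, g2 does not
expr <- rbind(g1 = c(1:5, 101:105),
              g2 = c(1:5, 1:5))
grp <- rep(c("ctrl", "case"), each = 5)
significant_genes(expr, grp)
```

The returned gene IDs can then be used to subset the long-format data before calling ggboxplot() in step 2.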

If you want to do the filtering in ggpubr, you can proceed as follows.

  1. Load packages:
library(tidyverse)
library(ggpubr)
  2. Prepare some data:
# Prepare some data
df <- iris %>%
  as_tibble() %>%
  gather(key = "gene", value = "expression", -Species) %>%
  rename(group = Species)
df
# A tibble: 600 x 3
   group  gene         expression
   <fct>  <chr>             <dbl>
 1 setosa Sepal.Length        5.1
 2 setosa Sepal.Length        4.9
 3 setosa Sepal.Length        4.7
 4 setosa Sepal.Length        4.6
 5 setosa Sepal.Length        5  
 6 setosa Sepal.Length        5.4
 7 setosa Sepal.Length        4.6
 8 setosa Sepal.Length        5  
 9 setosa Sepal.Length        4.4
10 setosa Sepal.Length        4.9
# ... with 590 more rows
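gather() still works, but tidyr has superseded it with pivot_longer() (tidyr 1.0.0 and later). A sketch of the same reshaping with the newer function:

```r
library(dplyr)
library(tidyr)

# Same long format as above, built with pivot_longer() instead of gather()
df_long <- iris %>%
  as_tibble() %>%
  pivot_longer(-Species, names_to = "gene", values_to = "expression") %>%
  rename(group = Species)
df_long
```

The content matches the gather() version, but rows come out grouped by original flower (four consecutive rows per iris row) rather than column by column, so the first printed rows differ.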
  3. Perform ANOVA to filter out non-significant genes (ANOVA adjusted p-value > 0.05):
res.stats <- compare_means(expression ~ group, group.by = "gene", data = df, method = "anova") %>%
  filter(p.adj <= 0.05)  # keep only genes with adjusted p-value <= 0.05
res.stats
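As a base-R cross-check of this step, the per-gene p-values can also be computed with a classic one-way ANOVA (aov()) and Holm adjustment, which is the default p.adjust.method of compare_means(). This is a sketch and is not necessarily identical to what compare_means() does internally. For iris, all four measurements differ strongly between species, so every "gene" passes a 0.05 cutoff.

```r
# Base-R cross-check (a sketch, not ggpubr's internals):
# one-way ANOVA per gene, Holm-adjusted, keep genes with p.adj <= 0.05
long <- data.frame(
  group      = rep(iris$Species, times = 4),
  gene       = rep(names(iris)[1:4], each = nrow(iris)),
  expression = unlist(iris[1:4], use.names = FALSE)
)

# p-value of the group term for each gene
p <- sapply(split(long, long$gene), function(d) {
  summary(aov(expression ~ group, data = d))[[1]][["Pr(>F)"]][1]
})
p.adj <- p.adjust(p, method = "holm")
sig_genes <- names(p.adj)[p.adj <= 0.05]
sig_genes
```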
  4. Visualize some of the significant genes: