zhxiaokang / RASflow

RNA-Seq analysis workflow
MIT License
104 stars 57 forks source link

visualization.R EnhancedVolcano() #14

Closed rpg18 closed 4 years ago

rpg18 commented 4 years ago

Hi! I realised that the volcano plot function not always plots the most significant genes. The reasons of this limitation are the xlim and ylim parameters from EnhancedVolcano() [lines 82 and 85 visualization.R script]. I modified the xlim and ylim parameters with something like:

extract x/y limits based on the log2FC and padj values from results DESeq2 table

 log2FC_lim <- max(abs(dea.table$log2FoldChange))
  padj_lim <- -log10(min(dea.table$padj)) # NAs already removed from dea.table

redefine xlim and ylim graph coordinates EnhancedVolcano(..., xlim = c(-log2FC_lim-1, log2FC_lim+1), ylim = c(0, padj_lim), ...)

This worked for me using DESeq2. In this way, the user will always get the most significant genes annotated in the volcano plot.

Thank you for your very useful tool.

All the best, Raquel

zhxiaokang commented 4 years ago

Thank you Raquel!

I've updated it according to your suggestion.

But while testing, I found that the FDR calculated by edgeR can be 0 which leads to Inf in y-axis. To avoid that problem, I replaced the 0 FDR with a very low value (100 times smaller than the minimum non-zero FDR).

I'm not sure whether that's also a problem with DESeq2, but I did the same for it just in case there is any 0 padj.

Please refer to this commit