kevinblighe / EnhancedVolcano

Publication-ready volcano plots with enhanced colouring and labeling

zero pvalues after lfcshrink #14

Closed MichaelPeibo closed 5 years ago

MichaelPeibo commented 5 years ago

Hi, very nice function! I used res (from DESeq2) to draw a volcano plot, but I found a difference between the results with and without lfcShrink. See the attached plots: volcano indeFilter_F_lfcshrink.pdf and volcano indeFilter_T_notlfcshrink.pdf

Any suggestion on which one I should present as a publication-ready plot? Thanks!

kevinblighe commented 5 years ago

Hey, this question may be better for the Bioconductor forum: https://support.bioconductor.org/t/Latest/

The lfc-shrunk fold changes look more realistic. However, I notice that you are also plotting the adjusted p-values on the y-axis? It may be better to plot the nominal p-values. Did you try that?
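To compare the two y-axis choices discussed above, a minimal sketch could look like the following. The `res` table here is a made-up stand-in for a DESeq2 results object (the gene names and values are hypothetical), and only the `lab`, `x`, `y`, and `ylab` arguments of EnhancedVolcano are used; the plotting step is skipped if the package is not installed.

```r
# Toy stand-in for a DESeq2 results table; all values are invented for illustration.
res <- data.frame(
  log2FoldChange = c(2.5, -3.1, 0.2, 1.8, -0.4),
  pvalue         = c(1e-8, 1e-6, 0.5, 1e-4, 0.2),
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Benjamini-Hochberg adjustment, the default method used by DESeq2 for padj.
res$padj <- p.adjust(res$pvalue, method = "BH")

if (requireNamespace("EnhancedVolcano", quietly = TRUE)) {
  library(EnhancedVolcano)

  # Nominal p-values on the y-axis.
  p_nominal <- EnhancedVolcano(res,
                               lab = rownames(res),
                               x = "log2FoldChange",
                               y = "pvalue")

  # Adjusted p-values on the y-axis, with the axis label changed to match.
  p_adjusted <- EnhancedVolcano(res,
                                lab = rownames(res),
                                x = "log2FoldChange",
                                y = "padj",
                                ylab = bquote(~-Log[10]~adjusted~italic(P)))
}
```

Because BH-adjusted values are never smaller than the nominal ones, the adjusted plot compresses the points towards the bottom of the y-axis, which is one reason the nominal version can look cleaner.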

kevinblighe commented 5 years ago

Posted on Bioconductor: https://support.bioconductor.org/p/119564/

MichaelPeibo commented 5 years ago

Hi @kevinblighe, thanks for helping! The nominal p-values look better! I am also confused about some missing labels on 'red' gene points, see https://github.com/kevinblighe/EnhancedVolcano/issues/5#issuecomment-478214805. Any suggestion? Thanks!

kevinblighe commented 5 years ago

Hey, those are most likely NA p-values. DESeq2 will set, to NA, the p-values for genes that have failed independentFiltering or cooksCutoff.

MichaelPeibo commented 5 years ago

Hi, I wonder whether those genes have biological meaning and whether I should check them. Or should I remove them by setting na.rm = TRUE somewhere?

kevinblighe commented 5 years ago

Hey, this is a question unrelated to EnhancedVolcano. EnhancedVolcano just plots the data that you provide to it. The genes that fail independentFiltering or cooksCutoff may have high variance, for whatever reason. You can remove them before using EnhancedVolcano, or you can just let the function automatically remove them (actually, it is the ggplot2 'engine' that removes them).
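If you prefer to remove those genes yourself rather than rely on ggplot2 dropping them, a sketch along these lines may help. It uses a toy data frame with hypothetical gene names and values in place of a real DESeq2 result. Note that, per the DESeq2 vignette, independent filtering sets only `padj` to NA, whereas the Cook's distance cutoff sets both `pvalue` and `padj` to NA, so which rows drop out depends on the column you plot on the y-axis.

```r
# Toy stand-in for a DESeq2 results table; values are invented for illustration.
# geneB mimics a Cook's cutoff outlier (pvalue and padj both NA);
# geneC mimics a gene removed by independent filtering (only padj NA).
res <- data.frame(
  log2FoldChange = c(2.5, -3.1, 0.2, 1.8),
  pvalue         = c(1e-8, NA, 0.5, 1e-4),
  padj           = c(1e-7, NA, NA, 1e-3),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

# Column that will be mapped to the y-axis of the volcano plot.
y_col <- "padj"

# Record which genes would be dropped, then keep only rows with a usable value.
dropped   <- rownames(res)[is.na(res[[y_col]])]
res_clean <- res[!is.na(res[[y_col]]), ]

message(length(dropped), " gene(s) removed: ", paste(dropped, collapse = ", "))
```

Keeping the `dropped` vector makes it easy to inspect those genes separately before deciding whether they matter biologically.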

radlinsky commented 5 years ago

@kevinblighe wrote: "Hey, those are most likely NA p-values. DESeq2 will set, to NA, the p-values for genes that have failed independentFiltering or cooksCutoff."

Hello @kevinblighe I am also seeing missing labels, but in this case, the genes with missing labels have a non-NA s-value. It appears to randomly not plot a few gene labels, with no rhyme or reason?

E.g. missing NRIP3 label in the attached plot, which I generated from a data.frame. I am re-creating the issue here with just the first 8 rows of my dataframe (original had 19k)

MissingLabelExample

EnhancedVolcano(data,
                lab = data$hgnc_symbol,
                x = 'log2FoldChange',
                y = 'svalue',
                pCutoff = 0.001,
                ylab = bquote(~-Log[10]~italic(S)~"-value"))

Attached is my example data; note that NRIP3 is missing from the labels.

first8.txt

I'm running EnhancedVolcano from conda: https://anaconda.org/bioconda/bioconductor-enhancedvolcano (currently says version 1.2.0)

kevinblighe commented 5 years ago

Hey radlinsky, yes, EnhancedVolcano will only fit as many labels as it possibly can into the plot space. This prevents the figure from becoming 'over-crowded' with labels. Looking at your plot, you can likely fit more by setting drawConnectors = TRUE

Version 1.2.0 is the new version.
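As a rough sketch of the `drawConnectors = TRUE` suggestion, the call from the report above could be adapted as follows. The data frame is a toy stand-in using the same column names as the report; apart from NRIP3, the gene symbols and all numeric values are hypothetical. The plotting step is skipped if EnhancedVolcano is not installed.

```r
# Toy stand-in for the reported data; only the column names come from the report.
data <- data.frame(
  hgnc_symbol    = c("NRIP3", "GENE2", "GENE3", "GENE4"),
  log2FoldChange = c(3.2, 3.1, -2.8, 0.1),
  svalue         = c(1e-6, 2e-6, 1e-5, 0.4)
)

if (requireNamespace("EnhancedVolcano", quietly = TRUE)) {
  library(EnhancedVolcano)

  # drawConnectors = TRUE lets labels be moved away from their points and
  # joined to them by a line, so more labels can fit in the plot space.
  p <- EnhancedVolcano(data,
                       lab = data$hgnc_symbol,
                       x = "log2FoldChange",
                       y = "svalue",
                       pCutoff = 0.001,
                       drawConnectors = TRUE,
                       ylab = bquote(~-Log[10]~italic(S)~"-value"))
}
```

If only a handful of genes need to be labelled, passing their names through the `selectLab` argument is another option that avoids crowding.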