kevinblighe / EnhancedVolcano

Publication-ready volcano plots with enhanced colouring and labeling

Quantities of DEGs (Red Dots) in Plot Do Not Match Known DEG Quantities in Dataset #109

Open cdixo opened 1 year ago

cdixo commented 1 year ago

Hello!

First off, thank you for such a great package for plotting our transcriptomic data. I have a dataset with a known number of DEGs (10 up and 11 down), called with DESeq2 earlier in my script and written to an output Excel file (attached). However, when I plot the same data with the EnhancedVolcano function, I get different numbers (8 up and 5 down; image attached). Running the exact same code on a different comparison within the same experiment (it is in a loop, by the way) does plot the expected number of red dots (DEGs) (image not included). Among the datasets with few enough DEGs to count the dots by hand (between 10 and ~60), half are correct and the other half are not. Many of my comparisons have hundreds or thousands of DEGs, so those cannot be counted, and I am concerned that they may be inaccurate as well. My cutoffs are padj 0.05 and fold change 2, exactly the same parameters used when creating the DESeq2 output Excel file.
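For comparisons too large to count by eye, the expected number of red dots can instead be computed from the results table and compared with the plot. A minimal base-R sketch, assuming `res` has `log2FoldChange` and `padj` columns (as in the excerpt below) and that a point is coloured red when `padj < pCutoff` and `abs(log2FoldChange) > FCcutoff`. The strict inequalities and the helper name `count_degs` are assumptions, not EnhancedVolcano's API, so check them against your installed version:

```r
# Hypothetical helper (not part of EnhancedVolcano): count genes that pass both
# cutoffs, split by direction. Assumes strict inequalities on padj and |log2FC|.
count_degs <- function(res, pCutoff = 0.05, FCcutoff = 2) {
  lfc  <- res$log2FoldChange
  padj <- res$padj
  # Genes with NA padj (e.g. removed by DESeq2's independent filtering) never count
  sig <- !is.na(padj) & !is.na(lfc) & padj < pCutoff & abs(lfc) > FCcutoff
  c(up = sum(sig & lfc > 0), down = sum(sig & lfc < 0))
}

# The five rows from the excerpt below (log2FoldChange and padj only)
excerpt <- data.frame(
  log2FoldChange = c(-23.3273462, -2.582155, -21.9192253, 8.8200584, 7.9901411),
  padj           = c(1.10e-07, 5.65e-07, 6.29e-07, 9.16e-06, 7.03e-05)
)
count_degs(excerpt)  # up = 2, down = 3
```

If these counts match the DESeq2 output but not the plot, the discrepancy lies in how the plot is drawn rather than in the thresholds.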

I have attached my code below, along with the first few lines of the data frame I am passing to the command.

Could you please help me figure out how to resolve this issue?

Thank you very much!

Cullen

First few lines of input data frame ('res'):

                                   ID                             GREM_C_4  GREM_H_4  baseMean     log2FoldChange  lfcSE      stat       pvalue    padj
    maker-11-augustus-gene-150.20  maker-11-augustus-gene-150.20  0.00E+00  1.54E+02  76.904112    -23.3273462     3.3838413  -6.893747  5.43E-12  1.10E-07
    maker-Un-snap-gene-41.29       maker-Un-snap-gene-41.29       1.85E+02  1.11E+03  646.47729    -2.582155       0.3939573  -6.554403  5.59E-11  5.65E-07
    maker-14-augustus-gene-13.33   maker-14-augustus-gene-13.33   0.00E+00  1.39E+02  69.503805    -21.9192253     3.3838916  -6.47752   9.32E-11  6.29E-07
    snap-13-processed-gene-82.13   snap-13-processed-gene-82.13   1.86E+04  4.12E+01  9335.580536  8.8200584       1.4665938  6.013975   1.81E-09  9.16E-06
    maker-16-augustus-gene-195.32  maker-16-augustus-gene-195.32  5.90E+04  2.32E+02  29615.67186  7.9901411       1.4176479  5.636196   1.74E-08  7.03E-05

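EnhancedVolcano command:

library(EnhancedVolcano)

EnhancedVolcano(res,
                lab = rownames(res),
                x = 'log2FoldChange',
                y = 'pvalue',            # y-axis shows -log10(pvalue)
                xlim = c(-8, 8),         # x-axis range, in log2 fold-change units
                title = p,               # p: comparison name from the surrounding loop (not shown)
                pCutoff = 0.05,
                pCutoffCol = 'padj',     # apply pCutoff to padj rather than pvalue
                FCcutoff = 2,            # cutoff on |log2FoldChange|
                selectLab = "")          # matches no gene names, so no labels are drawn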

Known DEGs from the DESeq2 output Excel file (10 up, 11 down):

snap-13-processed-gene-82.13      GREM_C_4._higherThan_.GREM_H_4
maker-16-augustus-gene-195.32     GREM_C_4._higherThan_.GREM_H_4
augustus-1-processed-gene-218.4   GREM_C_4._higherThan_.GREM_H_4
augustus-13-processed-gene-83.4   GREM_C_4._higherThan_.GREM_H_4
maker-2-augustus-gene-49.32       GREM_C_4._higherThan_.GREM_H_4
maker-17-augustus-gene-47.25      GREM_C_4._higherThan_.GREM_H_4
maker-16-augustus-gene-131.36     GREM_C_4._higherThan_.GREM_H_4
maker-6-augustus-gene-152.38      GREM_C_4._higherThan_.GREM_H_4
augustus-18-processed-gene-2.0    GREM_C_4._higherThan_.GREM_H_4
maker-4-augustus-gene-13.43       GREM_C_4._higherThan_.GREM_H_4
maker-11-augustus-gene-150.20     GREM_C_4._lowerThan_.GREM_H_4
maker-Un-snap-gene-41.29          GREM_C_4._lowerThan_.GREM_H_4
maker-14-augustus-gene-13.33      GREM_C_4._lowerThan_.GREM_H_4
maker-6-snap-gene-63.44           GREM_C_4._lowerThan_.GREM_H_4
maker-6-snap-gene-163.38          GREM_C_4._lowerThan_.GREM_H_4
maker-7-augustus-gene-11.43       GREM_C_4._lowerThan_.GREM_H_4
augustus-15-processed-gene-187.8  GREM_C_4._lowerThan_.GREM_H_4
maker-6-augustus-gene-165.36      GREM_C_4._lowerThan_.GREM_H_4
snap-9-processed-gene-22.10       GREM_C_4._lowerThan_.GREM_H_4
maker-2-snap-gene-48.42           GREM_C_4._lowerThan_.GREM_H_4
maker-9-augustus-gene-124.46      GREM_C_4._lowerThan_.GREM_H_4

Output graphic: [image: GREM_C_4_v_GREM_H_4 VolcanoPlot]

SusiJo commented 1 year ago

Hi, we have encountered a similar problem in some of our plots: the number of genes actually called DE does not match the number of dots shown in the static images. @kevinblighe: could it be that dots with very similar log2FC and adjusted p-values overlap each other and are therefore hidden? Would it be an option to shift the x and y coordinates slightly (jitter) to make them visible, or to change the opacity of the points?
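One way to test the overlap hypothesis before changing the plot is to count how many significant genes land on the same rounded position as another significant gene. If that count is much smaller than the number of missing dots, overplotting alone cannot explain the discrepancy. A minimal base-R sketch; the helper name `count_overlapping`, rounding to two decimals as a proxy for "visually indistinguishable", and the significance rule (`padj < pCutoff` and `abs(log2FoldChange) > FCcutoff`) are all assumptions:

```r
# Hypothetical helper (not part of EnhancedVolcano): how many significant genes
# sit at the same rounded plot position as an earlier significant gene?
# x = log2FoldChange, y = -log10(pvalue), as in the command above.
count_overlapping <- function(res, pCutoff = 0.05, FCcutoff = 2, digits = 2) {
  lfc <- res$log2FoldChange
  sig <- !is.na(res$padj) & !is.na(lfc) &
    res$padj < pCutoff & abs(lfc) > FCcutoff
  pos <- data.frame(
    x = round(lfc[sig], digits),
    y = round(-log10(res$pvalue[sig]), digits)
  )
  # duplicated() flags each repeated (x, y) pair after its first occurrence
  sum(duplicated(pos))
}

# The five rows from the first post's excerpt
excerpt <- data.frame(
  log2FoldChange = c(-23.3273462, -2.582155, -21.9192253, 8.8200584, 7.9901411),
  pvalue         = c(5.43e-12, 5.59e-11, 9.32e-11, 1.81e-09, 1.74e-08),
  padj           = c(1.10e-07, 5.65e-07, 6.29e-07, 9.16e-06, 7.03e-05)
)
count_overlapping(excerpt)                       # 0: all five positions are distinct
count_overlapping(rbind(excerpt, excerpt[1, ]))  # 1: an artificial duplicate row
```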

cdixo commented 1 year ago

Hi @SusiJo ,

Thank you for responding to my issue; I'm sorry to hear you are experiencing the same problem.

I should have followed up on my own post sooner, but my lab and I have found what was wrong in our original code that caused this issue.

The problem is that the option 'xlim = c(-8, 8)' sets the x-axis limits, not the width of the image. We had originally thought that this option controlled how wide the image should be, but it actually tells the program "only draw the x-axis from a log2 fold change of -8 to 8". As a result, DEGs with a log2 fold change greater than 8 or less than -8 were cut off, so the number of dots did not match the known number of DEGs we expected to be plotted.
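A quick way to confirm this diagnosis for any comparison is to list the significant genes whose log2 fold change falls outside the chosen `xlim`; these are the red dots that disappear from the plot. A minimal base-R sketch, assuming a gene counts as significant when `padj < 0.05` and `abs(log2FoldChange) > 2` (the cutoffs from the original post); `clipped_degs` is a hypothetical helper name, and the example uses the five excerpt rows from the original post:

```r
# Hypothetical helper (not part of EnhancedVolcano): significant genes whose
# log2 fold change lies outside xlim, i.e. red dots that will not be drawn.
clipped_degs <- function(res, xlim, pCutoff = 0.05, FCcutoff = 2) {
  lfc <- res$log2FoldChange
  sig <- !is.na(res$padj) & !is.na(lfc) &
    res$padj < pCutoff & abs(lfc) > FCcutoff
  rownames(res)[sig & (lfc < xlim[1] | lfc > xlim[2])]
}

excerpt <- data.frame(
  log2FoldChange = c(-23.3273462, -2.582155, -21.9192253, 8.8200584, 7.9901411),
  padj           = c(1.10e-07, 5.65e-07, 6.29e-07, 9.16e-06, 7.03e-05),
  row.names = c("maker-11-augustus-gene-150.20", "maker-Un-snap-gene-41.29",
                "maker-14-augustus-gene-13.33", "snap-13-processed-gene-82.13",
                "maker-16-augustus-gene-195.32")
)
# Returns the three genes with log2FC -23.33, -21.92 and 8.82
clipped_degs(excerpt, xlim = c(-8, 8))
```

Three of the five excerpt genes already fall outside c(-8, 8), which is consistent with the missing red dots.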

Hopefully this helps you resolve your issue as well.

Best,

Cullen

junli1988 commented 9 months ago

Here's an easy way to avoid this problem in the future: set "xlim" from the minimum and maximum log2 fold change in your table. For the 'res' table above: xlim = c(min(res$log2FoldChange, na.rm = TRUE) - 0.5, max(res$log2FoldChange, na.rm = TRUE) + 0.5)

This way you won't have to adjust the axis manually for each comparison in your loop.
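The same idea can be wrapped in a small helper so that every plot in a loop gets limits covering all of its genes. A minimal sketch; `data_xlim`, `results_list` and the 0.5 padding are hypothetical choices, and the `EnhancedVolcano()` call is left commented out so the block runs without the package installed. EnhancedVolcano's default `xlim` may already be derived from the data range (see `?EnhancedVolcano` for your version), in which case simply omitting the argument is enough:

```r
# Hypothetical helper: x-axis limits spanning every non-missing log2 fold change,
# padded on both sides so the outermost points are not drawn on the border.
data_xlim <- function(lfc, pad = 0.5) {
  c(min(lfc, na.rm = TRUE) - pad, max(lfc, na.rm = TRUE) + pad)
}

data_xlim(c(-23.3273462, -2.582155, 8.8200584, NA))  # about -23.83 and 9.32

# Usage inside a loop over comparisons (results_list is hypothetical):
# library(EnhancedVolcano)
# for (p in names(results_list)) {
#   res <- results_list[[p]]
#   print(EnhancedVolcano(res,
#     lab = rownames(res), x = 'log2FoldChange', y = 'pvalue',
#     xlim = data_xlim(res$log2FoldChange), title = p,
#     pCutoff = 0.05, pCutoffCol = 'padj', FCcutoff = 2))
# }
```

The explicit print() is needed because ggplot objects are not auto-printed inside a for loop. A symmetric alternative, c(-1, 1) * (max(abs(lfc), na.rm = TRUE) + pad), keeps zero centred at the cost of some empty space on one side.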