kevinblighe / EnhancedVolcano

Publication-ready volcano plots with enhanced colouring and labeling

Noise around the midline #73

Closed soren-rand closed 3 years ago

soren-rand commented 3 years ago

Hi Kevin

First of all, I want to say a big, big thank you. I am quite new to analyzing data in both R and the terminal, and at times I find it difficult to navigate and perform analyses. You have created a super easy tutorial with aesthetically satisfying plots! Working my way through all the beginner mistakes, I once again find myself struggling with a problem.

I have done harmonization, imputation and PrediXcan on GWAS summary statistics. With my outputs from PrediXcan, I'm trying to present my data in the most sexy manner with the volcano plot. However, there seems to be something off in how the plot comes out. Instead of having most genes scattered around the midline, creating that "tulip" shape, I have managed to create a plot where most genes are evenly distributed along the x-axis, with a greater accumulation of genes around the midline. Ironically, this makes the plot look like an actual volcano .. 😄

Any ideas as to what might have happened or how I could fix the problem?

Sending my best wishes from Denmark

Soren

[Image: Screenshot 2021-03-26 at 16.01.43]

kevinblighe commented 3 years ago

Hey Soren, hello to Denmark!

Hmmm, I have never tried to plot GWAS summary stats with EnhancedVolcano. It seems that the majority of SNPs have p-values near 1.00, but with a broad range of effect sizes. Is the dataset small? What are the x-axis values: log odds ratios?

To improve the visualisation, you may have to filter out many of these, or try to find some other metric to use for the x-axis.
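
To make the two suggestions above concrete, here is a minimal sketch of what such a pre-filtering step could look like. It is written in plain Python rather than R, purely for illustration, and it is not part of EnhancedVolcano. The column names (`gene_name`, `zscore`, `effect_size`, `pvalue`), the helper `prepare_volcano_rows`, the example values, and the `max_p=0.5` cutoff are all assumptions made for this example; the real PrediXcan output and a sensible cutoff may differ.

```python
import math


def prepare_volcano_rows(rows, x_key="zscore", p_key="pvalue", max_p=0.5):
    """Drop rows with missing/invalid values or p > max_p (hypothetical helper).

    x_key selects which column goes on the x-axis, so an alternative
    metric (here a z-score instead of the effect size) is easy to try.
    """
    prepared = []
    for row in rows:
        p = row.get(p_key)
        x = row.get(x_key)
        if p is None or x is None:
            continue  # skip rows with missing values
        p, x = float(p), float(x)
        if math.isnan(p) or math.isnan(x):
            continue  # skip NaN entries
        if not (0.0 < p <= max_p):
            continue  # drop the mass of points with p-values near 1
        prepared.append({
            "gene": row["gene_name"],
            "x": x,
            "pvalue": p,  # raw p-value kept in case the plotting tool transforms it itself
            "neg_log10_p": -math.log10(p),
        })
    return prepared


# Made-up example rows; real data would be read from the PrediXcan output file.
rows = [
    {"gene_name": "GENE_A", "effect_size": 0.80, "zscore": 0.1, "pvalue": 0.92},
    {"gene_name": "GENE_B", "effect_size": -0.30, "zscore": -4.5, "pvalue": 6.8e-6},
    {"gene_name": "GENE_C", "effect_size": 1.20, "zscore": 0.3, "pvalue": 0.76},
    {"gene_name": "GENE_D", "effect_size": 0.05, "zscore": 3.1, "pvalue": 0.0019},
]

kept = prepare_volcano_rows(rows)
for r in kept:
    print(r["gene"], r["x"], round(r["neg_log10_p"], 2))
```

In this toy input, the two genes with large effect sizes but p-values near 1 are the ones that get dropped, which is the pattern described above. Passing `x_key="effect_size"` would keep the original x-axis and only apply the p-value filter.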

soren-rand commented 3 years ago

Hi Kevin, thanks for your swift reply and help! After our post-imputation I have 11.7 million variants, and when I perform PrediXcan it leaves me with approximately 13,000 genes (depending on the tissue) with altered expression. So the size of the dataset should be in order. My x-axis values are effect size/beta. About filtering, I believe you are absolutely right; I too suspect there is a filtering step I am missing. Best wishes!

kevinblighe commented 3 years ago

Cool, any improvement?

soren-rand commented 3 years ago

Sorry for the late reply. I'm still trying to make it work with some help from colleagues. We'll try different tissue versions from GTEx and otherwise see if we can replicate others' volcano plots. I'll keep you posted if I find the solution. Thanks so much for your attention :-)

kevinblighe commented 3 years ago

I may close this for now, but please re-open if needed. The issue, as I see it, is that the input data is from a non-native format, resulting in a skewed volcano plot. Volcano plots were originally developed for gene expression data, if I am to be believed.

Kevin