zkutalik / ssimp_software


How to interpret Z_reimputed and r2_reimputed // GC lambda #41

Closed drveera closed 6 years ago

drveera commented 6 years ago

Hi, I noticed that one of my top SNPs (P=6.46e-10, Z=6.181) gets a Z value of 5.35 (P=8.5e-08) in the imputed file. I thought all the SNPs that I provide for imputation would keep the same values in the output file, with source 'GWAS', and that only the unavailable SNPs would be imputed and have source 'SSIMP'. Is that right? Are SNPs with source 'GWAS' sometimes imputed as well?

Also, what is the use of Z_reimputed and r2_reimputed? The following is the histogram of the r2_reimputed values for SNPs with source "GWAS". How should I interpret this? Is it normal?

[Screenshot: histogram of r2_reimputed values for SNPs with source "GWAS"]

Also, here is the scatter plot of the actual Z scores vs. the imputed Z scores for the SNPs with source "GWAS".

[Screenshot: scatter plot of actual Z scores vs. imputed Z scores for SNPs with source "GWAS"]

One important thing I want to mention is that the GWAS SNPs I used are imputed rather than genotyped. The short version of a long story is that I did the analysis inside a secured server where, due to computational limitations, I was allowed to bring in only 500K SNPs. So I ran the GWAS on only 500K SNPs (pruned from the original 8 million imputed SNPs), exported the results from the server, and used your software to impute the rest.

It would be very helpful if you have any comments or suggestions.

Thank you

Regards Veera

aaronmcdaid commented 6 years ago

Hi @drveera, Z_reimputed and r2_reimputed are from the first window only, where each tag is reimputed based on all the other tags. You can ignore those columns.

They can be useful for getting a sense of how accurate the imputation is. We hide one tag, reimpute it from all the other tags, and then compare (as you have done) how similar the original and reimputed values are.
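The comparison described above can be sketched in a few lines of Python. This is only an illustration: the rows are made-up toy values, and the dictionary keys `z`, `z_reimputed` and `source` are stand-ins chosen for this example, not necessarily the exact column names in the SSIMP output file.

```python
import math

# Toy rows (made-up values). Only tag SNPs (source == "GWAS") carry a
# reimputed Z score; the key names are hypothetical stand-ins.
rows = [
    {"snp": "rs1", "source": "GWAS", "z": 6.2, "z_reimputed": 5.4},
    {"snp": "rs2", "source": "GWAS", "z": -1.1, "z_reimputed": -0.8},
    {"snp": "rs3", "source": "GWAS", "z": 0.4, "z_reimputed": 0.1},
    {"snp": "rs4", "source": "GWAS", "z": 2.5, "z_reimputed": 2.9},
    {"snp": "rs5", "source": "SSIMP", "z": 1.7, "z_reimputed": None},
]


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def reimputation_check(rows):
    """Compare original and reimputed Z scores for the tag SNPs only."""
    tags = [r for r in rows
            if r["source"] == "GWAS" and r["z_reimputed"] is not None]
    z = [r["z"] for r in tags]
    z_re = [r["z_reimputed"] for r in tags]
    mean_abs_diff = sum(abs(a - b) for a, b in zip(z, z_re)) / len(tags)
    return {
        "n_tags": len(tags),
        "correlation": pearson(z, z_re),
        "mean_abs_diff": mean_abs_diff,
    }


summary = reimputation_check(rows)
print(summary)
```

A high correlation suggests that hidden tags are recovered well from their neighbours; it says nothing about whether an individual tag is useful for imputing other SNPs.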

drveera commented 6 years ago

Thanks @aaronmcdaid for your response. So do you think the distribution of r2_reimputed shown above is acceptable? Will it help if I retain only SNPs with r2_reimputed > 0.8 or 0.9 and rerun the imputation with only those SNPs?

sinarueeger commented 6 years ago

I added an example here: docu/sanitycheck_reimputed.R

drveera commented 6 years ago

Hi, thanks again for your reply. It is still not fully clear to me. As you can see in the histogram above, there are SNPs that have r2_reimputed values around 0. Does that mean that the information imputed using these SNPs as reference will also be wrong? Should I remove the SNPs with a low r2_reimputed and repeat the imputation? Will it help? So far I have done the imputation for 5-7 different GWAS datasets, and the GC lambda values are consistently < 1. Any idea why that is? Could it be due to over-correction?

sinarueeger commented 6 years ago

Do not remove the tag SNPs with a low r2_reimputed. A low r2_reimputed simply means that the tag SNP is badly tagged by the other tag SNPs; it does not mean that it will not work as a tag SNP itself. As for the GC lambda, I will answer you by tomorrow evening.

drveera commented 6 years ago

Sure. Thank you very much for your responses.

sinarueeger commented 6 years ago

Are you calculating the GC lambda using all SNPs, or only including the SNPs with a high imputation quality (r2.pred > 0.7)?
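For reference, a minimal sketch of the calculation being asked about, using the usual definition of GC lambda as the median of the squared Z scores divided by the median of a chi-square distribution with 1 degree of freedom. The rows are made-up toy values and the keys `z` and `r2_pred` are hypothetical stand-ins for the imputed Z score and the r2.pred column.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# obtained as the square of the 75th percentile of a standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def gc_lambda(z_scores):
    """Genomic-control lambda: median(Z^2) / median of chi-square(1)."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


def gc_lambda_filtered(rows, min_r2):
    """GC lambda restricted to rows whose imputation quality exceeds min_r2."""
    kept = [r["z"] for r in rows if r["r2_pred"] > min_r2]
    return gc_lambda(kept)


# Toy rows (made-up values); key names are hypothetical stand-ins.
toy_rows = [
    {"z": 0.3, "r2_pred": 0.95},
    {"z": -1.2, "r2_pred": 0.91},
    {"z": 0.8, "r2_pred": 0.65},
    {"z": 2.1, "r2_pred": 0.99},
    {"z": -0.5, "r2_pred": 0.40},
]

print("all SNPs:      ", gc_lambda([r["z"] for r in toy_rows]))
print("r2.pred > 0.7: ", gc_lambda_filtered(toy_rows, 0.7))
```

Computing both values side by side shows whether the filter itself changes the picture.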

drveera commented 6 years ago

Hi, I'm calculating it only for SNPs with r2 > 0.90.


sinarueeger commented 6 years ago

Sorry for making you wait. I will write to you tomorrow.

sinarueeger commented 6 years ago

You mentioned that you have imputed a GWAS and that the GC lambda of the summary statistics for well-imputed SNPs (R2 > 0.9) is < 1. You repeated this for 5-7 GWASs and always observe a GC lambda < 1. Am I summarising this correctly? What is the GC lambda of all the typed SNPs (aka tag SNPs)? It is indeed strange that the GC lambda for well-imputed SNPs is below one, but one explanation could be that the GC lambda of the tag SNPs is also < 1.

drveera commented 6 years ago

Yes, you are right. The GC lambda for the pre-imputation set is around 1.01-1.05, and for all post-imputation sets it is 0.90-0.98. It is not a big difference, but I still get an LD score regression intercept < 1 and a ratio < 0 for all of them. Do you think there is any over-correction happening during the imputation?

sinarueeger commented 6 years ago

From applying SSIMP to simulated and real data, we know that SNPs with imperfect imputation quality have an underestimated test statistic. This might be the reason why you observe a GC lambda < 1, even when only including SNPs with r2.hat > 0.9.
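A toy simulation can illustrate how such an underestimation pushes the GC lambda below 1. It rests on a strong simplification that is an assumption of this sketch and not a description of SSIMP's internals: each imputed SNP has a single tag, and its imputed Z score is the tag's Z score multiplied by the LD correlation r, so the imputation quality is r squared. All function and parameter names are made up for the example.

```python
import random
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def gc_lambda(z_scores):
    """Genomic-control lambda: median(Z^2) / median of chi-square(1)."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


def simulate(r2, n_snps=20000, seed=1):
    """Return (lambda of tag SNPs, lambda of imputed SNPs) under the null.

    Assumption of this sketch: one tag per imputed SNP, and the imputed Z is
    the tag Z shrunk by the LD correlation r = sqrt(r2).
    """
    rng = random.Random(seed)
    r = r2 ** 0.5
    z_tag = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]  # null tag Z scores
    z_imp = [r * z for z in z_tag]                        # shrunken imputed Z
    return gc_lambda(z_tag), gc_lambda(z_imp)


lam_tag, lam_imp = simulate(r2=0.9)
print("tag SNPs:     ", lam_tag)
print("imputed SNPs: ", lam_imp)
print("ratio:        ", lam_imp / lam_tag)
```

Under this simplification the ratio of the two lambdas equals the imputation quality, so a set of SNPs with quality around 0.9 would show a lambda roughly 10% lower than that of the tags.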