Open · RL-m opened this issue 1 year ago
@RL-m Could you clarify how you defined "recall"? Is it whether a signal is captured by a 95% CS? Also, in our paper we focused on variants with MAF > 5%, whereas Wu et al., the other paper you cited, included rare variants. Are most of the variants in your wgs-imp simulation also rare?
Sorry, I left out that information. Here I defined "recall" as the "power" in the SuSiE paper: the proportion of causal variants captured by a 95% CS. I used only common variants (MAF > 1%) in my simulation.
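A minimal Python sketch of this recall definition, assuming each simulated dataset's output has already been reduced to a list of 95% credible sets (each a list of variant indices) plus the indices of the simulated causal variants. The function name and input format are hypothetical, not part of susieR:

```python
def pooled_recall(replicates):
    """Recall ("power" in the SuSiE paper's sense), pooled over simulated datasets.

    replicates: iterable of (credible_sets, causal_idx) pairs, one per dataset.
      credible_sets: list of index lists, one per reported 95% CS
      causal_idx: indices of the simulated causal variants
    Returns the fraction of causal variants that fall inside at least one 95% CS.
    """
    captured = 0
    total = 0
    for credible_sets, causal_idx in replicates:
        # Union of all variants covered by any 95% CS in this dataset
        in_any_cs = set()
        for cs in credible_sets:
            in_any_cs.update(int(i) for i in cs)
        for i in causal_idx:
            total += 1
            captured += int(i) in in_any_cs  # True counts as 1
    return captured / total if total else float("nan")


# Two datasets, one causal variant each: captured in the first, missed in the second
print(pooled_recall([([[3, 4, 5]], [4]), ([[10, 11]], [7])]))
```

Pooling counts across replicates weights every causal variant equally; with a single causal variant per dataset, this is the same as averaging per-dataset recall.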
It is reassuring that susie and finemap show similar trends for the imputed SNPs.
Yes, as Gao said, in your results you should try breaking down the precision/recall by SNP allele frequency.
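One way to do the recall part of this breakdown, sketched in Python under the assumption that results have been collected into one record per simulated causal variant (its MAF and whether it fell in a 95% CS). The bin edges are an arbitrary illustrative choice, and precision would need a separate per-CS tally that this sketch does not cover:

```python
import numpy as np

def recall_by_maf(causal_maf, captured, edges=(0.01, 0.05, 0.10, 0.20, 0.50)):
    """Recall within MAF bins (lo, hi]; returns {(lo, hi): (recall, n_causal)}.

    causal_maf: MAF of each simulated causal variant, pooled over datasets
    captured: True if that causal variant fell inside a 95% CS
    edges: hypothetical bin edges
    """
    causal_maf = np.asarray(causal_maf, dtype=float)
    captured = np.asarray(captured, dtype=bool)
    table = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (causal_maf > lo) & (causal_maf <= hi)
        n = int(in_bin.sum())
        recall = float(captured[in_bin].mean()) if n else float("nan")
        table[(lo, hi)] = (recall, n)  # keep n so sparse bins are visible
    return table


maf = [0.02, 0.03, 0.30, 0.40]
hit = [True, False, True, True]
for (lo, hi), (recall, n) in recall_by_maf(maf, hit).items():
    print(f"MAF ({lo:.2f}, {hi:.2f}]: recall = {recall:.2f} (n = {n})")
```

Reporting the count alongside each recall makes it easy to spot bins with too few causal variants to interpret.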
What is perhaps strangest is that recall decreases with h^2 in the imputed simulations. That seems very wrong. It may not be hard to diagnose the problem, because the non-imputed results show that recall should be almost 100% for h^2 that large. I would suggest looking in detail at what is going on in one or two of the simulated datasets with large h^2 and imputed genotypes. For example, is the true causal SNP the one with the largest or near-largest z score?
Matthew
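A sketch of the suggested check in Python: compute marginal z-scores by simple linear regression of the phenotype on each SNP, then report the causal SNP's rank by |z| (equivalently by chi^2 = z^2). Function names and the toy data are hypothetical; monomorphic SNPs should be dropped first, since their variance is zero:

```python
import numpy as np

def marginal_z(X, y):
    """Marginal z-scores (t statistics) from regressing y on each column of X separately."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                  # center each SNP
    yc = y - y.mean()
    sxx = (Xc ** 2).sum(axis=0)
    beta = Xc.T @ yc / sxx                   # per-SNP OLS slope
    rss = (yc ** 2).sum() - beta ** 2 * sxx  # per-SNP residual sum of squares
    se = np.sqrt(rss / (n - 2) / sxx)
    return beta / se

def causal_rank(z, causal_idx):
    """1-based rank of the causal SNP by |z|; 1 means it is the top SNP (largest chi^2)."""
    abs_z = np.abs(np.asarray(z, dtype=float))
    return int((abs_z > abs_z[causal_idx]).sum()) + 1


# Toy example: 10 SNPs, SNP 3 is causal
rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(500, 10)).astype(float)
y = 0.5 * X[:, 3] + rng.normal(size=500)
print(causal_rank(marginal_z(X, y), 3))
```

A rank of 1 means the causal SNP is the top SNP; a small rank above 1 corresponds to the "near-largest" case.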
(Replying to Peter Carbonetto's comment of Mon, Oct 24, 2022, 8:47 AM.)
@RL-m Are you running `susie` or `susie_rss`?
I ran `susie_rss` with in-sample LD.
@stephens999 I checked whether the causal SNP is the top SNP (the one with the largest Chi^2) in each simulation. Here is the result. The y-axis is the proportion of simulations in which the causal SNP has the largest Chi^2. For the WGS genotypes, almost all causal SNPs are the top SNP at the largest h^2, whereas for the imputed genotypes, 70% of causal SNPs are the top SNP. For h^2 below 0.016 (non-centrality parameter below 142), the difference is small.
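The non-centrality value quoted here can be reproduced with simple arithmetic, assuming NCP ≈ n·h^2 for a single causal SNP with n = 8853 (8853 × 0.016 ≈ 142). The Python sketch below also prints the plug-in value of the squared t statistic, (n - 2)h^2/(1 - h^2), which accounts for the residual variance shrinking as h^2 grows; the function names are hypothetical, and at h^2 = 0.8 the two values differ by roughly a factor of five:

```python
def ncp_simple(n, h2):
    """Approximate non-centrality of the causal SNP's 1-df chi^2 statistic: n * h2."""
    return n * h2

def ncp_plugin_t2(n, h2):
    """Plug r^2 = h2 into t^2 = (n - 2) r^2 / (1 - r^2)."""
    return (n - 2) * h2 / (1 - h2)


n = 8853  # sample size used in these simulations
for h2 in (0.004, 0.016, 0.1, 0.8):
    print(f"h2 = {h2:<5}  n*h2 = {ncp_simple(n, h2):8.1f}  plug-in t^2 = {ncp_plugin_t2(n, h2):8.1f}")
```

At small h^2 the two agree closely, which is the regime where the "142" figure above sits.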
To add to my question: I think it's expected that, under large h^2, the z-score of the causal SNP with imputed genotypes differs substantially from its z-score with WGS genotypes (imputation error may be one reason). What I don't understand is why fine-mapping recall decreases when h^2 is large in the imputed simulations. Why doesn't fine-mapping power follow the same trend as GWAS power?
Are you simulating data with real genotypes and then analyzing them with imputed genotypes? That is, simulating $Y = X_{\text{real}} b + E$ and analyzing using $Y = X_{\text{impute}} b + E$, where $X_{\text{impute}} \approx X_{\text{real}}$ but not exactly equal?
(Replying to RL-m's comment of Mon, Oct 24, 2022, 8:45 PM.)
@stephens999 Yes, that is my simulation design. I simulated $Y = X_{\text{real}} b + E$ to mimic the real phenotype and analyzed it using $Y = X_{\text{imputed}} b + E$ to mimic the array-imputed genotypes commonly used in GWAS.
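A toy Python sketch of this design for a single SNP, to illustrate the mechanism under discussion: the phenotype is generated from the true genotype, but the association test sees only a noisy dosage. The imputation-error model (Gaussian noise on the dosage, clipped to [0, 2]), the MAF, the noise level, and all function names are hypothetical assumptions. There are no neighbouring SNPs in LD, so it shows attenuation of the causal SNP's z-score rather than which SNP ends up on top:

```python
import numpy as np

def simulate_one_snp(n=8853, maf=0.3, h2=0.5, dosage_noise_sd=0.5, seed=1):
    """Simulate y from the true genotype and also return a noisy 'imputed' dosage.

    Hypothetical assumptions: one causal SNP in HWE; imputation error modeled as
    Gaussian noise on the dosage, clipped to [0, 2]. h2 must lie strictly in (0, 1).
    """
    rng = np.random.default_rng(seed)
    x_real = rng.binomial(2, maf, size=n).astype(float)
    x_imp = np.clip(x_real + rng.normal(0.0, dosage_noise_sd, size=n), 0.0, 2.0)
    b = 1.0
    var_g = 2 * maf * (1 - maf) * b ** 2       # expected genetic variance under HWE
    sigma_e = np.sqrt(var_g * (1 - h2) / h2)   # residual SD giving PVE = h2 in expectation
    y = x_real * b + rng.normal(0.0, sigma_e, size=n)  # Y = X_real b + E
    return x_real, x_imp, y

def z_score(x, y):
    """Marginal regression t statistic, written via the sample correlation."""
    r = np.corrcoef(x, y)[0, 1]
    return r * np.sqrt(len(y) - 2) / np.sqrt(1 - r ** 2)


for h2 in (0.004, 0.016, 0.1, 0.5, 0.8):
    x_real, x_imp, y = simulate_one_snp(h2=h2)
    print(f"h2 = {h2}: z(real) = {z_score(x_real, y):7.1f}, z(imputed) = {z_score(x_imp, y):7.1f}")
```

In this toy model the imputed z-score grows more slowly with h^2 than the true-genotype z-score, so the relative attenuation is largest at large h^2. Writing $Y = X_{\text{impute}} b + b(X_{\text{real}} - X_{\text{impute}}) + E$ shows that part of the causal signal is left in the residual, and that part grows with $b$, so the analysis model is increasingly misspecified as h^2 increases.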
Hello SuSiE developers,
I used SuSiE to run simulations and found that it performs differently with WGS genotypes than with imputed genotypes. I designed two sets of simulations: 1) I used real WGS data as the genotypes and simulated phenotypes in a similar way to Section 4 of the SuSiE paper. 2) I selected a subset of SNPs (those on the UKB Axiom array) from the WGS data, imputed them using the HRC reference panel (as described in this paper), and used the imputed data as the genotypes. I then simulated phenotypes in the same way as in 1). In both scenarios I set a single effect variable and varied the proportion of variance explained from 0.004 to 0.8, with a sample size of 8853. However, at larger values of variance explained (> 0.016), SuSiE's recall decreased in scenario 2), while the recall in scenario 1) was as expected. I also ran another fine-mapping tool, FINEMAP, and its results were very similar to SuSiE's.
I couldn't think of a good explanation for this finding. Checking the simulations in your paper, it seems that you used the same kind of genotypes as in scenario 1). Do you have any idea why imputed genotypes would cause these results?