Open joshchiou opened 3 years ago
@joshchiou If you could please share with us a reproducible example that results in no CSs, we'd be happy to take a look.
@joshchiou you requested a rather high coverage of 0.99. Do you find any CS if you lower it to say 0.90?
Thanks for the fast response. @pcarbo, I uploaded example 1 and example 2.
@gaow I have tried using the default coverage=0.95
and coverage=0.9
as well but have not found any CS.
Here is the relevant code I've been using.
res <- susie_rss(data$z$z, as.matrix(data$ld), L=10, coverage=0.99, min_abs_corr=sqrt(0.1)
isn't it expected to find no CSs if there are no sufficiently strong signals? Is it possible that these regions do not have sufficiently strong signals?
On Wed, May 5, 2021 at 9:55 AM Josh Chiou @.***> wrote:
Thanks for the fast response. @pcarbo https://github.com/pcarbo, I uploaded an example here https://www.dropbox.com/s/iiphf5ujbzel2of/test_data.rds?dl=0.
@gaow https://github.com/gaow I have tried using the default coverage=0.95 and coverage=0.9 but have not found any CS
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/122#issuecomment-832755647, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRIWDIVH4KC5VCQBSJTTMFL65ANCNFSM44DYECHA .
I suppose it's possible. I have been using a list of index variants from the paper describing the original GWAS. I checked the regions that did not output CSs and a few of them are sub-genome-wide significant (P<5e-8) in the published discovery summary statistics.
That being said, most of them are above or near genome-wide significance (P<=5e-8). I did try lowering the coverage to 0.9, and it seemed to produce CSs for some, but not all of them. Is using a higher coverage (e.g. 0.99) not recommended? Many fine mapping papers report 99% credible sets, which is why I typically default to 0.99.
@joshchiou Thanks for sharing the example above. Among the examples that did not produce a CS, what is the smaallest p-value?
Here is an example 3 of one that did not produce a CS using lower coverages (coverage=0.9
or coverage=0.95
). The p-value of the index variant of the region is ~1.5e-8.
Not sure whether this is related, but I am also having issues fine mapping regions with very strong signal (P<1e-300) see example 4. Although it eventually converges after 1000s of iterations, the CSs all contain exactly 1 variant which are in high LD with each other (abs(r)>0.9).
can you make sure you are running the latest version of SuSie from github?
On Wed, May 5, 2021 at 6:09 PM Josh Chiou @.***> wrote:
Here is an example 3 https://www.dropbox.com/s/ml45jyiv06zft0k/test_data3.rds?dl=0 of one that did not produce a CS using lower coverages (coverage=0.9 or coverage=0.95). The p-value of the index variant of the region is ~1.5e-8.
Not sure whether this is related, but I am also having issues fine mapping regions with very strong signal (P<1e-300) see example 4 https://www.dropbox.com/s/sijiffpdbj7c6xm/test_data4.rds?dl=0. Although it eventually converges after 1000s of iterations, the CSs all contain exactly 1 variant which are in high LD with each other (abs(r)>0.9).
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/122#issuecomment-833106152, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRP7SQFM2GDORVNG7F3TMHF3RANCNFSM44DYECHA .
Sure, I'm currently running 0.11.7 (Apr 27) but I can try again with the latest commit.
@joshchiou How did you get the LD matrix? I checked example 4, the SNPs in the identified CSs are highly correlated (based on the LD matrix), but their z scores are quite different. If 2 SNPs are highly correlated, their z scores should be very similar. This suggests that the LD matrix is from a poor reference panel.
it seems in your example 3 that the problem is that the CS is being filtered out due to low purity.
Try this: temp = readRDS("~/Downloads/test_data3.rds") temp.fit = susie_rss(temp$z[,1],as.matrix(temp$ld)) temp.cs = susie_get_cs(temp.fit,as.matrix(temp$ld),min_abs_corr = 0.0)
I do think it is a bit surprising at first site that this CS should be low purity... i will have to look a bit more at it. Matthew
On Wed, May 5, 2021 at 6:20 PM Matthew Stephens @.***> wrote:
can you make sure you are running the latest version of SuSie from github?
On Wed, May 5, 2021 at 6:09 PM Josh Chiou @.***> wrote:
Here is an example 3 of one that did not produce a CS using lower coverages (coverage=0.9 or coverage=0.95). The p-value of the index variant of the region is ~1.5e-8.
Not sure whether this is related, but I am also having issues fine mapping regions with very strong signal (P<1e-300) see example 4. Although it eventually converges after 1000s of iterations, the CSs all contain exactly 1 variant which are in high LD with each other (abs(r)>0.9).
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.
@zouyuxin I created the matrix by extracting from pre-computed LD values from gnomAD v2.1.1 NFE (Non-finnish European, N= 7,718). It's not an exact match to the GWAS (of European samples) I'm trying to fine map, but unfortunately I don't have access to the underlying genotypes.
That being said, I have noticed the same thing happening as example 4 for regions with strong signals for different traits in the UK biobank. In those cases, I was using the same samples to calculate the LD matrix.
If you can give us an example of this happening with an in-sample ld matrix we can look into it...
On Wed, May 5, 2021, 21:10 Josh Chiou @.***> wrote:
@zouyuxin https://github.com/zouyuxin I created the matrix by extracting from pre-computed LD values from gnomAD v2.1.1 https://gnomad.broadinstitute.org/downloads NFE (Non-finnish European, N= 7,718). It's not an exact match to the GWAS (of European samples) I'm trying to fine map, but unfortunately I don't have access to the underlying genotypes.
That being said, I have noticed the same thing happening as example 4 for regions with strong signals for different traits in the UK biobank. In those cases, I was using the same samples to calculate the LD matrix.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/122#issuecomment-833172189, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRO2TB7IA535GQ225VTTMH277ANCNFSM44DYECHA .
Here is example 5, which eventually converges with an in-sample LD matrix (generated by plink using 341k unrelated UK biobank samples).
@joshchiou Could you share more details about how did you compute the z scores? Is it about a quantitative trait or a binary trait?
in example 5 the LD matrix is not positive semi-definite. Possibly this is rounding error. But is it possible it is computed with missing genotypes or some other issue?
sort(ld.eigen$values)[1:10] [1] -1.1655572 -0.7359617 -0.6541895 -0.4747877 -0.3862273 -0.3661112 [7] -0.3456588 -0.3045849 -0.2319399 -0.2089683
On Thu, May 6, 2021 at 5:25 PM Yuxin Zou @.***> wrote:
@joshchiou https://github.com/joshchiou Could you share more details about how did you compute the z scores? Is it about a quantitative trait or a binary trait?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/122#issuecomment-833912689, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRI7KGPXZ4BKYY2NFGLTMMJL5ANCNFSM44DYECHA .
@zouyuxin In example 5, the z-scores are from a quantitative trait. Examples 1-4 are from a binary case/control trait.
@stephens999 I computed the LD matrix from imputed genotypes with plink 1.9 - rounding error is definitely a possibility, also hard calls instead of dosages (which the GWAS used) might contribute additional error.
to return to the original issue: the answer is that the cs is being pruned due to being low purity.
You can avoid the purity filter by something like:
susie_get_cs(fit,as.matrix(temp$ld),min_abs_corr = 0.0)
Removing the purity filter in this way will tend to also return some "irrelevant" CSs that contain very large numbers of SNPs,
but you can make a decision about those -- they are fairly obvious. (Looking at the estimated prior
variance using susie_get_prior_variance(fit)
may also be helpful -- small values here may indicate
irrelevant CSs).
My guess is that this problem here may be partly due to a poor LD matrix -- the SNPs in the CS are probably in higher LD than the matrix indicates, and so if we had access to the correct matrix maybe the filter would not remove this CS.
I'm leaving this issue open as it may be worth us thinking about whether the default filter needs modifying when using an out-of-sample LD matrix.... certainly we might want to document this issue as something for users to be aware of.
Hi there, I've been using
susie_rss
for fine mapping summary statistics using LD references from gnomAD. For most loci (112/134), I have been able to get credible sets, but I have been unable to get credible sets from the remaining loci.I've noticed that for these loci, SuSiE usually converges within the 2nd iteration
res$converged
res$niter
But there are no credible sets in the results
res$sets
Any idea what's going on here? I'm happy to share the data if needed - I'm pulling from public sources.