Closed VitorAguiar closed 1 year ago
@VitorAguiar It is possible one or more CSs are being removed due to low purity. Try setting min_abs_corr
in susie
or susie_rss
to a smaller number (e.g., 0.1) and see if that fixes the issue.
Hi @pcarbo, setting min_abs_corr = 0.1 does not fix it. However I found something interesting.
For the plot above, I used L = 3. If I simply use L = 5 (and back to the default value for min_abs_corr), that SNP on the top now has PIP=0.949 and it's the single variant in a new CS3 (blue).
Therefore, with L = 5 I have 3 credible sets, whereas with L = 3 I have 2 credible sets. I don't understand how that can happen, I mean, using L = 3 should also reveal 3 credible sets, right?
I agree that should not happen. Can you try with L = 3
and refine = TRUE
?
@pcarbo, using susie_rss(z, ld, n, coverage = 0.9, r_tol = 1e-05, L = 3, refine = TRUE)
does not fix it, I still get 2 CSs and the top variant is not captured by any CS.
what is the ELBO value for the different runs?
(In rare cases it is possible that increasing L can lead to finding a better solution...)
Matthew
On Wed, May 24, 2023 at 1:03 PM Vitor @.***> wrote:
@pcarbo https://github.com/pcarbo, using susie_rss(z, ld, n, coverage = 0.9, r_tol = 1e-05, L = 3, refine = TRUE) does not fix it, I still get 2 CSs and the top variant is not captured by any CS.
— Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/189#issuecomment-1561710025, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRN7PLITAKRC2YQI3J3XHZEN3ANCNFSM6AAAAAAYMPS35M . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi Matthew,
For L = 3:
-15455.63 -15454.81 -15454.76 -15454.76 -15454.75 -15454.75
For L = 5:
-15454.91 -15453.16 -15452.94 -15452.91 -15452.9 -15452.9 -15452.9
Now I just noticed that if I use L = 5
and refine = TRUE
, the algorithm converges in 2 iterations, and I'm back to 2 CSs and the top PIP variant (PIP=0.88) is not included in any CS. The ELBO in this case is:
-15451.95 -15451.95
Can you share a plot of z-score vs. variable? (Same as the PIP plot except that you are showing the z-scores instead of the PIPs.)
@pcarbo I suppose this is the plot you want to see:
Thank you all for being so helpful! I just want to add an additional analysis.
Instead of defining a ±250kb window around a lead SNP as I did above, I also tested a larger window corresponding to the LD block in Europeans (as it was done by Kundu et al, Nature Genetics, 2022; see "Methods: Locus definition"). Then I ran susie_rss(z, R, n, coverage = 0.9, r_tol = 1e-05, L = 5)
.
In the plot below, the vertical lines correspond to the small window tested in my previous posts.
One CS is found outside the smaller window, and the right-most half of the smaller window in not included in the LD-based window, but if I compare the PIPs for the SNPs included in both datasets (~460 SNPs), I find this:
So, the SNP with high PIP not included in any CS when L = 3
in my previous posts now has a low PIP of 0.01!
@VitorAguiar Regarding the z-scores, yes as you can see the third CS does not have very strong support (the first two CSs have SNPs which lager z-scores), which I think explains why you are usually only getting two CSs. So in the end the susie fit with two CSs is probably the right result (even if the PIP for that one SNP in the third CS is high).
@pcarbo That is interesting. So, would you say that besides PIP I should also put a threshold on the z-scores of variants in a CS to be confident of the CS?
The results of Weissbrod et al, Nature Genetics, 2020 comes to my mind. If I remember correctly, they used functional annotations to tune the prior probabilities at each SNP for Susie, and discuss that 10% of the 95% CSs that they find are not GWAS genome-wide significant, and the reasons for that could be (1) GWAS threshold is too stringent; (2) hidden effects by linkage masking might be revealed by Susie modeling; and (3) functionally-informed fine mapping can identify true weak effects.
So, would you say that besides PIP I should also put a threshold on the z-scores of variants in a CS to be confident of the CS?
In principle, you shouldn't have to since susie should automatically determine the number of CSs so long as estimate_prior_variance = TRUE
. You could also look at the log-Bayes factor (lbf) output to assess the support for the CS.
going back over the thread i realize i didn't look at the initial post carefully enough. The short answer to the question "is it expected that the SNP with the largest PIP is not included in the credible sets?" is that this is not that unexpected.
eg If you have a "borderline significant" SNP that is not in LD with much else, then that SNP may have a moderately large PIP (eg 0.5) but not be included in any CS.
I see. Thank you all for this incredible interaction. I learned that I need to be careful and explore different values for L
and genomic window sizes, and check the z-scores and lbf to interpret SNPs that are or are not included in a CS.
I worry though that many people are just reporting 95% CSs, and it seems that results can change a lot depending on how you run the analysis.
I worry though that many people are just reporting 95% CSs, and it seems that results can change a lot depending on how you run the analysis.
We share this sentiment! Good documentation can help but as you can see it is work-in-progress…
Hello,
is it expected that the SNP with the largest PIP is not included in the credible sets?
For the plots below, I've used a 500kb region from a GWAS with N ~ 11K (4K patients, 7K controls) and LD matrix from 1000 Genomes Project (N = 503). I asked for a coverage of 90%. The CS in green has P = 93.0%, and the CS in blue has P = 95.3%. The SNP on the top has PIP = 0.66.