stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR
Other
172 stars 43 forks source link

IBSS algorithm did not converge in 100 iterations! #176

Closed liqingbioinfo closed 1 year ago

liqingbioinfo commented 1 year ago

Dear SuSiE creators:

I ran SuSiE on my summary statistical data and external reference LD. I got many warning messages that said, "IBSS algorithm did not converge in 100 iterations!" I first ignored such warning messages, but then found SuSiE credible sets results were somehow against intuitions (listed below). Whether the failure of SuSiE IBSS convergence result in my strange SuSiE results? If so, I really appreciate any solution to avoid the failure of IBSS convergence.

SNP CHR BP A2 A1 MAF N Z PIP CREDIBLE_SET
rs10890343 1 46157922 C T 0.288325 85716 -3.03604 1 3
rs6665808 1 46162283 C T 0.288348 85716 -3.02961 1 4
rs61784800 1 46178112 A G 0.295552 85716 -2.81034 1 2
rs28376548 1 46210382 G A 0.286697 85716 -3.04516 1 1

rs10890343 and rs6665808 are close to each other, and they have similar Z scores as they are in high LD. Based on SuSiE's toy example, these two highly correlated SNPs should be grouped into one credible set as we have difficulties distinguishing them. However, I got two credible sets for each of them. Any elaborations are highly appreciated!

Yours sincerely Leah

gaow commented 1 year ago

@liqingbioinfo could you try the diagnosis along the lines of this page see if that gives you some hints about it?

liqingbioinfo commented 1 year ago

Hi Gao

Thank you so much for your prompt reply.

I read the "LD information from the reference panel" part on your page before posting my question. In your "LD information from the reference panel" part, you demonstrated that one misleading SNP had an incorrect SNP effect size sign which led to a false positive discovery. After correcting the sign of effect size of that missing leading SNP, the false positive credible set disappeared. However, in my case, I am sure that my SNPs effect sizes signs are correct, and my external reference LD information and my summary statistics are from the same ethic. Are there any other ways to avoid the failure of IBSS convergence? How about including fewer SNPs in SuSiE?

Best regards Leah

gaow commented 1 year ago

@liqingbioinfo could you post here the plot from kriging_rss ? Thanks!

liqingbioinfo commented 1 year ago

Hi Gao

Thank you very much for your suggestions. I ran "kriging_rss(zscores, R_ref, n=85716)$plot" but got an error as below:

Error in check_aesthetics(): ! Aesthetics must be either length 1 or the same as the data (38): y

  Can I send my scores files and my LD matrices to you? So you can reproduce my results and even errors to debug? If so, what's the best email to reach you? 

Best regards Leah

pcarbo commented 1 year ago

@liqingbioinfo How many SNPs are you looking at here?

liqingbioinfo commented 1 year ago

@pcarbo Hi Peter, I had 5,418 z scores when I got the convergence failure. I just tried to reduce it to about 1000 (split my 5k SNPs into 5 sets, each set with around 1k z scores), and the convergence failure disappeared. Does that mean I should run susie_rss with no more than 1000 z scores?

pcarbo commented 1 year ago

That's interesting @liqingbioinfo; it is hard to say, it is possible you got lucky and removed some SNPs causing issues. Could you please rerun susie_rss with verbose = TRUE and share the output to the console?

liqingbioinfo commented 1 year ago

@pcarbo Thanks so much, Peter. I followed your advice and ran susie_rss with verbose=TRUE on my 5,418 z scores. Please find the outputs below.

fitted_rss <- susie_rss(zscores, R_ref, n=85716, L = 10, verbose = TRUE) HINT: For large R or large XtX, consider installing the Rfast package for better performance. WARNING: XtX is not symmetric; forcing XtX to be symmetric by replacing XtX with (XtX + t(XtX))/2 [1] "objective:-121564.286011201" [1] "objective:-121420.928545254" [1] "objective:-121256.684175818" [1] "objective:-121088.312820567" [1] "objective:-120918.536214462" [1] "objective:-120748.457449151" [1] "objective:-120578.603311599" [1] "objective:-120409.173385859" [1] "objective:-120240.325384438" [1] "objective:-120072.194488068" [1] "objective:-119904.873966087" [1] "objective:-119738.427870526" [1] "objective:-119572.867855749" [1] "objective:-119407.842906286" [1] "objective:-119241.852181439" [1] "objective:-119074.607408203" [1] "objective:-118907.531289341" [1] "objective:-118742.796378205" [1] "objective:-118578.500782208" [1] "objective:-118414.375249951" [1] "objective:-118250.40007027" [1] "objective:-118087.675809122" [1] "objective:-117925.96048741" [1] "objective:-117765.089686481" [1] "objective:-117604.987921771" [1] "objective:-117445.615983111" [1] "objective:-117286.918360643" [1] "objective:-117128.82780346" [1] "objective:-116971.298348918" [1] "objective:-116814.335543441" [1] "objective:-116657.987024549" [1] "objective:-116502.30795693" [1] "objective:-116347.33870226" [1] "objective:-116193.101988764" [1] "objective:-116039.608201594" [1] "objective:-115886.860796443" [1] "objective:-115734.859765478" [1] "objective:-115583.603474728" [1] "objective:-115433.089543085" [1] "objective:-115283.315237552" [1] "objective:-115134.277646092" [1] "objective:-114985.973753259" [1] "objective:-114838.40047636" [1] "objective:-114691.554683881" [1] "objective:-114545.433207362" [1] "objective:-114400.032849364" [1] "objective:-114255.350389157" [1] "objective:-114111.382587815" [1] "objective:-113968.126191819" [1] "objective:-113825.577936552" [1] "objective:-113683.734549082" [1] "objective:-113542.592750684" [1] "objective:-113402.149258683" [1] "objective:-113262.400788854" [1] "objective:-113123.344056457" [1] "objective:-112984.975778081" [1] "objective:-112847.292672682" [1] "objective:-112710.291463202" [1] "objective:-112573.968876956" [1] "objective:-112438.321646848" [1] "objective:-112303.346512506" [1] "objective:-112169.040220658" [1] "objective:-112035.399525871" [1] "objective:-111902.421191271" [1] "objective:-111770.10198895" [1] "objective:-111638.438700569" [1] "objective:-111507.428117674" [1] "objective:-111377.067042135" [1] "objective:-111247.352286537" [1] "objective:-111118.280674407" [1] "objective:-110989.849040374" [1] "objective:-110862.054230276" [1] "objective:-110734.893100701" [1] "objective:-110608.362518161" [1] "objective:-110482.459356388" [1] "objective:-110357.180489992" [1] "objective:-110232.522780571" [1] "objective:-110108.48304331" [1] "objective:-109985.057981263" [1] "objective:-109862.244079211" [1] "objective:-109740.037532675" [1] "objective:-109618.43439218" [1] "objective:-109497.430841572" [1] "objective:-109377.023598933" [1] "objective:-109257.213357407" [1] "objective:-109138.032970216" [1] "objective:-108856.183253484" [1] "objective:-108072.56499724" [1] "objective:-107779.716861665" [1] "objective:-107611.134044919" [1] "objective:-107442.588540823" [1] "objective:-107274.080240477" [1] "objective:-107105.609032486" [1] "objective:-106937.174803695" [1] "objective:-106768.777438831" [1] "objective:-106600.416821207" [1] "objective:-106432.092832491" [1] "objective:-106263.805352197" [1] "objective:-106095.554258855" [1] "objective:-105927.339429416" Warning message: In susie_suff_stat(XtX = XtX, Xty = Xty, n = n, yty = (n - 1) * : IBSS algorithm did not converge in 100 iterations! Please check consistency between summary statistics and LD matrix. See https://stephenslab.github.io/susieR/articles/susierss_diagnostic.html

pcarbo commented 1 year ago

@liqingbioinfo You can send me an email with the inputs you provided to kriging_rss. You can find my email address on my webpage.

Jesson-mark commented 1 year ago

@pcarbo Thanks for your great work to susieR. I'm recently using it to do fine-mapping using public GWAS summary statistics and LD from a reference panel. I just encountered the same problem as liqingbioinfo and I have performed diagnostics listed in Diagnostic for fine-mapping with summary statistics, below is lambda and output of kriging_rss:

> lambda = estimate_s_rss(z_scores, ld, n=n)
> lambda
[1] 0.611342

image

The summary statistics of GWAS have samples above 400k, and the reference panel used to calculate LD is the same ethnic population from 1000 Genome Project, which has about 600 individuals. So does this warning inform that the sample size of reference panel is not enough to calculate LD and thus cannot represent true linkage disequilibrium?

stephens999 commented 1 year ago

Yes, this plot shows very extreme decisions from the expectations. There may be some issues with allele flips in your data. (See Zou et al, plos genetics). In general use of an out of sample ld matrix should be a last resort, i would try to use the in sample ld matrix.

Matthew

On Thu, Dec 15, 2022, 21:51 王杰 @.***> wrote:

@pcarbo https://github.com/pcarbo Thanks for your great work to susieR. I'm recently using it to do fine-mapping using public GWAS summary statistics and LD from a reference panel. I just encountered the same problem as liqingbioinfo and I have performed diagnostics listed in Diagnostic for fine-mapping with summary statistics https://stephenslab.github.io/susieR/articles/susierss_diagnostic.html, below is lambda and output of kriging_rss:

lambda = estimate_s_rss(z_scores, ld, n=n) lambda [1] 0.611342

[image: image] https://user-images.githubusercontent.com/49034603/208017222-57af665d-eb99-4929-a5f8-451743ca1872.png

The summary statistics of GWAS have samples above 400k, and the reference panel used to calculate LD is the same ethnic population from 1000 Genome Project, which has about 600 individuals. So does this warning inform that the sample size of reference panel is not enough to calculate LD and thus cannot represent true linkage disequilibrium?

— Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/176#issuecomment-1354160428, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRROSZIWUWADOEU4WPUTWNPRMZANCNFSM6AAAAAAS4I4CF4 . You are receiving this because you are subscribed to this thread.Message ID: @.***>

pcarbo commented 1 year ago

I'm sharing the kriging_rss plots from the data shared by the previous commenter @liqingbioinfo which also suggest that there are some inconsistencies between the z-scores and the LD matrix.

screenshot

liqingbioinfo commented 1 year ago

@pcarbo Thank you, Peter, for your great help in plotting these figures. We agreed that these figures showing expected values deviate from observed values, especially those red dots. Should I remove these red dots? Using in-sample LD is the best choice. However, I do not have individual genotype data. What can I do to improve my SuSiE results, considering I only have summary statistic data and external LD references?

stephens999 commented 1 year ago

the bottom line is, I think, that the field does not really have a great answer to your question. The problem with mismatched reference is recognized, but difficult to solve in general. Removing the red dots might be a good start but it is hard to say it will suffice. The simplest fall-back is to use L=1 because that does not require LD information (and will work well for regions containing at most 1 causal SNP).

I will add that in the last plot none of the z scores are very big - by eye the region looks like it could be null?

Matthew

On Fri, Dec 16, 2022 at 9:01 AM liqingbioinfo @.***> wrote:

@pcarbo https://github.com/pcarbo Thank you, Peter, for your great help in plotting these figures. We agreed that these figures showing expected values deviate from observed values, especially those red dots. Should I remove these red dots? Using in-sample LD is the best choice. However, I do not have individual genotype data. What can I do to improve my SuSiE results, considering I only have summary statistic data and external LD references?

— Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/176#issuecomment-1355016236, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRL3XGWXZIVQ3CIFHTDWNR76DANCNFSM6AAAAAAS4I4CF4 . You are receiving this because you commented.Message ID: @.***>

gaow commented 1 year ago

There are two issues in this type of problems:

  1. Simple harmonization: major/minor allele flips, strand flips
  2. Mismatched summary stats and LD reference diagnosis and QC

For 1, I believe many software packages implement some features to do that eg bigsnpr::snp_match and also codes from LDSC related software.

For 2, we don't know what would be a proper method to use and fix the issue (if at all). The best we can do is to identify and QC them. For example there is a software to do that with one approach implemented.

@pcarbo do you konw if there is a good streamlined workflow for these ? We have built some for in house uses since a while ago so I did not really search actively if someone else recently posted something decent to use along these lines. @liqingbioinfo @Jesson-mark your input are welcomed!

pcarbo commented 1 year ago

I do not know of a streamlined workflow; in the susie-rss paper in fact we made a point of mentioning that we have a diagnostic, but not an automated pipeline for "cleaning up" the summary statistics. A simple thing you can do of course is remove the SNPs that are far away from the diagonal in the kriging plot. There is also the DENTIST software, and it is possible that this software automates this more (I don't have any experience with it).

liqingbioinfo commented 1 year ago

@stephens999 @gaow @pcarbo Tons of thanks. I will begin by removing the red dots and ensuring my LD reference matches my GWAS summary statistics.

PhoebeGuo97 commented 1 year ago

Hello, I'm having the same issue. I'm wondering how to remove those red dots. I don't know what SNPs corresponding to them according to condz$plot. Thanks!

pcarbo commented 1 year ago

@PhoebeGuo97 It isn't very well explained in the susieR package, but you would use the conditional_dist output from kriging_rss; the S1 text of the susie-rss PLoS Genetics paper (see the section "Likelihood ratio for detecting allele flips") explains how to use these diagnostic statistics to identify problematic SNPs.

PhoebeGuo97 commented 1 year ago

@pcarbo Thanks for the information! After removing red dots, I'm still getting the same warning, i.e. IBSS algorithm did not converge in 100 iterations. I'm wondering what the range of an acceptable lambda is. I have tried two LD reference panels and removed red dots, so far the smallest lambda I can get is ~0.23. Many thanks!

gaow commented 1 year ago

@PhoebeGuo97 i sent you an email to test out an in house pipeline we had implementing SLALMON if you could test it out then we 'll try to clean it up and share with the authors here for review. Thanks!

PaulaLopezMarti commented 1 year ago

Hello, I am having a similar issue as @PhoebeGuo97. I've run susie_rss() on 1399 SNPs and I got the same warning (IBSS algorithm did not converge in 100 iterations...). I built the LD matrix from the 1KG European population (same ethnicity as in the summary statistics file) using plink1.9. I've checked for allele discrepancies (with bigsnpr::snp_match as suggested above) and removed those conflicting SNPs, so I had a resulting set of 1399 SNPs. When running susie_rss() I get the mentioned error. Also, my lambda score is ~0.22, and when plotting the condz$plot there are no red dots (although maybe there is too much deviation between expected and observed? I don't know which would be the "accepted" limit). Thanks! b21bee83-f299-4cfb-976a-4b1bec5af2db

pcarbo commented 1 year ago

@PaulaLopezMarti You could try removing some more SNPs that are furthest away from the diagonal in your plot.

Also feel free to share the console output of susie_rss with verbose = TRUE.

gaow commented 1 year ago

The SNPs further away from the diagonal are those with small p-values. Our diagnostic model assumes small effects though. I'm not sure how well it behaves in this case when the outliers have what seems strong effect size.

stephens999 commented 1 year ago

the diagnostic can help diagnose problems, but here it looks like the LD matrix might just not be good enough...

PhoebeGuo97 commented 1 year ago

I just tried rerunning susie_rss using a different LD matrix calculated from 1000 Genome phase 1. This time, it converged but it seems the LD reference still doesn't match the GWAS summary statistics. I'm kind of confused. What does the warnning "IBSS algorithm did not converge in 100 iterations" mean? Thanks. Screenshot 2023-02-14 at 5 26 15 PM

gaow commented 1 year ago

What does the warnning "IBSS algorithm did not converge in 100 iterations"

@PhoebeGuo97 it literally means that IBSS algorithm still has not found an optimal solution after 100 iterations so it stops and gives you the solution as is. There may be many reasons for convergence failure but the most likely reason is still LD and summary statistics mismatch. I was told that you are meeting with someone from my team later this week to review this. Hopefully you'll have some more insights after that. We are still testing the pipeline on our own data. So far we find that the SLALOM procedure (a simplified DENTIST method plus some ad hoc processing based on the author's own experiences) works in many cases but can still fail in a small number of GWAS regions we finemapped. It is our plan to dig a bit more and share with the SuSiE developers what we learned and see if we can come up with some recommendations & codes for the community.