stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR

Expected vs Observed z-scores are not the same #236

Open tamil-acog opened 2 months ago

tamil-acog commented 2 months ago

Hi team,

I am trying to build a fine-mapping pipeline with SuSiE. My results are sometimes off. I'll describe my input data and the issues I am facing; please help me resolve them if possible.

I use UKBB data for the fine-mapping. Both my sumstats and my LD matrix come from UKBB data.

After going through some of the discussions in the issues, I found out that I have to do the following:

1) Set ESTIMATE_RESIDUAL_VARIANCE = False.
2) Calculate the LD matrix with the built-in R "cor()" function rather than PLINK.

After adjusting my pipeline with the above changes, I face the following issues:

1) The expected vs. observed z-scores plot still doesn't match exactly, even though I have an in-sample LD matrix.
2) It takes a very long time to calculate the correlation matrix using built-in R. Is there a better way to do it?
3) Sometimes I don't get any credible sets. What would be an ideal "coverage" parameter?
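On the second question, it may help to recall that a Pearson correlation matrix is just the cross-product of the column-standardized genotype matrix divided by n - 1. That means the LD matrix can be obtained from one matrix multiplication, which optimized linear algebra libraries tend to handle well. The pure-Python sketch below only illustrates the identity on a hypothetical toy genotype matrix; the function names are made up for this example, and it is not meant to be fast as written.

```python
import math

def standardize_columns(X):
    """Center each column to mean 0 and scale to unit sample SD (n - 1 denominator)."""
    n = len(X)
    p = len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        if sd == 0:
            raise ValueError(f"column {j} has no variation; drop it before computing LD")
        for i in range(n):
            Z[i][j] = (X[i][j] - mean) / sd
    return Z

def ld_matrix(X):
    """Pearson correlation matrix computed as Z^T Z / (n - 1)."""
    n = len(X)
    Z = standardize_columns(X)
    p = len(Z[0])
    return [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]

# Hypothetical toy genotypes: 6 individuals x 3 variants, coded 0/1/2.
genotypes = [
    [0, 0, 2],
    [1, 1, 1],
    [2, 2, 0],
    [1, 0, 1],
    [0, 1, 2],
    [2, 2, 0],
]
R = ld_matrix(genotypes)
```

In this toy data the third variant is the first one with alleles swapped (2 minus the first column), so their correlation comes out as -1. In a real pipeline the same computation would be done with a matrix product in whatever numerical library is available, on the same individuals used for the association analysis.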

Expected vs Observed plot:

[Screenshot: expected vs. observed z-scores plot]

Z-scores distribution:

[Screenshot: z-score distribution]

pcarbo commented 2 months ago

@tamil-acog The first thing that jumps out at me is that your association results don't seem very strong. I presume you first ran a basic association analysis (in PLINK, for example)? What were the smallest p-values from this association analysis? If the association results are not strong enough, it may not make sense to perform fine-mapping in this region. (Typically we look for p-values smaller than approximately 1e-8, although this may be different in UK Biobank depending on how the association analysis is conducted.)
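As a quick way to run this check when only z-scores are at hand, the sketch below converts z-scores to two-sided p-values under a standard normal null and compares the smallest one to the approximate 1e-8 threshold mentioned above. The function names and the example z-scores are hypothetical, chosen only for illustration.

```python
import math

# Approximate threshold quoted above; the appropriate value may differ by study.
GENOME_WIDE_THRESHOLD = 1e-8

def two_sided_p(z):
    """Two-sided p-value for a z-score under the standard normal null."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))

def has_strong_signal(z_scores, threshold=GENOME_WIDE_THRESHOLD):
    """True if the smallest two-sided p-value in the region is below the threshold."""
    return min(two_sided_p(z) for z in z_scores) < threshold

# Hypothetical regions: one with weak associations, one with a strong hit.
weak_region = [1.2, -3.5, 4.0]
strong_region = [0.5, -6.2]
```

A region whose largest |z| is only around 4 has a smallest p-value on the order of 1e-5, which falls well short of the threshold.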

tamil-acog commented 2 months ago

Hi, thank you very much for the timely response. I got your point; I checked the p-values and you were right. Thanks.

But my main concern is the "expected vs. observed z-scores" plot. I checked other traits and got some hits in the credible sets there. Still, the "expected vs. observed" plot looks the same as above, even though my LD matrix is in-sample.
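On the earlier question about missing credible sets and the "coverage" parameter, it may help to see how a credible set is typically built: variants are ranked by their posterior probability for one single effect, the top ones are accumulated until the requested coverage is reached, and the resulting set is then checked for "purity" (the minimum absolute correlation among its members). The sketch below is a simplified illustration of that general construction with hypothetical names and toy numbers; susieR's own implementation may differ in its details.

```python
def credible_set(alpha, coverage=0.95):
    """Smallest set of variant indices whose summed posterior probability reaches coverage."""
    order = sorted(range(len(alpha)), key=lambda j: alpha[j], reverse=True)
    total, chosen = 0.0, []
    for j in order:
        chosen.append(j)
        total += alpha[j]
        if total >= coverage:
            break
    return sorted(chosen)

def purity(cs, R):
    """Minimum absolute pairwise correlation among the variants in a credible set."""
    if len(cs) == 1:
        return 1.0
    return min(abs(R[a][b]) for a in cs for b in cs if a != b)

# Hypothetical posterior probabilities for one single effect over 4 variants.
alpha_peaked = [0.6, 0.3, 0.06, 0.04]
alpha_diffuse = [0.25, 0.25, 0.25, 0.25]

# Hypothetical LD matrix: variants 0 and 1 are in high LD, variant 2 is not.
R = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]

cs_95 = credible_set(alpha_peaked, coverage=0.95)
cs_80 = credible_set(alpha_peaked, coverage=0.80)
```

With these toy numbers, the 95% set has to include the weakly correlated third variant, so its purity drops to 0.2, while the 80% set contains only the two variants in high LD. A set with low purity would usually be discarded by a purity filter, which is one way a run can end with no credible sets; the other is simply that the posterior is spread thinly, as in `alpha_diffuse`. Lowering the coverage shrinks the sets but also weakens the guarantee they carry, so it is a trade-off rather than a fix for a weak signal.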

Some info:

My questions:

pcarbo commented 2 months ago

Hi @tamil-acog, I'm not super familiar with PLINK, but this does look like the right approach. Did you also run your association analysis in PLINK?

I will note that others have encountered challenges in making the z-scores and LD consistent, so you are far from the only one. See for example Issue 207; I recommend searching the Issues on GitHub for other discussion.
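To make the idea of consistency between z-scores and LD concrete: if the z-scores in a region are modeled as z ~ N(0, R), with R the LD matrix, then each z-score has a conditional mean and variance given all the others, and an observed value far from that conditional mean suggests a mismatch (an allele flip, for instance). The sketch below computes this from the precision matrix in pure Python. It illustrates the general idea under that assumed null model and is not susieR's implementation; all names and numbers are hypothetical.

```python
def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; the LD matrix may need regularization")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]

def expected_z(z, R):
    """Conditional (mean, sd) of each z_j given the other z-scores, assuming z ~ N(0, R)."""
    omega = invert(R)  # precision matrix
    p = len(z)
    result = []
    for j in range(p):
        mean = -sum(omega[j][k] * z[k] for k in range(p) if k != j) / omega[j][j]
        sd = (1.0 / omega[j][j]) ** 0.5
        result.append((mean, sd))
    return result

# Hypothetical LD: variants 0 and 1 are correlated at 0.9, variant 2 is independent.
R = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
z_consistent = [5.0, 4.5, 0.3]
z_flipped = [5.0, -4.5, 0.3]  # same data with the sign of variant 1 flipped

expected_consistent = expected_z(z_consistent, R)
expected_flipped = expected_z(z_flipped, R)
```

For the consistent z-scores, variant 1 is expected at 0.9 * 5.0 = 4.5 and is observed at 4.5. After the sign flip it is still expected at 4.5 but observed at -4.5, roughly 20 conditional standard deviations away, which is the kind of outlier a diagnostic plot would expose. Note that an in-sample LD matrix with variants in perfect LD is singular, so some regularization would be needed before inverting it.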

It might also be helpful to review the steps we took to generate the association statistics and LD matrices for our PLoS Genetics paper. The scripts can be found here.

Hope this helps.