Error in the LD matrix - Githubissues

zhilizheng / SBayesRC

GNU General Public License v3.0


Error in the LD matrix #38

Open JagadishUEF opened 1 month ago

JagadishUEF commented 1 month ago

Hi, I was trying a first run with GWAS summary statistics and encountered the error below. I was using the sparse LD matrix based on HapMap ukbEUR, which I downloaded from the matrices available here: https://opain.github.io/GenoPred/Pipeline_prep.html#46_prepare_score_and_scale_files_for_polygenic_scoring_using_sbayesr

Per-SNP window size mean 4206.32 sd 1149.36. LD matrix diagonal mean 0.00156614 sd 0.0484463.

ERROR: The mean of LD matrix diagonal values is expected to be close to one. Something is wrong with the LD matrix!

Analysis finished: Fri Oct 11 13:03:07 2024 Computational time: 0:0:11

Could you please help, or let me know if suitable LD matrices are available elsewhere? Thanks!
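For context on the error message above, here is a minimal sketch of why a check like this makes sense. It assumes an LD matrix is the SNP-by-SNP correlation matrix of genotypes, whose diagonal entries are each SNP's correlation with itself and so equal one. The function names `ld_matrix` and `diagonal_summary` and the simulated genotypes are hypothetical illustrations. This is not SBayesRC code, and it does not read any SBayesRC or GCTB file format.

```python
import numpy as np


def ld_matrix(genotypes):
    """Return the SNP-by-SNP correlation matrix for an (individuals x SNPs) array."""
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)   # centre each SNP
    g = g / g.std(axis=0)    # scale each SNP to unit variance (assumes no monomorphic SNPs)
    return g.T @ g / g.shape[0]


def diagonal_summary(ld):
    """Return the mean and standard deviation of the diagonal of a square matrix."""
    d = np.diag(ld)
    return float(d.mean()), float(d.std())


# Simulated genotypes (hypothetical data): 500 individuals, 50 independent SNPs.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.5, size=50)
genotypes = rng.binomial(2, freqs, size=(500, 50))

ld = ld_matrix(genotypes)
diag_mean, diag_sd = diagonal_summary(ld)
print(f"LD matrix diagonal mean {diag_mean:.6g} sd {diag_sd:.6g}")
```

For a correlation matrix built this way, the diagonal mean is one up to floating-point error. A mean near zero, as in the log above, therefore suggests that the values being read are not a matrix of this kind, although the thread does not establish the cause.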

zhilizheng commented 1 month ago

Hi @JagadishUEF ,

Could you follow our tutorial? The readme of the project should be easy to follow. https://github.com/zhilizheng/SBayesRC

Regards, Zhili

JagadishUEF commented 1 month ago

Thanks for the reply. I followed your tutorial closely and managed to run a first analysis.

The resulting polygenic score was expected to be significant but was not. Hence, I wanted to check with you whether running SBayesRC without annotations is equivalent to running SBayesR. Also, would using either --ldm or --ldm-eigen give identical results?

I presume that an LD matrix based on a different ancestry could be the issue. How should ldRef be generated when only summary statistics are available? Also, is it possible to merge the three LD files (from different ancestries) kindly provided in your resources?

Thanks for your time.

zhilizheng commented 1 month ago

Hi @JagadishUEF ,

The results from SBayesR and SBayesRC are very similar. In cross-validation, the prediction accuracy of SBayesRC is slightly lower than that of SBayesR, but the difference is not significant.

For the LD matrix, if you have mixed ancestry, there are different ways to handle it. It is better to use in-sample LD data if you have it. If not, the ancestry proportions may be a problem, so it is good to generate the LD matrix yourself, using samples whose ratio across populations is close to that of your data. We provide a function to generate it from genotype data.
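The idea of matching the ratio of samples in each population can be sketched as follows. This is only an illustration under stated assumptions: it assumes you have a reference panel with individual-level genotypes and a population label per sample, and that you know the approximate ancestry proportions of the GWAS. The function `sample_by_ancestry`, the sample IDs, and the proportions are all hypothetical. It is not the function provided by SBayesRC.

```python
import random
from collections import defaultdict


def sample_by_ancestry(labels, proportions, total, seed=0):
    """Pick `total` sample IDs so each population matches its target proportion.

    labels: dict mapping sample ID -> population label
    proportions: dict mapping population label -> target fraction (must sum to 1)
    """
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    by_pop = defaultdict(list)
    for sample_id, pop in labels.items():
        by_pop[pop].append(sample_id)
    rng = random.Random(seed)
    chosen = []
    for pop, frac in proportions.items():
        n = round(total * frac)
        if n > len(by_pop[pop]):
            raise ValueError(f"not enough {pop} samples: need {n}, have {len(by_pop[pop])}")
        # Sort first so the draw is reproducible for a given seed.
        chosen.extend(rng.sample(sorted(by_pop[pop]), n))
    return chosen


# Hypothetical reference panel: 1000 EUR, 500 EAS, 500 AFR samples.
labels = {f"EUR{i}": "EUR" for i in range(1000)}
labels.update({f"EAS{i}": "EAS" for i in range(500)})
labels.update({f"AFR{i}": "AFR" for i in range(500)})

# Hypothetical GWAS ancestry mix.
keep = sample_by_ancestry(labels, {"EUR": 0.7, "EAS": 0.2, "AFR": 0.1}, total=200)
counts = {pop: sum(labels[s] == pop for s in keep) for pop in ("EUR", "EAS", "AFR")}
print(counts)
```

The resulting ID list could then be used to subset the reference genotypes (for example with a keep list in a tool such as PLINK) before the LD matrix is generated from that subset.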

Regards, Zhili

JagadishUEF commented 4 weeks ago

Hi Zhili,

Thanks for the response. As I am using summary statistics and have no genotypes available, any tips on how to generate the LD matrix with a close ratio of samples from each population would be highly appreciated.

Also, to test with another population, I tried to run the analysis using the ukbEAS_imputed LD eigen blocks, but it always stops at block 270 with the error: segmentation fault (core dumped). I have tried it three times, and it takes about 3.5 hours to reach this block. I believe it isn't a memory issue, as ukbEUR_imputed (which is larger) runs fine.