privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

NAs with LDpred2 using UKBB and Finngen sumstats #437

Closed Herest closed 11 months ago

Herest commented 11 months ago

Hi Florian,

I'm using LDpred2 to train a T2D model using the UK Biobank data and summary statistics downloaded from Finngen. I followed the steps in the most recent tutorial from your GitHub, and for the QC of the summary statistics I followed the script you have in another repo. However, I'm getting all NAs in the final model when using snp_ldpred2_auto().

Do you have any idea what I am doing wrong? I'm also getting a warning when calculating the LD correlation matrix, "Warning message: NA or NaN values in the resulting correlation matrix.", which did not appear in other tests that I performed before doing this training. Thanks @privefl

privefl commented 11 months ago

Yeah, you can't have these NaN values in the correlation matrix. It usually means you have variants with no variation at all (MAF = 0); you should remove those (cf. similar issues).

Herest commented 11 months ago

Thank you for your response, I filtered the data with plink to keep only those variants with MAF greater than 0.01, but I got the same result when calculating the correlation matrix and in the training of the model. Any other suggestion?

privefl commented 11 months ago

Please have a look at Matrix::which(is.nan(corr0), arr.ind = TRUE), and report the MAF (e.g. with snp_MAF()) for the variant(s) having all NaNs.
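As a minimal sketch of that check, assuming a small made-up genotype matrix and base R's cor() as a stand-in for the LD matrix (neither is from the thread): base cor() reports NA rather than NaN for zero-variance columns, so the sketch tests with is.na(), and the MAF is computed by hand instead of with snp_MAF().

```r
# Hypothetical toy genotypes: 6 individuals x 4 variants, coded 0/1/2.
G <- cbind(
  snp1 = c(0, 1, 2, 1, 0, 2),
  snp2 = c(0, 0, 0, 0, 0, 0),   # monomorphic variant, MAF = 0
  snp3 = c(1, 1, 0, 2, 1, 0),
  snp4 = c(2, 2, 1, 0, 1, 2)
)

# Stand-in for the LD matrix; a zero-variance column yields missing
# correlations (base R warns that the standard deviation is zero).
corr0 <- suppressWarnings(cor(G))

# Positions of the missing entries (row/column indices), analogous to
# which(is.nan(corr0), arr.ind = TRUE) on the real matrix.
bad <- which(is.na(corr0), arr.ind = TRUE)

# Variants whose correlations with all other variants are missing.
all_missing <- rowSums(is.na(corr0)) >= ncol(corr0) - 1

# Minor allele frequency computed by hand for this toy matrix.
af  <- colMeans(G) / 2
maf <- pmin(af, 1 - af)

# Report the MAF of the offending variant(s).
print(maf[all_missing])
```

In this toy case the only variant flagged is the monomorphic one, which is the pattern described above; with real data the same indices would be passed to snp_MAF() instead.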

Herest commented 11 months ago

Hi again Florian, I fixed the warning about the NAs in the correlation matrix. However, I'm still unable to train the model with auto mode; I keep getting only NAs.

privefl commented 11 months ago

Could you say what the issue was, for the record, for people having a similar issue in the future?

For the problem of getting NAs with LDpred2-auto, this has been discussed in other issues here; have you looked at those? Please comment there.

Herest commented 11 months ago

The issue was that there were many SNPs with MAF == 0 or a missing MAF (i.e. NA). I identified them using Matrix::which(is.nan(corr0), arr.ind = TRUE) along with snp_MAF(), and filtered them out.
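A filter of this kind could look roughly like the sketch below. It is only an illustration on made-up genotypes in base R: the matrix, the hand-computed MAF (in place of snp_MAF()), and cor() (in place of the LD computation) are all assumptions, not the code actually used here.

```r
# Hypothetical toy genotypes with one monomorphic and one all-missing variant.
G <- cbind(
  snp1 = c(0, 1, 2, 1, 0, 2),
  snp2 = c(0, 0, 0, 0, 0, 0),         # MAF == 0
  snp3 = c(NA, NA, NA, NA, NA, NA),   # MAF cannot be computed
  snp4 = c(2, 2, 1, 0, 1, 2)
)

# Allele frequency per variant; an all-missing column gives NaN.
af  <- colMeans(G, na.rm = TRUE) / 2
maf <- pmin(af, 1 - af)

# Keep only variants with a defined, strictly positive MAF.
keep <- !is.na(maf) & maf > 0
G_qc <- G[, keep, drop = FALSE]

# The correlation matrix of the remaining variants has no missing values.
corr_qc <- cor(G_qc)
stopifnot(!anyNA(corr_qc))
```

The same logical vector would also need to be applied to the summary statistics so that variants stay aligned with the correlation matrix.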

I also fixed the other issue with the snp_ldpred2_auto function using the solution in this comment.

Thanks a lot for your help