privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Error in snp_ldsc #464

Closed. KK520520 closed this issue 7 months ago.

KK520520 commented 7 months ago

Hi, I keep getting the error "Error in if (max(abs(pred - pred0)) < 1e-06) break: missing value where TRUE/FALSE needed" when running (ldsc <- with(df_beta, snp_ldsc(ld, length(ld), chi2 = (beta / beta_se)^2, sample_size = n_eff, blocks = NULL))) after computing the LD correlation matrix as instructed.
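For readers unfamiliar with this message, here is a minimal base R sketch of how a single NaN can make an `if` condition fail in this way. The vectors `pred` and `pred0` below are hypothetical stand-ins named after the variables in the error message; this is not the actual code inside snp_ldsc().

```r
# Hypothetical values; only the names are borrowed from the error message.
pred0 <- c(0.20, 0.50)
pred  <- c(0.20, NaN)   # one NaN, e.g. propagated from a NaN input

# max() of a vector containing NaN is NaN, so the comparison yields NA.
cond <- max(abs(pred - pred0)) < 1e-06
print(is.na(cond))

# `if` cannot branch on NA, so it raises an error instead.
msg <- tryCatch(
  if (cond) "converged" else "not converged",
  error = function(e) conditionMessage(e)
)
print(msg)
```

In this sketch the error comes from the NA condition itself, which suggests looking for NA/NaN values in the inputs rather than in the convergence check.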

I double-checked the variant pairs for which the LD matrix calculation returned NA/NaN. In some cases this is explained by a lack of variation in one of the variants, but in many other cases I don't know the cause, since the Pearson correlation coefficient can be computed with cor().
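As an illustration of the "lack of variation" case, the following base R sketch uses simulated genotypes (hypothetical data, not taken from this issue) to show that a variant with zero variance has an undefined correlation with every other variant, and how such variants might be flagged. It uses plain cor() and does not reproduce how bigsnpr computes LD.

```r
set.seed(1)
n <- 100

# Simulated genotypes coded 0/1/2; column names are made up for this example.
G <- cbind(
  common      = rbinom(n, 2, 0.30),
  also_common = rbinom(n, 2, 0.20),
  monomorphic = rep(0L, n)   # minor allele absent in this subset
)

# cor() warns that the standard deviation is zero and returns NA
# for every pair that involves the monomorphic variant.
ld <- suppressWarnings(cor(G))
print(ld)

# Flag variants with no variation so they can be excluded beforehand.
no_var <- apply(G, 2, sd) == 0
print(names(which(no_var)))
```

Running the same zero-variance check on the actual test subset would show how many of the NA/NaN entries come from this cause.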

Could you please advise why these occur and how to resolve them?

Thank you

privefl commented 7 months ago

So, you still have some NaNs in the LD matrix after having filtered out very low MAFs?

KK520520 commented 7 months ago

> So, you still have some NaNs in the LD matrix after having filtered out very low MAFs?

That's correct. Both my input genotype data and my summary statistics have been QCed to remove low-MAF variants. However, I am using a small subset of the data (100 individuals with ~70k variants obtained after pruning) to test the code. Could the problem be due to the small size of the data?

privefl commented 7 months ago

If you have 1000 individuals and a variant with MAF 1%, there is a non-zero probability that a subset of 100 individuals will have MAF 0 for that variant. Also, 100 people is too few to compute an LD matrix.
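To put a rough number on that probability, here is a small base R sketch. It assumes, purely for illustration, that all minor alleles are carried by distinct heterozygous individuals (so MAF 1% in 1000 people means 20 carriers) and that the 100 individuals are drawn at random without replacement.

```r
n_total <- 1000   # individuals in the full sample
n_sub   <- 100    # individuals in the test subset
maf     <- 0.01

# Assumption: each minor allele sits in a different heterozygous carrier.
n_carrier <- round(2 * n_total * maf)

# Hypergeometric probability that the subset contains none of the carriers,
# i.e. that the variant is monomorphic (MAF 0) in the subset.
p_mono <- dhyper(0, m = n_carrier, n = n_total - n_carrier, k = n_sub)
print(p_mono)
```

Under these assumptions the chance is a bit over one in ten per such variant, so with tens of thousands of variants some monomorphic ones in a 100-person subset would not be surprising.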

KK520520 commented 7 months ago

> If you have 1000 individuals with MAF 1%, there is a non-zero prob that a subset of 100 individuals will have maf 0. Also, 100 people is too few to compute an LD matrix.

I see your point. I will retry with a larger subsample, and hopefully that will resolve it. Super!

KK520520 commented 7 months ago

Just to update: the issue was completely resolved once I expanded the test sample to N = 1000. Everything worked well! Thanks :)