privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

`Integer overflow` warning in `snp_cor` function #446

Closed: filosi closed this issue 1 year ago

filosi commented 1 year ago

When computing a large LD matrix, I got the following warning message:

Warning message:
In (function (Gna, ind.row = rows_along(Gna), ind.col = cols_along(Gna),  :
integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'

This produces an invalid dsCMatrix with NA values in the `p` slot. The warning is raised by the `snp_cor` function, and I believe the stack trace points to line 46 of bigsnpr/R/corr.R.

In my case it happened with matrices with more than 200,000 rows and columns. After QC on the genotype data and the GWAS data, there are more than 6 million matching variants, which is why such a huge LD matrix has to be computed.

privefl commented 1 year ago

It means there are too many non-zero correlations being stored. Are you using a 3cM window?
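
For readers unfamiliar with this warning, here is a minimal base R sketch of the mechanism. It assumes the `p` slot is built as a running total of per-column non-zero counts stored as integers. That matches the warning text, but it is not copied from the bigsnpr source, and the counts below are made up for illustration.

```r
# Hypothetical per-column counts of stored (non-zero) correlations.
# Values are chosen only so that the running total exceeds 2^31 - 1.
nnz_per_col <- c(.Machine$integer.max, 1L)

# Integer cumsum overflows once the total passes .Machine$integer.max:
# R returns NA from that point on and emits the warning quoted above.
p_int <- suppressWarnings(cumsum(nnz_per_col))

# Accumulating in double precision, as the warning suggests, avoids the NA.
p_dbl <- cumsum(as.numeric(nnz_per_col))

print(p_int)
print(p_dbl)
```

Note that converting to double only removes the NA in this toy example; it says nothing about whether a sparse matrix with that many stored entries is usable in practice.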

filosi commented 1 year ago

Yes indeed

privefl commented 1 year ago

And you have this issue with 200,000 variants?

filosi commented 1 year ago

With all the chromosomes with > 200k variants, yes.

privefl commented 1 year ago

Yes, you cannot use too many variants (e.g. 400K per chromosome); this would give you too many non-zero correlations. This has been discussed in several issues here.
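
As a rough back-of-the-envelope check, the sketch below estimates when the total number of stored correlations would pass the 32-bit integer limit. The helper `would_overflow` and the average of 6,000 stored correlations per variant are hypothetical; the real count depends on the window size and on variant density.

```r
int_max <- .Machine$integer.max  # 2^31 - 1 = 2147483647

# Average number of stored correlations per variant above which the running
# total would overflow, for the 400K-variant chromosome mentioned above.
n_variants <- 400000
max_avg_per_variant <- int_max / n_variants

# Hypothetical helper: TRUE if the estimated total exceeds the integer limit.
# The product is computed in double precision so the check itself cannot overflow.
would_overflow <- function(n_variants, avg_nnz_per_variant) {
  as.numeric(n_variants) * avg_nnz_per_variant > .Machine$integer.max
}

dense_chr  <- would_overflow(400000, 6000)  # assumed average, not measured
sparse_chr <- would_overflow(50000, 6000)   # same assumed average, fewer variants

print(max_avg_per_variant)
print(c(dense_chr = dense_chr, sparse_chr = sparse_chr))
```

Under these assumed numbers, only the larger variant set exceeds the limit, which is consistent with the advice to restrict the analysis to a smaller set of variants.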

What do you want to use it for? Prediction? Something else?

filosi commented 1 year ago

I just need to use it for prediction for now.

privefl commented 1 year ago

Yes, so it is not recommended to use either LDpred2 or lassosum2 with so many variants for now. Please use either HapMap3 or HapMap3+ variants as recommended in the tutorial.