privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Question about not being able to take full advantage of CPU and memory #223

Closed David-OSS closed 3 years ago

David-OSS commented 3 years ago

Hi,

We are trying to impute UK Biobank bed data, but we found that bigsnpr only uses 3-5% of the CPU and very little memory. Our machine has 44 physical cores and over 300 GB of memory. Is this a known issue, or is it by design? Is there any way to improve hardware utilization?

Thanks

privefl commented 3 years ago

Which of the imputation functions are you talking about?

Why not use the BGEN data directly?

David-OSS commented 3 years ago

snp_fastImpute(). As for the BGEN data, does it have any missing values?

Thanks

privefl commented 3 years ago

snp_fastImpute() should use 100% CPU (if 100% CPU = 1 core). IIRC, you can parallelize over chromosomes, and I think there is a discussion on how to parallelize further in another issue here.
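A minimal sketch of parallelizing over chromosomes, assuming the installed version of snp_fastImpute() accepts an `ncores` argument and that nb_cores() is available once bigsnpr is loaded. It uses the small `example-missing.bed` dataset bundled with the package rather than UKBB data, and `n_workers` is just an illustrative name:

```r
library(bigsnpr)

# Small example dataset with missing genotypes that ships with bigsnpr
# (stand-in for real data; snp_attachExtdata() works on a temporary copy).
bigsnp <- snp_attachExtdata("example-missing.bed")
G   <- bigsnp$genotypes
CHR <- bigsnp$map$chromosome

# Chromosomes are the unit of parallelism here, so asking for more workers
# than there are chromosomes would not keep any additional cores busy.
n_workers <- min(nb_cores(), length(unique(CHR)))

# Assumption: `ncores` distributes the chromosomes across worker processes.
infos <- snp_fastImpute(G, CHR, ncores = n_workers)

# Per-variant imputation summary for the first five variants.
infos[, 1:5]
```

With 22 autosomes, this approach would keep at most 22 of the 44 cores busy, which is why any further speed-up needs parallelism within chromosomes.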

UKBB BGEN data should not have any missing values, as it is already imputed data.

privefl commented 3 years ago

Any update on this?