xinhe-lab / ctwas

package for the causal TWAS project
https://xinhe-lab.github.io/ctwas/
MIT License
32 stars 12 forks

`index_regions` function very slow #14

Open seanjosephjurgens opened 7 months ago

seanjosephjurgens commented 7 months ago

Hi, thanks for your interesting tool!

I have been using cTWAS on a few datasets and have obtained interesting results. Overall, most parts of the analysis run relatively smoothly for me. Unfortunately, however, the `ctwas_rss` function is very slow. One of the major bottlenecks, even when using many cores, is the two steps that call `index_regions`. Specifically, the steps labelled "Adding R matrix info for chrom ..." take extremely long.

I am using an LD matrix reference dataset I created myself following the tutorial. It includes ~8.3M variants from the EUR subset of the GTEx WGS data, and the GWAS data also has ~8.3M variants after filtering for overlapping variants. With this LD matrix and GWAS dataset, all steps run reasonably, with reasonable parameters for my GWAS (prior for genes 0.014, prior for variants 0.001). However, the "Adding R matrix info for chrom ..." steps easily take 10-20 minutes per chromosome, and they are run twice in the algorithm, totalling about 660 minutes for this munging step alone.

I have managed to use multiple cores and forking to make all the other time-consuming steps reasonably efficient. However, the "Adding R matrix info for chrom ..." munging step does not currently support parallel computing.

Can you confirm that such a run time is expected for this step? Is there anything I can do to speed it up? E.g., is there a way the parallel cores can be used here?
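For what it's worth, the per-chromosome structure of this step suggests an obvious fan-out. A minimal sketch of the idea using `parallel::mclapply` (forking, so Unix/macOS only) is below — note that `index_one_chrom()` is a hypothetical helper, not a real ctwas function; the actual `index_regions()` loops over all chromosomes internally and would need refactoring to expose a per-chromosome unit of work:

```r
library(parallel)

# Hypothetical sketch: fan the per-chromosome "Adding R matrix info" work
# out over forked workers. index_one_chrom(), ld_R_dir and outputdir are
# assumed names for illustration only.
res_list <- mclapply(1:22, function(chrom) {
  index_one_chrom(chrom, ld_R_dir = ld_R_dir, outputdir = outputdir)
}, mc.cores = 8)

# Combine the per-chromosome results back into a single object
res <- do.call(rbind, res_list)
```

Since each chromosome's LD matrices are read and indexed independently, this kind of split should scale close to linearly with the number of cores, at the cost of holding several chromosomes' R matrices in memory at once.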

Thanks so much in advance! -- Sean

kevinlkx commented 7 months ago

Hi Sean,

Thanks for your comments and suggestions. Yes, you are right: the current version does not support parallelization of the "adding R matrix" step, and it is indeed time consuming. We have been working on that recently and will release a more efficient version soon.

Best, Kevin