zhanxw / rvtests

Rare variant test software for next generation sequencing data

Extremely slow speed with "--covar" #25

Closed: jielab closed this issue 7 years ago

jielab commented 7 years ago

Dear Xiaowei:

Please see the two screenshots below. It took 54,799 seconds (about 15 hours) to analyze 225,216 samples and 94,238 SNPs in a 5MB imputed chunk.

Now I use "--siteFile" to restrict the analysis to 92 genome-wide significant SNPs in a 1MB region, and "--covar" to condition on the imputed dosage of the lead SNP. After 12 hours, the message "Analysis started" still has not appeared.

So the number of SNPs drops from 94,238 to 92, yet the running time may be even longer, just because I added one covariate! Can RVTESTS be optimized for this type of analysis? If not, can I regress the phenotype on the covariate in R first to get phenotype residuals, and then run RVTESTS on those residuals without the "--covar" option?

Thank you & best regards, Jie

[Screenshots: capture, capture2]
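On the residualization question above: a minimal sketch in Python with NumPy (rather than R, and independent of RVTESTS) on simulated data. It illustrates the Frisch-Waugh-Lovell theorem for ordinary least squares: regressing the residualized phenotype on the residualized genotype reproduces the coefficient from the joint model, whereas residualizing only the phenotype shrinks the estimate for any SNP correlated with the conditioning SNP, which is typical for variants near the lead SNP. The helper names (`residualize`, `slope`) and the simulated dosages are hypothetical, and the sketch covers linear models only.

```python
import numpy as np


def residualize(v, covariates):
    """Return residuals of v after OLS on an intercept plus the given covariates."""
    design = np.column_stack([np.ones(len(v)), covariates])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


def slope(x, y):
    """OLS slope of y on x, fitted with an intercept."""
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(0)
n = 5000
lead = rng.binomial(2, 0.3, n).astype(float)   # hypothetical lead-SNP dosage
nearby = lead + rng.normal(0, 0.5, n)          # hypothetical SNP correlated with the lead SNP
pheno = 0.5 * lead + 0.2 * nearby + rng.normal(0, 1, n)

# Reference: joint model pheno ~ nearby + lead (the "--covar"-style conditional fit)
joint = np.linalg.lstsq(
    np.column_stack([np.ones(n), nearby, lead]), pheno, rcond=None
)[0][1]

y_res = residualize(pheno, lead)   # phenotype residuals after the lead SNP
x_res = residualize(nearby, lead)  # genotype residuals after the lead SNP

print(f"joint model:         {joint:.4f}")
print(f"residualize y and x: {slope(x_res, y_res):.4f}")   # equals the joint coefficient
print(f"residualize y only:  {slope(nearby, y_res):.4f}")  # attenuated toward zero
```

Even when both sides are residualized, the two-step fit counts slightly too many residual degrees of freedom, so its standard errors and p-values can differ marginally from the joint model.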

jielab commented 7 years ago

Hi, I just tested this and found that the problem is the --siteFile option. Please see the two screenshots below: it took 43,356 seconds to run a regression on 93 SNPs with --siteFile. When I first used bcftools to extract those 93 SNPs into a new VCF, which takes about a minute, the same analysis took only 109 seconds.

[Screenshots: capture1, capture2]

zhanxw commented 7 years ago

This issue is essentially #26. As I said in #26, when there are just a handful of SNPs, using --rangeFile will be more efficient.
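The maintainer's advice can be applied by converting the site list into genomic ranges. Below is a minimal Python sketch that collapses `(chromosome, position)` pairs into `chrom:start-end` strings, merging sites that lie within `max_gap` base pairs of the previous site. Two assumptions should be checked against the RVTESTS documentation: that `--rangeFile` accepts one `chrom:start-end` range per line, and that the speed-up comes from reading ranges through the VCF's tabix index rather than scanning every record. The function name and the `max_gap` parameter are hypothetical.

```python
from typing import Iterable


def sites_to_ranges(sites: Iterable[tuple[str, int]], max_gap: int = 0) -> list[str]:
    """Collapse (chrom, pos) sites into 'chrom:start-end' range strings.

    Duplicate sites are dropped. Sites on the same chromosome whose position is
    within max_gap bp of the previous site are merged into one range.
    """
    ranges: list[str] = []
    cur_chrom, start, end = None, None, None
    for chrom, pos in sorted(set(sites)):
        # Extend the current range if this site is close enough on the same chromosome.
        if chrom == cur_chrom and pos - end <= max_gap:
            end = pos
            continue
        # Otherwise close the current range (if any) and start a new one.
        if cur_chrom is not None:
            ranges.append(f"{cur_chrom}:{start}-{end}")
        cur_chrom, start, end = chrom, pos, pos
    if cur_chrom is not None:
        ranges.append(f"{cur_chrom}:{start}-{end}")
    return ranges
```

For example, `sites_to_ranges([("1", 1000), ("1", 1050), ("1", 5000), ("2", 300)], max_gap=100)` returns `['1:1000-1050', '1:5000-5000', '2:300-300']`, which could be written one per line to a range file. With a nonzero `max_gap`, a merged range may also pick up untargeted variants between the listed sites; the default `max_gap=0` keeps each distinct position as its own single-base range.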

zhanxw commented 7 years ago

Since --covar is not the root cause of this issue, I am closing it for now.