Closed jielab closed 7 years ago
Hi, I just tested and found that the problem is with the --siteFile option. Please see the two screenshots below: it took 43,356 seconds to run a regression on 93 SNPs when I used --siteFile. But when I first use bcftools to extract those 93 SNPs into a new VCF, which takes a minute, the same analysis takes only 109 seconds.
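For anyone who wants to reproduce the workaround described above, here is a minimal sketch of the pre-extraction step. It assumes the sites are available as (chromosome, position) pairs and that the input VCF is bgzipped and indexed, which `bcftools view -R` requires. The function names, file names, and example sites are hypothetical; this is not the exact command used in the test above.

```python
import tempfile
from pathlib import Path


def write_regions_file(sites, path):
    """Write (chrom, pos) pairs as a tab-delimited CHROM<TAB>POS file (1-based),
    the layout bcftools accepts for its -R/--regions-file option.
    Returns the number of distinct sites written."""
    unique = sorted({(str(chrom), int(pos)) for chrom, pos in sites})
    text = "".join(f"{chrom}\t{pos}\n" for chrom, pos in unique)
    Path(path).write_text(text)
    return len(unique)


def bcftools_extract_command(vcf_in, regions_path, vcf_out):
    """Build (but do not run) a bcftools command that writes a bgzipped VCF
    restricted to the listed sites. The input must be indexed for -R to work."""
    return [
        "bcftools", "view",
        "-R", str(regions_path),   # regions file written above
        "-Oz",                     # compressed VCF output
        "-o", str(vcf_out),
        str(vcf_in),
    ]


if __name__ == "__main__":
    # Hypothetical sites and file names, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        regions = Path(tmp) / "sites.regions.txt"
        n_sites = write_regions_file([("1", 2500), ("1", 1000), ("1", 1000)], regions)
        cmd = bcftools_extract_command("chunk.vcf.gz", regions, "subset.vcf.gz")
        print(n_sites, "sites:", " ".join(cmd))
```

The command list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files; the resulting smaller VCF is then given to the analysis in place of the full chunk.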
This issue is essentially #26.
As I said in #26, when there are just a handful of SNPs, using --rangeFile will be more efficient.
Since --covar is not the root cause of this issue, I am closing it for now.
Dear Xiaowei:
Please see the two screenshots below. It took me 54,799 seconds to analyze 225,216 samples and 94,238 SNPs, for a 5MB imputed chunk.
Now I use "--siteFile" to limit my analysis to the 92 SNPs that are genome-wide significant in a 1MB region, and "--covar" to condition on the imputed dosage of the lead SNP. After 12 hours, the message "Analysis started" still has not shown up.
So the number of SNPs went down from 94,238 to 92, yet the running time may be even longer, and the only other change was adding one covariate. I don't know whether RVTESTS can be optimized for this type of analysis. If not, can I regress the phenotype on the covariate in R first to create phenotype residuals, and then run RVTESTS without the "--covar" option?
Thank you & best regards, Jie
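The residual approach asked about above can be sketched as follows. This is only an illustration of the idea, written in Python rather than R, for a single covariate (such as the lead SNP dosage) fitted by ordinary least squares with an intercept. The function name and the example values are hypothetical. Note that residualizing only the phenotype is an approximation: its results can differ from a joint model with the covariate included when the tested SNPs are correlated with that covariate.

```python
def residualize(phenotype, covariate):
    """Return residuals of phenotype after a least-squares fit on one
    covariate plus an intercept (simple linear regression)."""
    y = [float(v) for v in phenotype]
    x = [float(v) for v in covariate]
    if len(y) != len(x) or len(y) < 2:
        raise ValueError("phenotype and covariate need equal length >= 2")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("covariate has zero variance")

    # Closed-form OLS estimates for slope and intercept.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    # Hypothetical lead-SNP dosages and phenotype values.
    dosage = [0.0, 1.0, 2.0, 1.0, 0.1]
    pheno = [1.0, 3.1, 4.9, 2.9, 1.4]
    print(residualize(pheno, dosage))
```

The residuals would then be written out as the new phenotype column and the analysis run without the covariate option.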