zhanxw / rvtests

Rare variant test software for next generation sequencing data

Is using setFile slower than geneFile? #69

Open zx8754 opened 5 years ago

zx8754 commented 5 years ago

Sorry, I haven't tested this thoroughly, but it just "feels" slower. Maybe you know the reason? (If not, I can create a reproducible example.)

First, I used the standard file as input: --geneFile refFlat_hg19.txt.gz

Then I created a subset of that file using custom filters. Running with --setFile and my custom set file, instead of --geneFile, seems slower: --setFile refFlat_hg19_customFilter.txt

Is this expected?
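
A reproducible comparison could start from a small timing harness like the sketch below. It is only an illustration: the input file names, the output prefixes, and the choice of `--burden cmc` are placeholders, and `build_commands` and `time_command` are hypothetical helpers, not part of rvtests. It assumes an `rvtest` binary on the PATH when actually used.

```python
import subprocess
import time


def build_commands(vcf, pheno, gene_file, set_file):
    """Return two rvtest command lines that differ only in how regions are given.

    All file names are supplied by the caller; the association model
    (--burden cmc) is an arbitrary choice for this sketch.
    """
    common = ["rvtest", "--inVcf", vcf, "--pheno", pheno, "--burden", "cmc"]
    return {
        "geneFile": common + ["--geneFile", gene_file, "--out", "timing_geneFile"],
        "setFile": common + ["--setFile", set_file, "--out", "timing_setFile"],
    }


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard program output so terminal I/O does not distort the timing.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `time_command` on each entry of `build_commands("input.vcf.gz", "pheno.ped", "refFlat_hg19.txt.gz", "refFlat_hg19_customFilter.txt")` gives two numbers to compare. For a fair comparison, both files should cover the same genes, since the custom file here is a filtered subset.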

zhanxw commented 5 years ago

That depends on the content of the set file. Internally, the option --setFile uses the index file to read each variant specified, while the option --geneFile uses the index file to locate the gene regions and then processes each variant within them. In your case, maybe the --setFile input lists a lot of variants. Since each variant has to be looked up individually, the total computation time can be longer than with --geneFile.
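
One way to check whether a set file falls into the "many individual variants" case is to count single-position entries versus intervals. The sketch below is not part of rvtests. It assumes whitespace-separated lines with the set name in the first column and comma-separated ranges of the form `chrom:start-end` or `chrom:pos` in the second column; `summarize_set_lines` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed range syntax: "chrom:start-end" or a single position "chrom:pos".
RANGE_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def summarize_set_lines(lines):
    """Count sets, interval ranges, and single-position ranges in set-file lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip comments and lines without a range column.
        if line.startswith("#") or len(fields) < 2:
            continue
        counts["sets"] += 1
        for rng in fields[1].split(","):
            match = RANGE_RE.match(rng)
            if match is None:
                counts["unparsed"] += 1
            elif match.group("end") in (None, match.group("start")):
                # No end given, or start == end: a single variant position.
                counts["single_positions"] += 1
            else:
                counts["intervals"] += 1
    return counts
```

A file dominated by `single_positions` would be the scenario described above, where many separate lookups add up. A file made up mostly of `intervals` would be expected to behave more like --geneFile.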

zx8754 commented 5 years ago

To clarify, the file passed to --setFile, refFlat_hg19_customFilter.txt, is just a subset of the refFlat_hg19.txt.gz file. It lists no individual variants, only gene start and stop positions, e.g.:

A1BG    19:58858171-58864865    chr19   58858171    58864865
A1CF    10:52559168-52645435    chr10   52559168    52645435

zhanxw commented 5 years ago

Thanks. In this case, I would not expect --setFile to be much slower than --geneFile.