zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray

Suggestion: seqVCF2GDS could use index file to get total number of variants #53

Open jemunro opened 5 years ago

jemunro commented 5 years ago

VCF files are generally accompanied by an index file, from which the number of variants can be obtained. For example, with bcftools: bcftools index --nrecords <in.vcf.gz>

This could be used to speed up the seqVCF2GDS function by skipping the variant-counting step. Because that step is currently single-threaded, it contributes significantly to the run time when converting large VCF files.
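
To illustrate the idea outside of SeqArray, here is a minimal Python sketch of "ask the index first, scan only as a fallback". It is not how seqVCF2GDS is implemented. The function names (`count_from_index`, `count_by_scanning`, `count_variants`) are hypothetical, and the sketch assumes that bcftools is on the PATH and that `bcftools index --nrecords` prints a single integer when an index is present.

```python
import gzip
import shutil
import subprocess


def count_from_index(vcf_path):
    """Return the record count reported by bcftools from the index, or None.

    None is returned when bcftools is not installed, when the command fails
    (for example because no index exists), or when the output is not an integer.
    """
    if shutil.which("bcftools") is None:
        return None
    result = subprocess.run(
        ["bcftools", "index", "--nrecords", vcf_path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    try:
        return int(result.stdout.strip())
    except ValueError:
        return None


def count_by_scanning(vcf_path):
    """Slow path: read the whole file and count the non-header, non-blank lines."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


def count_variants(vcf_path):
    """Prefer the index-based count; fall back to a full scan if unavailable."""
    n = count_from_index(vcf_path)
    return n if n is not None else count_by_scanning(vcf_path)
```

The fallback keeps the behaviour unchanged for files without an index, so the index lookup only ever acts as a shortcut.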