Open fpbarthel opened 5 years ago
Thanks. I agree: vcf2vcf's genotyping feature should be sped up with a BED file. I use a similar strategy to speed up `samtools faidx` when pulling flanking bps. But that was easy, since `samtools faidx` can take many regions on the command line.
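To illustrate the batching idea, here is a minimal Python sketch that builds a single `samtools faidx` call covering several variants at once. It is not taken from vcf2vcf (which is written in Perl); the helper names `flank_regions` and `faidx_command`, the file name `ref.fa`, and the example variants are all hypothetical.

```python
def flank_regions(variants, flank=1):
    """Return samtools-style region strings (1-based, inclusive),
    padded by `flank` bases on each side of the REF allele."""
    regions = []
    for chrom, pos, ref in variants:
        start = max(1, pos - flank)          # never go below position 1
        end = pos + len(ref) - 1 + flank     # last REF base plus the flank
        regions.append(f"{chrom}:{start}-{end}")
    return regions


def faidx_command(fasta, regions):
    """Build one `samtools faidx` invocation for all regions,
    instead of one invocation per variant."""
    return ["samtools", "faidx", fasta, *regions]


# Hypothetical variants: (chromosome, 1-based position, REF allele)
variants = [("chr1", 100, "A"), ("chr2", 5, "AT")]
cmd = faidx_command("ref.fa", flank_regions(variants, flank=2))
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run`. For very large variant sets the command line may get too long, so the regions would need to be split into chunks.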
Speeding up vcf2vcf genotyping will need to remain in my backlog. It will be a while till I can get to it. I'll leave this issue open. In the meantime, look at GetBaseCountsMultiSample. It accepts either VCF or MAF as input, and produces a MAF-like output file.
In vcf2vcf output, `DP` is for total depth. In other VCF specs, `DP4` lists 4 values for fwd/rev read counts of REF/ALT alleles, but it does not work for multi-allelic ALTs. So mpileup uses `ADF` and `ADR` instead to represent fwd/rev read counts for all alleles.
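The difference between the tags can be sketched in Python. The example assumes `ADF` and `ADR` are per-allele lists with the REF count first, followed by one count per ALT allele; the function name and the counts are made up for illustration.

```python
def dp4_from_strand_counts(adf, adr):
    """Collapse per-allele forward (ADF) and reverse (ADR) counts into a
    DP4-style tuple: (ref-fwd, ref-rev, alt-fwd, alt-rev).

    All ALT alleles are summed together, which is exactly the
    information DP4 cannot keep apart at multi-allelic sites."""
    ref_fwd, *alt_fwd = adf
    ref_rev, *alt_rev = adr
    return (ref_fwd, ref_rev, sum(alt_fwd), sum(alt_rev))


# Hypothetical site with REF plus two ALT alleles
adf = [10, 3, 2]   # forward-strand reads: REF, ALT1, ALT2
adr = [12, 4, 1]   # reverse-strand reads: REF, ALT1, ALT2

dp4 = dp4_from_strand_counts(adf, adr)
print(dp4)                   # (10, 12, 5, 5)
print(sum(adf) + sum(adr))   # 32 reads counted across all alleles
```

The per-ALT split (3 vs 2 forward, 4 vs 1 reverse) is lost in the four-value form. Note that the sum of the allele counts is not guaranteed to equal a reported `DP` value, since tools may apply different read filters to each tag.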
One of the really nice features of `vcf2vcf` is the genotyping feature. I don't know any other software that really has this capability. I've tried using `freebayes` force-calling, which is very fast but doesn't work as expected (it skips a handful of variants), and `bcftools`/`samtools` don't seem to offer this possibility without additional post-processing (as `vcf2vcf` essentially does).
However, I've found `vcf2vcf` to be too slow when genotyping a large number of variants. I wonder if this could be made more efficient by using `samtools mpileup` on a BED file covering all of the features in the VCF file and post-processing the results, rather than calling `samtools mpileup` separately for each variant in the VCF file?
Unrelated, but I'm wondering why the `DP` tag is chosen over `DP4` to output read depth?
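The first step of the BED-based approach suggested above could look roughly like this in Python: convert each VCF record into a BED interval so that one pileup run covers every site. This is only a sketch, not vcf2vcf's implementation. The function name `vcf_to_bed_lines` and the example records are hypothetical, and the post-processing of the pileup output is not shown.

```python
def vcf_to_bed_lines(vcf_lines):
    """Convert VCF data lines to BED intervals spanning the REF allele.

    VCF positions are 1-based; BED uses 0-based, half-open intervals,
    so start = POS - 1 and end = start + len(REF)."""
    bed = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        start = int(pos) - 1
        end = start + len(ref)
        bed.append(f"{chrom}\t{start}\t{end}")
    return bed


# Hypothetical VCF content: a SNV and a 2 bp deletion
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr2\t5\t.\tATG\tA\t.\t.\t.",
]
for bed_line in vcf_to_bed_lines(example_vcf):
    print(bed_line)
```

Written to a file (say `targets.bed`), the intervals could then be passed to a single call such as `samtools mpileup -l targets.bed -f ref.fa sample.bam`, where the file names are placeholders.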