samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
652 stars 240 forks source link

Should changing the haplotype setting result in a different number of sequences in the final consensus? #1661

Open JamieAlfieri opened 2 years ago

JamieAlfieri commented 2 years ago

I ran similar commands with the same data, only changing the -H setting.

bcf consensus -H 1 -f reference.fa subject.vcf.gz > consensus_H1.fasta bcf consensus -H 2 -f reference.fa subject.vcf.gz > consensus_H2.fasta bcf consensus -H A -f reference.fa subject.vcf.gz > consensus_HA.fasta

subject_H1.fasta contains 8063 sequences (same as the reference) subject_H2.fasta contains 1946 sequences. subject_HA.fasta contains 1946 sequences.

I thought the consensus sequence will contain the reference allele for homozygous genotypes (if there is not an alternate allele (-H A) or if there is not a second allele (-H 2)). But if this were the case, the number of sequences should be equal.

pd3 commented 2 years ago

Hmm, I don't think it should and can't think of a good reason where the bug would come from. Are there many haploid genotypes in the VCF? Any chance you could provide a small test case to reproduce the problem?

Importantly, what version of bcftools are you using?