Open JamieAlfieri opened 2 years ago
Hmm, I don't think it should and can't think of a good reason where the bug would come from. Are there many haploid genotypes in the VCF? Any chance you could provide a small test case to reproduce the problem?
Importantly, what version of bcftools are you using?
I ran similar commands with the same data, only changing the -H setting.
bcf consensus -H 1 -f reference.fa subject.vcf.gz > consensus_H1.fasta
bcf consensus -H 2 -f reference.fa subject.vcf.gz > consensus_H2.fasta
bcf consensus -H A -f reference.fa subject.vcf.gz > consensus_HA.fasta
subject_H1.fasta contains 8063 sequences (same as the reference) subject_H2.fasta contains 1946 sequences. subject_HA.fasta contains 1946 sequences.
I thought the consensus sequence will contain the reference allele for homozygous genotypes (if there is not an alternate allele (-H A) or if there is not a second allele (-H 2)). But if this were the case, the number of sequences should be equal.