Open mbhall88 opened 6 months ago
These variants are not overlapping. For example, the insertion in the first example inserts A between the positions 28728 and 28729.
Ahhh okay I see what you mean with the first example. But am struggling to understand how the second example isn't overlapping?
It's exactly the same case. The first happens between 30586 and 30587, 30588 is not affected by the insertion.
Alright, so I did some playing and testing and it seems the order of the variants impacts what bcftools consensus
does and how bcftools
views them in terms of overlapping variants.
As a way of illustrating, let's reframe the variants for clarity and provide an example fasta sequence
First example:
>chromosome
GTC
chromosome 2 62e866df T A . PASS . GT 1/1
chromosome 2 e4388ae3 T TA . PASS . GT 1/1
bcftools consensus
would turn this into
>chromosome
GAAC
BUT, if I switch the order of the variants I get
>chromosome
The site chromosome:2 overlaps with another variant, skipping...
GTAC
And indeed, if I run bcftools +remove-overlaps
on the first example, with the order swapped, both variants are removed from the VCF.
Second example:
>chromosome
CGTAGT
chromosome 3 60f6a01c T TAACAACACTTGG . PASS . GT 1/1
chromosome 5 a1f621d4 G A . PASS . GT 1/1
bcftools consensus
would turn this into
>chromosome
CGTAACAACACTTGGAAT
The ordering situation from the first example obviously doesn't apply here given the positions are different.
tl;dr make sure you run bcftools +remove-overlaps
before running bcftools consensus
Me again.
I've just run into an example where bcftools +remove-overlaps
doesn't deem the variants overlapping, but bcftools consensus
does
chromosome_2 1561582 0cb9eac6 A ATTTCTTTTGATAAGAAAGTATTAAGTG . PASS . GT 1/1
chromosome_2 1561582 4bc9851a A AT . PASS . GT 1/1
Sorry to be a pest @pd3, not sure if this dropped off your radar (I imagine you get a lot of notifications)?
Is my last example a bug?
I'm not sure if I am misunderstanding how
bcftools +remove-overlaps
works, but I still have some overlapping variants in the output afterwardse.g.,
and