Closed by ktmeaton 1 year ago
On my first run-through of validation, no positive or negative controls in `controls` or `controls-gisaid` fail this filter.
In some more expanded testing, this is helping to remove some delta/delta false positives.
I came across a large number of sequences that came back as highly confident BA.5.2/BA.5.3 recombinants. However, there is substantial allele conflict (intermissions) at the 3' end of the genome (16935 onwards). I realized that I hadn't implemented logic to use alleles outside the identified regions.
I think these should be considered intermissions, in the sense that they conflict with the evidence for recombination. They are not as direct a conflict as a mismatched allele inside a parental region, but they are still "noisy".
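A minimal sketch of the idea above, assuming a simplified data model that is not taken from the actual code: each identified region has a start, an end, and a parent, and each diagnostic allele has a position and the parent it supports. The names `Region`, `Allele`, and `count_intermissions` are hypothetical, and the coordinates in the usage note are toy values.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    start: int   # first genomic position of the region (inclusive)
    end: int     # last genomic position of the region (inclusive)
    parent: str  # parent lineage assigned to this region


class Allele(NamedTuple):
    pos: int     # genomic position of the allele
    parent: str  # parent lineage this allele supports


def find_region(regions: list[Region], pos: int) -> Optional[Region]:
    """Return the region containing pos, or None if pos is outside all regions."""
    for region in regions:
        if region.start <= pos <= region.end:
            return region
    return None


def count_intermissions(regions: list[Region], alleles: list[Allele]) -> tuple[int, int]:
    """Count (inside, outside) intermissions.

    inside:  alleles within a region that support a different parent.
    outside: alleles that fall outside every identified region; here they
             are treated as "noisy" evidence against the recombination call.
    """
    inside = 0
    outside = 0
    for allele in alleles:
        region = find_region(regions, allele.pos)
        if region is None:
            outside += 1
        elif allele.parent != region.parent:
            inside += 1
    return inside, outside
```

For example, with toy regions `[Region(100, 5000, "A"), Region(6000, 9000, "B")]`, one mismatched allele at position 3000 and three alleles past position 9000 would give one inside and three outside intermissions. Keeping the two counts separate leaves room to weight the outside "noise" differently from direct conflicts.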
So far, all designated recombinants pass this new logic EXCEPT `XAV` (Issue #104). Previously, there was the ref allele `21789C` that lengthened the BA.2 section. Now, that allele is no longer BA.2 diagnostic (maybe BA.2.75 has thrown that off?).
However, if we set the populations to `BA.2` and `BA.5.2`, the `BA.2` signal is strengthened, but the noise also increases slightly.
I'm weighing two options:
- `BA.5.2` be a candidate parent.
- `XAV` as an auto-pass based on the 3' noise and numerous reversions.

There is an edge case where this will cause false negatives: when sc2rf reports additional spurious parents. For example, XBL. My proposed solution is to disable the intermission_allele_ratio filter when the original number of parents is greater than the number of filtered parents.
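The proposed bypass could be sketched as follows. This is only an illustration under assumptions: the function name `passes_intermission_filter` and its arguments are hypothetical, and the underlying check is the "fewer intermissions than minor-parent alleles" rule discussed in this thread.

```python
def passes_intermission_filter(
    original_parents: list[str],
    filtered_parents: list[str],
    num_intermissions: int,
    num_minor_alleles: int,
) -> bool:
    """Return True if the sample should be kept.

    The filter is skipped (the sample is kept) when some of the originally
    reported parents were filtered out, because alleles from those spurious
    parents can inflate the intermission count.
    """
    if len(original_parents) > len(filtered_parents):
        # Filter disabled: spurious parents were removed upstream.
        return True
    # Otherwise require fewer intermissions than minor-parent alleles.
    return num_intermissions < num_minor_alleles
```

With placeholder parents, `passes_intermission_filter(["A", "B", "C"], ["A", "B"], 5, 2)` keeps the sample even though intermissions outnumber minor alleles, while the same counts with `["A", "B"]` as both lists would reject it.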
I wonder if the ratio of intermissions to diagnostic alleles could be useful to rule out false positives. The filter could be that there must be fewer intermissions than alleles from the "minor" parent.
In this example, there are 3 alleles that could be coming from a "minor" parent `BA.2.3.20` (`12310G`, `16616C`, `17678T`), and most strains have 3 intermissions (`6979T`, `27012C`, `27513C`). Since 3 is not fewer than 3, these strains would fail the proposed filter.
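As a rough sketch of the ratio idea, using the allele lists from this example: the function names and the `max_ratio` parameter are assumptions for illustration, not the actual implementation. A threshold of 1.0 corresponds to "fewer intermissions than minor-parent alleles".

```python
def intermission_allele_ratio(intermissions: list[str], minor_alleles: list[str]) -> float:
    """Ratio of intermissions to alleles supporting the "minor" parent."""
    if not minor_alleles:
        # No minor-parent evidence at all: any intermission is disqualifying.
        return float("inf") if intermissions else 0.0
    return len(intermissions) / len(minor_alleles)


def passes_ratio_filter(
    intermissions: list[str],
    minor_alleles: list[str],
    max_ratio: float = 1.0,  # assumed threshold; strictly below this passes
) -> bool:
    """Keep the sample only if the ratio is strictly below max_ratio."""
    return intermission_allele_ratio(intermissions, minor_alleles) < max_ratio


# Alleles from the example above.
minor_alleles = ["12310G", "16616C", "17678T"]  # possible BA.2.3.20 alleles
intermissions = ["6979T", "27012C", "27513C"]

ratio = intermission_allele_ratio(intermissions, minor_alleles)
keep = passes_ratio_filter(intermissions, minor_alleles)
print(f"ratio={ratio:.2f} keep={keep}")
```

Here the ratio is exactly 1.0, so the strict comparison rejects the sample. Expressing the rule as a ratio rather than a raw count comparison makes the threshold easy to tune if 1.0 turns out to be too strict or too lenient.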