Closed ffranr closed 7 years ago
I've added a further test to https://github.com/iqbal-lab-org/gramtools/blob/current_work_rff/libgramtools/tests/test_bidir_search.cpp
The test is called: ReadEndsInSecondAllele_AlleleNumIncludedInSitesAlleleVector
.
This test shows that this issue does not exist when the read ends in a variant site allele.
this is handled when the coverages are calculated, the allele_mask is used to check whether the read starts in the middle of an allele
Is there a technical reason as to why it can't be handled in bidir_search_bwd
?
I don't think so, it could be done at the end, after all characters have been matched. But you'd make some redundant pushes to the allele vectors. Not sure it would actually impact performance though
Thanks Sorina!
Consider a read which matches exactly over a single variant site. The beginning of the read matches within a single allele site. The read matches across the variant site boundary into a non-variant region.
In this case the starting allele number is not included in the
Sites
data structure as populated bybidir_search_bwd(.)
. The following unit tests exemplifies what I mean: https://github.com/iqbal-lab-org/gramtools/blob/current_work_rff/libgramtools/tests/test_bidir_search.cpp#L470 (Test name:ReadStartsInSecondAllele_AlleleMissingFromSitesAlleleVector
)This issue seems to exist regardless of the parity of the variant site marker which the read crosses.
The following comments are relevant: https://github.com/iqbal-lab-org/gramtools/blob/master/src/test/unittest_bidir_search_bwd_fwd.cpp#L511 https://github.com/iqbal-lab-org/gramtools/blob/master/src/test/unittest_bidir_search_bwd_fwd.cpp#L435