iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License
92 stars 15 forks source link

Beginning of read matching within allele site and crossing into a non-variant region #70

Closed ffranr closed 7 years ago

ffranr commented 7 years ago

Consider a read which matches exactly over a single variant site. The beginning of the read matches within a single allele site. The read matches across the variant site boundary into a non-variant region.

In this case the starting allele number is not included in the Sites data structure as populated by bidir_search_bwd(.). The following unit tests exemplifies what I mean: https://github.com/iqbal-lab-org/gramtools/blob/current_work_rff/libgramtools/tests/test_bidir_search.cpp#L470 (Test name: ReadStartsInSecondAllele_AlleleMissingFromSitesAlleleVector)

This issue seems to exist regardless of the parity of the variant site marker which the read crosses.

The following comments are relevant: https://github.com/iqbal-lab-org/gramtools/blob/master/src/test/unittest_bidir_search_bwd_fwd.cpp#L511 https://github.com/iqbal-lab-org/gramtools/blob/master/src/test/unittest_bidir_search_bwd_fwd.cpp#L435

ffranr commented 7 years ago

I've added a further test to https://github.com/iqbal-lab-org/gramtools/blob/current_work_rff/libgramtools/tests/test_bidir_search.cpp The test is called: ReadEndsInSecondAllele_AlleleNumIncludedInSitesAlleleVector.

This test shows that this issue does not exist when the read ends in a variant site allele.

sm0179 commented 7 years ago

this is handled when the coverages are calculated, the allele_mask is used to check whether the read starts in the middle of an allele

ffranr commented 7 years ago

Is there a technical reason as to why it can't be handled in bidir_search_bwd?

sm0179 commented 7 years ago

I don't think so, it could be done at the end, after all characters have been matched. But you'd make some redundant pushes to the allele vectors. Not sure it would actually impact performance though

ffranr commented 7 years ago

Thanks Sorina!