Bug on CPU adaptative search

achacond commented 6 years ago

The FM-index CPU adaptive search reports wrong intervals in one very specific case: when a relatively short exact match covers the whole read. My guess is that the root cause is related to the FMI LUT implementation, and that it is triggered when the exact match reported (covering the whole read) is shorter than the number of LUT levels.

Example: mapping the following read against the whole human genome:

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACAGACAGC

The CPU reports the entire BWT as the resulting search interval:
L=0, R=5794622171 num_candidates=5794622171

The GPU reports the following interval:
L=3464076233, R=3464106606 num_candidates=30373
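For readers less familiar with the notation, L and R delimit an interval of the BWT, and num_candidates is R - L. The Python sketch below is a toy FM-index backward search over a tiny text, not the gem3-mapper implementation. The names `build_fm_index` and `backward_search` are hypothetical, and the index is built naively. It only illustrates that an exact match narrows the interval, whereas a search that consumes no characters is left with the whole BWT, which is the shape of the CPU result above.

```python
def build_fm_index(text):
    """Build a naive FM-index (C table and Occ table) for a small toy text."""
    text += "$"  # sentinel, smaller than any other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = [text[i - 1] for i in sa]  # index -1 wraps around to the sentinel
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in the text strictly smaller than c
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return c_table, occ, len(bwt)


def backward_search(index, pattern):
    """Return the half-open BWT interval (lo, hi) of suffixes starting with pattern."""
    c_table, occ, n = index
    lo, hi = 0, n  # before consuming any character: the whole BWT
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0, 0  # character absent from the text: empty interval
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return lo, lo  # no occurrences left
    return lo, hi


index = build_fm_index("ACAGACAGC")
lo, hi = backward_search(index, "ACAG")
print(lo, hi, hi - lo)  # interval bounds and number of candidates
```

With the figures from the report, the GPU interval gives 3464106606 - 3464076233 = 30373 candidates, matching the num_candidates value it prints.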

achacond commented 6 years ago

I can confirm this behaviour by disabling the LUT in the adaptive FMI search for region_profile_query_character. With the LUT disabled, the mapper reports the expected results in the SAM output. To reproduce, disable lines 120 to 126 of src/filtering/region_profile/region_profile.c.

smarco commented 6 years ago

I have checked this report. The LUT implemented within the CPU workflow of the mapper uses a restricted query depth threshold: any LUT query that leads to an interval with more than 20 candidates (the region threshold) is not computed. For the human genome, this minimum depth threshold is 9 characters. Thus, there is no interest in computing the query for that interval accurately. Moreover, if it were computed as the GPU does, the resulting interval of 30373 candidates would be discarded in the downstream mapping workflow anyway. For that reason, I am closing the issue, as it doesn't affect the expected behaviour of the mapper.
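The argument can be checked with a few lines of Python. This is only a sketch of the reasoning in this comment, not the mapper's code: `REGION_THRESHOLD` and `region_is_usable` are hypothetical names, and the threshold of 20 and the two intervals are the figures quoted in this thread. Both the CPU and the GPU intervals exceed the threshold, so under this assumption neither would be kept downstream.

```python
REGION_THRESHOLD = 20  # maximum candidates per region, as quoted in the thread


def num_candidates(lo, hi):
    """Number of candidates in the BWT interval delimited by lo and hi."""
    return hi - lo


def region_is_usable(lo, hi, threshold=REGION_THRESHOLD):
    """Assumed filter: keep a region only if it has at most `threshold` candidates."""
    return num_candidates(lo, hi) <= threshold


cpu_interval = (0, 5794622171)           # CPU result from the report
gpu_interval = (3464076233, 3464106606)  # GPU result from the report

for name, (lo, hi) in (("CPU", cpu_interval), ("GPU", gpu_interval)):
    print(name, num_candidates(lo, hi), region_is_usable(lo, hi))
```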

smarco / gem3-mapper
