Open joaomcteixeira opened 5 years ago
As discussed with @AlaaShamandy, mismatch information does not need to be stored in the search result, but a traceability function must be implemented to retrieve that encoded mismatch information from the structural database.
Mismatch searching has been implemented.
Our inputs would be: input_idp_sequence, database, minimum_chunk_size (which defaults to 3), and mismatch_percentage (the maximum percentage allowed, defaulting to 0).
How it's done:
1) As Joao mentioned, no mismatches within the minimum_chunk are allowed.
2) We start searching after we have found the minimum_chunk.
3) At this point, two scenarios can arise:
   a) A match was found: we continue and go to 2.
   b) A mismatch was found: we add the mismatch to our result set if it's allowed and check whether we have reached our maximum mismatch_percentage. If we have, then we stop. Otherwise we continue and go to 2.
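For readers following the thread, the steps above could be sketched roughly as below. This is not the code from the repository: the function name extend_match, its return value, and the choice to measure the percentage over the length aligned so far are all assumptions made for illustration. Only minimum_chunk_size and mismatch_percentage come from the discussion.

```python
def extend_match(query, target, minimum_chunk_size=3, mismatch_percentage=0.0):
    """Hypothetical sketch: align `query` and `target` from index 0.

    Returns (matched_length, mismatch_positions), or None when the first
    `minimum_chunk_size` characters do not match exactly.
    """
    limit = min(len(query), len(target))
    if limit < minimum_chunk_size:
        return None

    # Step 1: no mismatches are allowed inside the minimum chunk.
    if query[:minimum_chunk_size] != target[:minimum_chunk_size]:
        return None

    length = minimum_chunk_size
    mismatches = []

    # Steps 2-3: keep extending one position at a time.
    for i in range(minimum_chunk_size, limit):
        if query[i] != target[i]:
            # Assumption: the percentage is computed over the positions
            # aligned so far, including the candidate mismatch.
            candidate = 100.0 * (len(mismatches) + 1) / (i + 1)
            if candidate > mismatch_percentage:
                break  # accepting this mismatch would exceed the maximum
            mismatches.append(i)
        length = i + 1

    return length, mismatches
```

With mismatch_percentage left at 0, the loop stops at the first mismatch, so the sketch behaves like an exact match extension.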
@AlaaShamandy has this been implemented in #17? If so please close this issue.
We will discuss later how to use that mismatch parameter: whether to leave it out of the user interface, bring it to the surface, or leave it dormant. Just a question: does it slow down the search/match algorithm if mismatch_percentage is set to 0, that is, if NO mismatches are allowed?
As stated in the REQUIREMENTS version cd2f4ab32b5ff8787cd51d5653563c645b8d1162, it is important to allow sequence mismatch to extend the search space.
This requires a percentage parameter in the search and sequence match algorithm that evaluates the match to True if the mismatch is <= the percentage.
@AlaaShamandy please add additional explanation to this discussion.
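As a starting point for that explanation, the acceptance rule (a match evaluates to True if the mismatch is <= the percentage) might look something like the sketch below. The function name is_match and the decision to compare two equal-length fragments position by position are assumptions, not something fixed in the requirements.

```python
def is_match(seq_a, seq_b, percentage=0.0):
    """Hypothetical check: True if the mismatch percentage is <= `percentage`."""
    # Assumption: only equal-length, non-empty fragments are compared.
    if len(seq_a) != len(seq_b) or not seq_a:
        return False

    # Count the positions where the two fragments differ.
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))

    return 100.0 * mismatches / len(seq_a) <= percentage
```

With the default percentage of 0, this reduces to an exact equality test.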