Mismatch percentage - Githubissues

julie-forman-kay-lab / IDPConformerGenerator

Build conformational representations of Intrinsically Disordered Proteins and Regions by a guided sampling of the protein torsion space

https://idpconformergenerator.readthedocs.io/

Apache License 2.0

Mismatch percentage #7

Open joaomcteixeira opened 5 years ago

joaomcteixeira commented 5 years ago

As stated in the REQUIREMENTS version cd2f4ab32b5ff8787cd51d5653563c645b8d1162, it is important to allow sequence mismatch to extend the search space.

Implement a percentage parameter in the search and sequence match algorithm so that a match evaluates to True if the mismatch is <= the percentage.
Mismatches can occur at any position in the search string under analysis.
There should be no mismatch within the minimum chunk size (comment by @AlaaShamandy).
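
A minimal sketch of the acceptance rule described above, assuming the query and the candidate are equal-length strings and the percentage is computed over the whole string. The names `mismatch_percentage_of` and `is_match` are hypothetical and are not taken from the repository. The minimum-chunk rule is not enforced here.

```python
def mismatch_percentage_of(query: str, candidate: str) -> float:
    """Percentage of positions at which two equal-length strings differ."""
    if len(query) != len(candidate):
        raise ValueError("query and candidate must have the same length")
    if not query:
        return 0.0
    mismatches = sum(a != b for a, b in zip(query, candidate))
    return 100.0 * mismatches / len(query)


def is_match(query: str, candidate: str, mismatch_percentage: float = 0.0) -> bool:
    """True if the observed mismatch is <= the allowed percentage."""
    return mismatch_percentage_of(query, candidate) <= mismatch_percentage
```

With the default of 0, `is_match` reduces to plain string equality.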

@AlaaShamandy please add additional explanation to this discussion.

joaomcteixeira commented 5 years ago

As discussed with @AlaaShamandy, mismatch information does not need to be stored in the search result, but a traceability function must be implemented to retrieve that encoded mismatch information from the structural database.
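
One possible shape for such a traceability function, as a sketch only. It assumes the search result keeps the matched query fragment and the start index of the hit, and that the database entry exposes its sequence as a plain string. The name `trace_mismatches` and this layout are hypothetical, not the project's actual API.

```python
def trace_mismatches(query_fragment: str, db_sequence: str, db_start: int):
    """Recompute mismatches for a stored hit instead of storing them.

    Returns a list of (offset, query_residue, db_residue) tuples, where
    offset is counted from the start of the matched fragment.
    """
    db_fragment = db_sequence[db_start:db_start + len(query_fragment)]
    if len(db_fragment) != len(query_fragment):
        raise ValueError("hit does not fit inside the database sequence")
    return [
        (offset, q, d)
        for offset, (q, d) in enumerate(zip(query_fragment, db_fragment))
        if q != d
    ]
```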

AlaaShamandy commented 5 years ago

Mismatch searching has been implemented.

Our inputs would be: input_idp_sequence, database, minimum_chunk_size (which defaults to 3), and mismatch_percentage, the maximum percentage of mismatches allowed (which defaults to 0).

How it's done:

1) As Joao mentioned, no mismatches are allowed within the minimum_chunk.
2) We start searching after we have found the minimum_chunk.
3) At this point, two scenarios can arise:
a) A match was found: we continue and go back to 3.
b) A mismatch was found: we add the mismatch to our result set if it is allowed and check whether we have reached our maximum mismatch_percentage. If we have, we stop. Otherwise, we continue and go back to 3.
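
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in the repository: the function name `extend_match` is hypothetical, sequences are plain strings, and a mismatch counts as "allowed" when accepting it keeps the mismatch share of the fragment built so far at or below mismatch_percentage.

```python
def extend_match(query, db_sequence, q_start, d_start,
                 minimum_chunk_size=3, mismatch_percentage=0.0):
    """Extend a hit from an exact seed, tolerating a bounded mismatch share.

    Returns (length, mismatch_offsets), or None if the seed does not match.
    """
    seed_q = query[q_start:q_start + minimum_chunk_size]
    seed_d = db_sequence[d_start:d_start + minimum_chunk_size]
    # Step 1: the minimum chunk must be complete and free of mismatches.
    if len(seed_q) < minimum_chunk_size or seed_q != seed_d:
        return None

    length = minimum_chunk_size
    mismatches = []
    # Steps 2 and 3: walk forward one residue at a time after the seed.
    while q_start + length < len(query) and d_start + length < len(db_sequence):
        if query[q_start + length] != db_sequence[d_start + length]:
            # Assumption: percentage is measured over the extended fragment.
            share = 100.0 * (len(mismatches) + 1) / (length + 1)
            if share > mismatch_percentage:
                break  # the mismatch is not allowed, so stop here
            mismatches.append(length)
        length += 1
    return length, mismatches
```

For example, `extend_match("MKVLAGT", "MKVLSGT", 0, 0)` stops at the first mismatch, while passing `mismatch_percentage=20` lets the fragment run to the end with one recorded mismatch.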

joaomcteixeira commented 5 years ago

@AlaaShamandy has this been implemented in #17? If so, please close this issue.

We will discuss later how that mismatch parameter should be used: whether to leave it out of the user interface, bring it to the surface, or leave it dormant. Just a question: does it slow down the search/match algorithm if mismatch_percentage is set to 0, that is, if NO mismatch is allowed?