meetU-MasterStudents / 2019---2020-partage

For exchanging material and doc

Glob-Loc algorithm in a nutshell #21

Open elolaine opened 4 years ago

elolaine commented 4 years ago

The glob-loc alignment algorithm is a variant of the global alignment algorithm. It applies when the 2 sequences have different lengths. That's typically what you are going to have, for example when your query is much longer than the HOMSTRAD family you want to find, or vice versa.

Let's say we want to align s1 with s2, where s1 is shorter than s2. In the alignment matrix, the columns correspond to the letters of s2 and the rows correspond to the letters of s1.

The 2 notable differences from the global alignment algorithm are the following:

(1) It costs nothing to put gaps before the start of the shorter sequence. In other words, we put zeros in the first row (which pairs each letter of s2 with a gap in s1);

(2) Instead of starting the backtracking from the cell in the bottom-right corner, we start it from the cell with the maximum score in the last row (corresponding to the last letter of s1).
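The two changes above can be sketched in Python as follows. This is only a minimal illustration for a sequence-sequence alignment, not a reference implementation: the function name `glob_loc_align`, the linear gap penalty, and the default `match`, `mismatch` and `gap` values are all assumptions chosen for the example.

```python
def glob_loc_align(s1, s2, match=1, mismatch=-1, gap=-2):
    """Glob-loc alignment of a short s1 (rows) against a longer s2 (columns).

    Scoring values and the linear gap penalty are illustrative assumptions.
    Returns (score, aligned_s1, aligned_s2, start, end), where s2[start:end]
    is the stretch of s2 covered by the alignment.
    """
    n, m = len(s1), len(s2)
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # Change (1): the first row stays at zero, so skipping a prefix of s2 is free.
    # The first column is filled as in the global algorithm.
    for i in range(1, n + 1):
        score[i][0] = i * gap

    def sub(a, b):
        return match if a == b else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(s1[i - 1], s2[j - 1]),  # align two letters
                score[i - 1][j] + gap,                            # gap in s2
                score[i][j - 1] + gap,                            # gap in s1
            )

    # Change (2): start the backtracking from the best cell of the last row.
    end = max(range(m + 1), key=lambda j: score[n][j])

    a1, a2 = [], []
    i, j = n, end
    while i > 0:  # stop as soon as the first row is reached
        if j > 0 and score[i][j] == score[i - 1][j - 1] + sub(s1[i - 1], s2[j - 1]):
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1

    return score[n][end], "".join(reversed(a1)), "".join(reversed(a2)), j, end


print(glob_loc_align("GATT", "CCGATTACA"))  # (4, 'GATT', 'GATT', 2, 6)
```

In this toy example, the first two letters of s2 are skipped for free thanks to the zero first row, and the letters of s2 after position 6 are ignored because the backtracking starts from the maximum of the last row rather than from the bottom-right corner.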

See the following snapshots for a detailed description.

[image] [image]

florianecoulmance commented 4 years ago

Hello Elodie,

My team and I implemented the glob-loc alignment.

However, in the new score matrix we used arbitrary values for match and mismatch.

We plan to change this by dropping the separate match score and simply taking match score = the score in the profile_profile matrix we created previously.

But for the mismatch, we are not sure whether we are supposed to take mismatch = - (score in profile_profile). Or should we take an arbitrary value? Or, if it has to be determined, how do we do this?

Best regards,

Floriane and team 3

elolaine commented 4 years ago

Hi Floriane,

As you are dealing with profiles, and not individual sequences, there is no notion of match or mismatch. There is only a score computed between one position of one profile and one position of the other profile. A "match" means that you have the same letter at the two positions, but here you do not have letters, you have vectors of probabilities. So there is no need to put a minus sign for a "mismatch"... what would a mismatch be? Maybe I'm missing something here...? And what does this have to do with the glob-loc alignment? The only difference from the global alignment in the scoring matrix is that you add some more zeros... so I don't see match or mismatch playing a role. Be careful: the info given above is valid for a sequence-sequence alignment and must be adapted for a profile-profile alignment.
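To make this concrete, here is a rough sketch of how the recurrence might look for profiles. It assumes each profile is a list of probability vectors (one per position), and it uses a plain dot product as the position-to-position score purely as a placeholder; `dot_score`, `glob_loc_profile_score` and the `gap` value are hypothetical names and values, and your own profile_profile score would be plugged in instead.

```python
def dot_score(p, q):
    # Placeholder position-to-position score (an assumption for this sketch):
    # dot product of two probability vectors.
    return sum(a * b for a, b in zip(p, q))


def glob_loc_profile_score(prof1, prof2, column_score=dot_score, gap=-0.5):
    """Fill the glob-loc matrix for two profiles (prof1 is the shorter one).

    Returns (best_score, end), where end is the column of the last row
    from which the backtracking would start.
    """
    n, m = len(prof1), len(prof2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]  # first row stays at zero
    for i in range(1, n + 1):
        score[i][0] = i * gap  # first column as in the global algorithm

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A single score between two positions: no match/mismatch here.
            s = column_score(prof1[i - 1], prof2[j - 1])
            score[i][j] = max(
                score[i - 1][j - 1] + s,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    end = max(range(m + 1), key=lambda j: score[n][j])
    return score[n][end], end


# Toy profiles over a 2-letter alphabet.
short = [[1, 0], [0, 1]]
long_ = [[0, 1], [1, 0], [0, 1], [0, 1]]
print(glob_loc_profile_score(short, long_))  # (2.0, 3)
```

The only place where the two profiles are compared is the call to `column_score`, which is why match and mismatch values do not appear anywhere.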

I hope this helps. Don't hesitate to be more specific.

Best, Elodie