rcsb / symmetry

🎡 Detect, analyze, and visualize protein symmetry
GNU Lesser General Public License v2.1

Use sequence similarity contribution in the MC score #73

Open lafita opened 8 years ago

lafita commented 8 years ago

If the sequence similarity option is activated in the CE self-alignment, the sequence contribution should also be included in the score used by the MC optimization step. Otherwise the optimization only sees the structure score, and the effect of the option can be lost.

lafita commented 8 years ago

For now, we are postponing this issue because there is no clear use case and other feature requests have higher priority. In the discussion about it, we concluded:

  1. The structure and sequence scores need to be combined with a parameter that the user can choose.
  2. Let's call this parameter lambda. It ranges from 0 to 1, where 0 means that only the structure score is considered and 1 means that only the sequence score is considered. Values in between set the relative importance of the two scores, so lambda equal to 0.5 gives both scores equal weight.
  3. For this weighting to be meaningful, each score needs to be normalized by the maximum score of an aligned position: the parameter C of the structural similarity function for the structure score, and the maximum value of the SubstitutionMatrix for the sequence score. (Note that the CECalculator does not combine the scores in this way. It adds a constant times the sequence score directly to the structure score, so to be consistent we should change that function too.)
  4. The function would look like:
  Score = (Sstr / MaxSstr) * (1 - lambda) + (Sseq / MaxSseq) * lambda
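
As a minimal sketch of the function above (not existing CE-Symm or BioJava code), the weighting could be written as below. The class name `CombinedScore`, the method `combine` and the numbers in `main` are made up for illustration. The sketch assumes the two scores and their maxima are already on the same scale, for example both per aligned position.

```java
/** Hypothetical helper for illustration; not part of BioJava or CE-Symm. */
final class CombinedScore {

    private CombinedScore() {}

    /**
     * Score = (Sstr / MaxSstr) * (1 - lambda) + (Sseq / MaxSseq) * lambda
     *
     * @param sStr    structure score
     * @param maxSStr maximum structure score (the parameter C in the discussion)
     * @param sSeq    sequence score
     * @param maxSSeq maximum sequence score (maximum of the substitution matrix)
     * @param lambda  weight in [0, 1]; 0 = structure only, 1 = sequence only
     */
    static double combine(double sStr, double maxSStr,
                          double sSeq, double maxSSeq, double lambda) {
        // Written this way so that NaN is rejected as well.
        if (!(lambda >= 0.0 && lambda <= 1.0)) {
            throw new IllegalArgumentException("lambda must be in [0, 1]: " + lambda);
        }
        if (!(maxSStr > 0.0 && maxSSeq > 0.0)) {
            throw new IllegalArgumentException("maximum scores must be positive");
        }
        double normStr = sStr / maxSStr; // normalized structure term
        double normSeq = sSeq / maxSSeq; // normalized sequence term
        return normStr * (1.0 - lambda) + normSeq * lambda;
    }

    public static void main(String[] args) {
        // Illustrative values only: C = 20 and a matrix maximum of 11 are assumptions.
        double c = 20.0;
        double maxSub = 11.0;
        System.out.println(combine(15.0, c, 5.5, maxSub, 0.0)); // prints 0.75 (structure only)
        System.out.println(combine(15.0, c, 5.5, maxSub, 0.5)); // prints 0.625 (equal weight)
        System.out.println(combine(15.0, c, 5.5, maxSub, 1.0)); // prints 0.5 (sequence only)
    }
}
```

With lambda equal to 0 this reduces to the normalized structure score, so the current behaviour could stay the default.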

In order to implement this feature, the following needs to be done:

  1. Implement a method in MultipleAlignmentTools that converts a MultipleAlignment to a MultipleSequenceAlignment. A template exists now.
  2. Add a method to MultipleAlignmentScorer that calculates the sequence alignment score of a MultipleAlignment (possibly converting it to an MSA and using the BioJava scoring functions in the alignment module).
  3. Add a new parameter for lambda in CeSymmParameters. When computing the score of a MultipleAlignment in McOptimizer, use the appropriate ratio and normalization as described here.

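
To make step 2 more concrete, here is a rough sketch of one possible sequence score, a plain sum-of-pairs over the aligned columns. This is an assumption for illustration, not the scoring that BioJava's alignment module implements. The class `SumOfPairsSketch`, the string-based alignment rows and the match/mismatch function are all hypothetical stand-ins for a real MultipleSequenceAlignment and SubstitutionMatrix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntBiFunction;

/** Hypothetical sketch; rows are gapped strings of equal length, '-' marks a gap. */
final class SumOfPairsSketch {

    private SumOfPairsSketch() {}

    /**
     * Sums the substitution score of every residue pair in every column.
     * Pairs that involve a gap are skipped (no gap penalty in this sketch).
     */
    static long sumOfPairs(List<String> rows,
                           ToIntBiFunction<Character, Character> substitution) {
        if (rows.isEmpty()) {
            return 0L;
        }
        int length = rows.get(0).length();
        for (String row : rows) {
            if (row.length() != length) {
                throw new IllegalArgumentException("all rows must have the same length");
            }
        }
        long total = 0L;
        for (int col = 0; col < length; col++) {
            for (int i = 0; i < rows.size(); i++) {
                char a = rows.get(i).charAt(col);
                if (a == '-') {
                    continue; // gap in row i: no pairs to score
                }
                for (int j = i + 1; j < rows.size(); j++) {
                    char b = rows.get(j).charAt(col);
                    if (b != '-') {
                        total += substitution.applyAsInt(a, b);
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy scoring: +2 for identical residues, -1 otherwise (stand-in for a real matrix).
        ToIntBiFunction<Character, Character> toy = (a, b) -> a.equals(b) ? 2 : -1;
        List<String> rows = Arrays.asList("AC-", "ACG", "TCG");
        System.out.println(sumOfPairs(rows, toy)); // prints 8
    }
}
```

The result would be the Sseq term. It would still need the normalization discussed above before it is combined with the structure score.
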
sbliven commented 8 years ago

Nice summary, @lafita. I do think that the normalization factors could be worked out by looking at the distributions of scores over a large set of pairwise comparisons. This might be a nice student project.