Open lafita opened 8 years ago
For now, we are postponing this issue because there is no clear use case and other feature requests have higher priority. In the discussion about it, we concluded:
- Use a `SubstitutionMatrix` for the sequence score.
- Combine the normalized structure and sequence scores as a weighted sum: `Score = (Sstr / MaxSstr) * (1 - lambda) + (Sseq / MaxSseq) * lambda`. (Note that the `CECalculator` does not combine the scores in this way, but rather adds a constant times the sequence score directly to the structure score, so to be consistent we should change that function too.)
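As a minimal sketch of the formula above, the combined score could be computed as follows. The class and method names (`CombinedScore`, `combine`) are hypothetical and not part of BioJava or CE-Symm, and the restriction of `lambda` to [0, 1] is an assumption made here so that the two weights sum to one.

```java
/** Hypothetical helper for illustration; not an existing BioJava or CE-Symm class. */
final class CombinedScore {

    private CombinedScore() {}

    /**
     * Weighted sum of the normalized structure and sequence scores:
     * (sStr / maxSStr) * (1 - lambda) + (sSeq / maxSSeq) * lambda.
     *
     * @param sStr    raw structure score
     * @param maxSStr normalization factor for the structure score
     * @param sSeq    raw sequence score
     * @param maxSSeq normalization factor for the sequence score
     * @param lambda  weight of the sequence term (assumed to lie in [0, 1])
     */
    static double combine(double sStr, double maxSStr,
                          double sSeq, double maxSSeq, double lambda) {
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must be in [0, 1]");
        }
        if (maxSStr <= 0.0 || maxSSeq <= 0.0) {
            throw new IllegalArgumentException("normalization factors must be positive");
        }
        return (sStr / maxSStr) * (1.0 - lambda) + (sSeq / maxSSeq) * lambda;
    }
}
```

With `lambda = 0` the result reduces to the normalized structure score alone, and with `lambda = 1` to the normalized sequence score alone, so the parameter acts as a single knob between the two.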
In order to implement this feature, the following needs to be done:
- A method in `MultipleAlignmentTools` that converts a `MultipleAlignment` to a `MultipleSequenceAlignment`. A template exists now.
- A method in `MultipleAlignmentScorer` that calculates the sequence alignment score of a `MultipleAlignment` (possibly converting it to an MSA and using the BioJava scoring functions in the alignment module).
- `CeSymmParameters`. When computing the score of a `MultipleAlignment` in `McOptimizer`, use the appropriate ratio and normalization as described here.

Nice summary, @lafita. I do think that the normalization factors could be worked out by looking at the distributions of scores over a large set of pairwise comparisons. This might be a nice student project.
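One possible reading of this suggestion, sketched under assumptions: collect the raw scores from many pairwise comparisons and take a high percentile of that distribution as the normalization factor (`MaxSstr` or `MaxSseq`). The choice of a percentile rather than the maximum, the nearest-rank definition, and the names `ScoreNormalization` and `percentile` are all illustrative and not taken from the discussion.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not an existing BioJava or CE-Symm class. */
final class ScoreNormalization {

    private ScoreNormalization() {}

    /**
     * Nearest-rank percentile of a set of scores.
     *
     * @param scores raw scores from a set of pairwise comparisons (must be non-empty)
     * @param p      fraction in (0, 1], e.g. 0.99 for the 99th percentile
     * @return the smallest score such that at least a fraction p of the scores are <= it
     */
    static double percentile(double[] scores, double p) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        if (p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in (0, 1]");
        }
        double[] sorted = scores.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);  // 1-based nearest rank
        return sorted[rank - 1];
    }
}
```

Using a percentile instead of the observed maximum would make the factor less sensitive to a few outlier comparisons, at the cost of allowing normalized scores slightly above 1.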
If the sequence similarity option is activated in the CE self-alignment, the sequence contribution should be conserved in the optimization step (otherwise the effect of the option can be lost).