rcsb / symmetry

Detect, analyze, and visualize protein symmetry
GNU Lesser General Public License v2.1

Poor handling of short repeats #88

Open sbliven opened 8 years ago

sbliven commented 8 years ago
  1. The --simple output gives the message "Refinement was not significant (TM=0.63)" (e.g. for 1l0s.A) when the refinement fails the minlen threshold. It should give a more informative message, such as "Repeats were too short (core-length=5)".
  2. (Bonus) Short repeats should be combined into longer repeats, rather than being rejected altogether.
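
A minimal sketch of the reporting logic requested in point 1, not the project's actual code. The function name `simple_status_message` and both threshold values are hypothetical; only the two message strings and the values TM=0.63 and core-length=5 come from the report above. The idea is to check the length criterion before the TM-score, so a too-short result is not reported as a TM-score failure.

```python
def simple_status_message(tm_score, core_length, min_core_length, tm_threshold):
    """Return a one-line rejection reason, or None if the result is accepted.

    All parameter names are hypothetical and chosen for illustration only.
    """
    # Test the length criterion first so that short repeats are not
    # misreported as a TM-score failure.
    if core_length < min_core_length:
        return f"Repeats were too short (core-length={core_length})"
    if tm_score < tm_threshold:
        return f"Refinement was not significant (TM={tm_score:.2f})"
    return None


# Thresholds below are assumed values, not the tool's defaults.
print(simple_status_message(0.63, 5, min_core_length=15, tm_threshold=0.4))
```

With these assumed thresholds the call reports the length failure instead of the TM-score, which is the behavior the issue asks for.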
lafita commented 8 years ago

I see some complications in combining short repeats into longer ones, because the final length of the repeats is only known after optimization. Thus, it would require an additional optimization after merging some of the repeats.

In addition, we would need to know the optimal factor of the number of repeats to combine. For example:
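
As a hypothetical illustration of the factor question (not the example from the original comment), the sketch below lists the ways a given number of equal-length repeats could be merged. It assumes the repeats are contiguous, that only factors dividing the repeat count are allowed, and that at least two merged repeats must remain. The name `candidate_merges` and the numbers used are invented for illustration. The merged lengths are only estimates, since the final length is known only after re-optimization.

```python
def candidate_merges(num_repeats, repeat_length, min_length):
    """List (factor, merged_count, estimated_length) options for merging repeats.

    Hypothetical helper: assumes equal-length, contiguous repeats.
    """
    options = []
    for factor in range(2, num_repeats + 1):
        # Only factors that divide the repeat count give whole merged repeats.
        if num_repeats % factor != 0:
            continue
        merged_count = num_repeats // factor
        estimated_length = repeat_length * factor
        # Keep options that still contain at least two repeats and whose
        # estimated length reaches the minimum length.
        if merged_count >= 2 and estimated_length >= min_length:
            options.append((factor, merged_count, estimated_length))
    return options


# Assumed input: 12 repeats of 5 residues each, minimum length of 15.
print(candidate_merges(12, 5, 15))
# → [(3, 4, 15), (4, 3, 20), (6, 2, 30)]
```

Even in this small case three factors qualify, so some additional criterion would be needed to choose among them.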

On the one hand, I agree that the repeat number and length are very much dependent on the optimal self-alignment, and that this is a problem for short OPEN repeats, where the order detector has less information than for CLOSED repeats. On the other hand, this only applies to very short repeats (fewer than 15 residues), which generally have a poor signal and are better detected by other algorithms (solenoid or TPR detection), so CE-Symm is not the optimal approach for detecting them.