Open lafita opened 8 years ago
So one approach would be to define this as a nonlinear optimization problem.
Each symmetry group is characterized by one or two generators. Each generator has three degrees of freedom (e.g. euler angles or unit quaternion components), although the second generator may be constrained to certain angles relative to the first. So there are at most six rotational degrees of freedom plus three translational degrees (for the origin placement). Thus, all elements of the symmetry group can be written as functions of at most nine variables. These functions are at a minimum a bounded-order polynomial (since elements are products of the generators), but would also include trigonometric terms for an euler-angle approximation. It may be possible to avoid this using clever selection of variables (maybe using quaternions).
So we have some function _G_i(a_1,...,a9) for each group element, which is at a minimum polynomial. Our final objective function, the average RMSD after applying each element to the corresponding atoms in the alignment, is also a (trigonomial) polynomial over those points. We could in principle write out this function ahead of time for each symmetry group (or use a program like sage-math to generate it).
I know that nonlinear optimization is very hard, but it's possible it would be quite feasible for our problem. This is especially true since we would have a very good initial guess. We should check out if there are any Java libraries that use this.
The problem is described as follows:
Given a symmetry group (Cn,Dn,T,O,I) and the structural alignment (residue equivalencies) between the symmetric units (subunits, repeats), find the symmetry group axes that minimize the RMSD between the superposition (applying the group axes to each unit) of all the units.
The number of parameters to determine is the number of entries of the 4D transformation matrix of the generators of the symmetry group (12). Cyclic symmetry has a single generator and the other symmetries have two generators, but there are some restrictions that make parameter dependencies:
An analytical solution to the problem might be hard, because the generators can be applied multiple times (which squared parameters appear).