FarnazH opened 3 years ago
The following markdown is rendered in the attached PDF.
The following notes on generalized Procrustes Analysis are based on: I. Borg and Patrick J. F. Groenen, Modern Multidimensional Scaling (2nd ed.) (Springer, NY, 2005). Chapter 21.
While scaling and translation are complicated (indeed, they are complicated for any non-orthogonal transformation), in Procrustes
we've elected to use a simple scaling/translation methodology. So we will assume that the input matrices, $\mathbf{A}_k$, are scaled and translated by the normal rules, if that is desired. The following method corresponds to the first approach mentioned in the book of Borg and Groenen.
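For concreteness, a minimal sketch of the "normal rules" mentioned above (translate each matrix so its centroid is at the origin, then scale it to unit Frobenius norm); the function name is my own, not part of the package:

```python
import numpy as np

def center_and_scale(a):
    """Standard Procrustes pre-processing: remove the centroid (translation),
    then divide by the Frobenius norm (scaling)."""
    a = a - a.mean(axis=0)           # each column now has zero mean
    norm = np.linalg.norm(a)         # Frobenius norm
    return a / norm if norm else a   # guard against the all-zero matrix
```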
In generalized Procrustes analysis, the objective function couples the transformations of multiple matrices:
$$
\underbrace{\min}_{\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_K} \sum_{1 \le k < l \le K} \left\Vert \mathbf{A}_k \mathbf{T}_k - \mathbf{A}_l \mathbf{T}_l \right\Vert^2
= \underbrace{\min}_{\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_K} \sum_{1 \le k < l \le K} \text{Tr} \left[\left(\mathbf{A}_k \mathbf{T}_k - \mathbf{A}_l \mathbf{T}_l \right)^\dagger \left(\mathbf{A}_k \mathbf{T}_k - \mathbf{A}_l \mathbf{T}_l \right) \right]
$$
can be solved by rewriting this as
$$
\underbrace{\min}_{\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_K} \frac{1}{2}
\left\Vert \sum_{1 \le k \le K} \left(\mathbf{A}_k \mathbf{T}_k - \sum_{\substack{1 \le l \le K \\ l \ne k}} \mathbf{A}_l \mathbf{T}_l \right) \right\Vert^2
$$
Then you can decide to optimize this expression either by reference to the mean of the matrices or by optimizing one transformation at a time. The former is arguably easier to implement (and generalize), though it isn't what we are doing right now. In the latter case, for each $k=1,2,\ldots,K$, one solves the single-matrix Procrustes problem,
$$
\underbrace{\min}_{\mathbf{T}_k^{(i)}} \left\Vert \mathbf{A}_k \mathbf{T}_k^{(i)} - \sum_{\substack{1 \le l \le K \\ l \ne k}} \mathbf{A}_l \mathbf{T}_l^{(i-1)} \right\Vert^2
$$
where it is sensible (but not required) to initialize the transformations to the identity, $\mathbf{T}^{(0)}_k = \mathbf{I}$. As long as the optimization is linear, this algorithm converges, and is equivalent to the usual approach.
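A minimal sketch of this fixed-point iteration for the one-sided orthogonal case, using SciPy's single-pair solver as the inner step (the function name is mine, not the package's API):

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def generalized_orthogonal_procrustes(matrices, max_iter=100, tol=1e-10):
    """Iteratively minimize sum_{k<l} ||A_k T_k - A_l T_l||^2 over orthogonal T_k.

    Each sweep solves, for every k, the single-matrix problem
    min_T ||A_k T - sum_{l != k} A_l T_l|| with the other transformations frozen.
    """
    n = len(matrices)
    transforms = [np.eye(a.shape[1]) for a in matrices]  # T_k^{(0)} = I
    prev = np.inf
    for _ in range(max_iter):
        for k, a_k in enumerate(matrices):
            # Target is the sum of the other transformed matrices; scaling the
            # target does not change the orthogonal Procrustes solution.
            target = sum(matrices[l] @ transforms[l] for l in range(n) if l != k)
            transforms[k], _ = orthogonal_procrustes(a_k, target)
        err = sum(np.linalg.norm(matrices[k] @ transforms[k]
                                 - matrices[l] @ transforms[l]) ** 2
                  for k in range(n) for l in range(k + 1, n))
        if abs(prev - err) < tol:
            break
        prev = err
    return transforms, err
```

For two matrices this reduces to the ordinary orthogonal Procrustes problem and converges in a single sweep.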
The nice thing about this algorithm is that it generalizes to any generalized Procrustes problem. If you pass multiple matrices to any Procrustes method, then you end up with the generic problem
$$
\underbrace{\min}_{\substack{\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_K \\ \mathbf{S}_1, \mathbf{S}_2, \ldots, \mathbf{S}_K}} \sum_{1 \le k < l \le K} \left\Vert \mathbf{S}_k \mathbf{A}_k \mathbf{T}_k - \mathbf{S}_l \mathbf{A}_l \mathbf{T}_l \right\Vert^2
$$
which can be (approximately) solved by iteratively performing, for $k=1,2,\ldots,K$,
$$
\underbrace{\min}_{\mathbf{T}_k^{(i)}} \left\Vert \mathbf{S}_k^{(i-1)}\mathbf{A}_k \mathbf{T}_k^{(i)} - \sum_{\substack{1 \le l \le K \\ l \ne k}} \mathbf{S}_l^{(i-1)}\mathbf{A}_l \mathbf{T}_l^{(i-1)} \right\Vert^2
$$
until it converges.
So the short answer is that any Procrustes method can be generalized; the same algorithm(s) work in all cases. But convergence is only guaranteed for one-sided methods, I think, and maybe even only for one-sided orthogonal. Other cases are less prevalent, though by no means verboten.
There was similar work a long time ago by Carbo-Dorca, who wanted to find a way to align/pose molecules so that a single transformation (for each molecule) maximized the resemblance of all molecule-pairs, collectively. I don't know the reference off-hand; I can write to him for it.
Thanks for the notes; they make things clear.
@PaulWAyers and I discussed this before, and we can generalize this by using all the modules which have been built. This is going to be a very interesting feature. It should be fast for single-sided Procrustes, but it can be slow for two-sided Procrustes methods, such as two-sided permutation Procrustes.
So, the short question is: do we need to generalize it, or shall we just implement the generalized orthogonal Procrustes for now? I can implement it once we agree on which algorithm we are going to generalize. Thanks.
@PaulWAyers @FarnazH
It is a good question. I would probably attempt to implement a very generalized method, with
Then you could
A new update on the notes (see attached PDF).
I strongly feel that this issue should not be implemented for this release; it is not that it is difficult to write code for this issue, but writing well-tested well-documented code is going to take some effort. What is currently there is going to be irrelevant to this more general approach, which will need to be written from scratch.
The documentation of the code should indicate that this is a preliminary, untested implementation. The paper should mention generalized Procrustes as a future prospect (perhaps together with https://github.com/theochem/procrustes/issues/49 and https://github.com/theochem/procrustes/issues/32).
Note that this issue should only be addressed after all other one- and two-sided methods are in great shape, because it uses them internally!!
====
The following notes on generalized Procrustes Analysis are based on: I. Borg and Patrick J. F. Groenen, Modern Multidimensional Scaling (2nd ed.) (Springer, NY, 2005). Chapter 21.
While scaling and translation are complicated (indeed, they are complicated for any non-orthogonal transformation), in Procrustes
we've elected to use a simple scaling/translation methodology. So we will assume that the input matrices, $\mathbf{A}_k$, are scaled and translated by the normal rules, if that is desired. The following method corresponds to the first approach mentioned in the book of Borg and Groenen.
In generalized Procrustes analysis, the objective function couples the transformations of multiple matrices:
$$
\underbrace{\min}_{\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_K} \sum_{1 \le k < l \le K} \left\Vert \mathbf{A}_k \mathbf{T}_k - \mathbf{A}_l \mathbf{T}_l \right\Vert^2
= \underbrace{\min}_{\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_K} \sum_{1 \le k < l \le K} \text{Tr} \left[\left(\mathbf{A}_k \mathbf{T}_k - \mathbf{A}_l \mathbf{T}_l \right)^\dagger \left(\mathbf{A}_k \mathbf{T}_k - \mathbf{A}_l \mathbf{T}_l \right) \right]
$$
can be solved by rewriting this as
$$
\underbrace{\min}_{\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_K} \frac{1}{2}
\left\Vert \sum_{1 \le k \le K} \left(\mathbf{A}_k \mathbf{T}_k - \sum_{\substack{1 \le l \le K \\ l \ne k}} \mathbf{A}_l \mathbf{T}_l \right) \right\Vert^2
$$
Then you can decide to optimize this expression either by reference to the mean of the matrices or by optimizing one transformation at a time. The former is arguably easier to implement (and generalize), though it isn't what we are doing right now. In the latter case, for each $k=1,2,\ldots,K$, one solves the single-matrix Procrustes problem,
$$
\underbrace{\min}_{\mathbf{T}_k^{(i)}} \left\Vert \mathbf{A}_k \mathbf{T}_k^{(i)} - \sum_{\substack{1 \le l \le K \\ l \ne k}} \mathbf{A}_l \mathbf{T}_l^{(i-1)} \right\Vert^2
$$
where it is sensible (but not required) to initialize the transformations to the identity, $\mathbf{T}^{(0)}_k = \mathbf{I}$. As long as the optimization is linear, this algorithm converges, and is equivalent to the usual approach. This notation is a little loose because, in actuality, for $l < k$ you would be using the latest transformation, $\mathbf{T}_l^{(i)}$, in the summation.
The nice thing about this algorithm is that it generalizes to any generalized Procrustes problem. If you pass multiple matrices to any Procrustes method, then you end up with the generic problem
$$
\underbrace{\min}_{\substack{\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_K \\ \mathbf{S}_1, \mathbf{S}_2, \ldots, \mathbf{S}_K}} \sum_{1 \le k < l \le K} \left\Vert \mathbf{S}_k \mathbf{A}_k \mathbf{T}_k - \mathbf{S}_l \mathbf{A}_l \mathbf{T}_l \right\Vert^2
$$
which can be (approximately) solved by iteratively performing, for $k=1,2,\ldots,K$,
$$
\underbrace{\min}_{\mathbf{T}_k^{(i)}} \left\Vert \mathbf{S}_k^{(i-1)}\mathbf{A}_k \mathbf{T}_k^{(i)} - \sum_{\substack{1 \le l \le K \\ l \ne k}} \mathbf{S}_l^{(i-1)}\mathbf{A}_l \mathbf{T}_l^{(i-1)} \right\Vert^2
$$
then
$$
\underbrace{\min}_{\left(\mathbf{S}_k^{(i)}\right)^{\dagger}} \left\Vert \left(\mathbf{T}_k^{(i)}\right)^{\dagger}\mathbf{A}_k^{\dagger} \left(\mathbf{S}_k^{(i)}\right)^{\dagger} - \sum_{\substack{1 \le l \le K \\ l \ne k}} \left(\mathbf{T}_l^{(i-1)}\right)^{\dagger}\mathbf{A}_l^{\dagger} \left(\mathbf{S}_l^{(i-1)}\right)^{\dagger} \right\Vert^2
$$
until it converges. This notation is a little loose because, in actuality, for $l < k$ you would be using the latest transformations, $\mathbf{T}_l^{(i)}$ and $\mathbf{S}_l^{(i)}$, in the summation. But breaking the sum into two parts just to make this explicit seems undesirable.
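As an illustration of these alternating updates, a sketch restricted to orthogonal $\mathbf{S}_k$ and $\mathbf{T}_k$ (other transformation types would swap in different single-sided solvers); the function name is mine, not the package's:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def two_sided_generalized(matrices, max_iter=200, tol=1e-10):
    """Alternating minimization of sum_{k<l} ||S_k A_k T_k - S_l A_l T_l||^2
    over orthogonal S_k (left) and T_k (right), starting from identities."""
    n = len(matrices)
    S = [np.eye(a.shape[0]) for a in matrices]
    T = [np.eye(a.shape[1]) for a in matrices]
    prev = np.inf
    for _ in range(max_iter):
        for k, a in enumerate(matrices):
            target = sum(S[l] @ matrices[l] @ T[l] for l in range(n) if l != k)
            # Right-hand update: min_T ||(S_k A_k) T - target||.
            T[k], _ = orthogonal_procrustes(S[k] @ a, target)
            # Left-hand update: min_S ||S (A_k T_k) - target||, solved via the
            # transposed problem min ||(A_k T_k)^T S^T - target^T||.
            st, _ = orthogonal_procrustes((a @ T[k]).T, target.T)
            S[k] = st.T
        err = sum(np.linalg.norm(S[k] @ matrices[k] @ T[k]
                                 - S[l] @ matrices[l] @ T[l]) ** 2
                  for k in range(n) for l in range(k + 1, n))
        if abs(prev - err) < tol:
            break
        prev = err
    return S, T, err
```

For two matrices every sub-update minimizes the full objective over one block, so the error is monotonically non-increasing; for more matrices, or for non-orthogonal transformation types, only a local minimum can be expected, as the discussion above notes.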
$$
\underbrace{\min}_{\substack{\mathbf{T}_1, \mathbf{T}_2, \ldots, \mathbf{T}_{N_A} \\ \mathbf{S}_1, \mathbf{S}_2, \ldots, \mathbf{S}_{N_A}}} \sum_{1 \le k < l \le N_A} \left\Vert \mathbf{S}_k \mathbf{A}_k \mathbf{T}_k - \mathbf{S}_l \mathbf{A}_l \mathbf{T}_l \right\Vert^2
$$
Note: the traditional problem corresponds to $N_A = 2$ with $\mathbf{S}_2 = \mathbf{T}_2 = \mathbf{I}$.
The input arguments:

- `A` : A list of matrices, $\{\mathbf{A}_k\}_{k=1}^{N_A}$. These are the matrices to be transformed.
- `S_type` : A list of length $N_A$ specifying the type of left-hand-side transformation for each matrix $\mathbf{A}_k$. If a single element (not a list) is passed, ideally we could just assume that every element in the list is equal to that element. The types we currently support are:
- `T_type` : A list of length $N_A$ specifying the types of right-hand-side transformations. If a single element (not a list) is passed, ideally we could just assume that every element in the list is equal to that element. The types we currently support are:
- `S_eq_T` : A list of logical variables indicating whether the left-hand-side and right-hand-side transformations are the same, $\mathbf{S}_k = \mathbf{T}_k$. Optional argument; defaults to `False`.
- `S` and `T` : Initial guesses for the $\mathbf{S}_k$ and $\mathbf{T}_k$. Default to the identity. We do not help users set this up; if users want to do some fancy-schmancy permutation-Procrustes guessing, they are responsible for doing that themselves, potentially using utilities we have already provided. We may provide utilities (later) to help with this, but that is not part of this issue/functionality.

`S_type` and `T_type` should be non-identity, or otherwise there is nothing to do. If `S_eq_T(k) = True`, then since the two transformations are supposed to be the same, it must be that `S_type(k) = T_type(k)`. Otherwise print an informative error message and fail.

This algorithm is a greedy-ish fixed-point iteration. When the problem is convex, the solution is found; otherwise a local minimum is found.
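A sketch of the argument broadcasting and validation described above; the argument names mirror the text, but the helper itself is hypothetical, not an existing function in the package:

```python
def validate_args(n_matrices, s_type, t_type, s_eq_t=None):
    """Broadcast S_type / T_type / S_eq_T to lists of length n_matrices and
    check consistency: if S_eq_T[k] is True, S_type[k] must equal T_type[k]."""
    def broadcast(arg):
        # A single (non-list) value applies to every matrix.
        return list(arg) if isinstance(arg, (list, tuple)) else [arg] * n_matrices

    s_type, t_type = broadcast(s_type), broadcast(t_type)
    s_eq_t = broadcast(False if s_eq_t is None else s_eq_t)
    for k, (s, t, same) in enumerate(zip(s_type, t_type, s_eq_t)):
        if same and s != t:
            raise ValueError(
                f"S_eq_T[{k}] is True but S_type[{k}]={s!r} != T_type[{k}]={t!r}")
    return s_type, t_type, s_eq_t
```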
If `S_eq_T(k) = True`, then use the two-sided Procrustes method of type $\mathbf{T}_{\text{type},k}$ to solve the problem:
$$
\underbrace{\min}_{\mathbf{T}_k^{(i)}} \left\Vert \left(\mathbf{T}_k^{(i)}\right)^{\dagger}\mathbf{A}_k \mathbf{T}_k^{(i)} - \sum_{\substack{1 \le l \le N_A \\ l \ne k}} \mathbf{S}_l^{(i-1)}\mathbf{A}_l \mathbf{T}_l^{(i-1)} \right\Vert^2
$$

We may need to add support of missing values for generalized Procrustes analysis at some point; see #110.
The missing-values correction can be appended as an "outer loop" where one fixes the missing values for each matrix, in turn, and loops over the Procrustes problem for a selection of the missing values. The initial choice of the missing values could be the mean of the values that are present for that entry (of the other matrices) or zeros.
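One way to sketch that outer-loop initialization; the helper name and the NaN convention for missing entries are my own, not the package's API:

```python
import numpy as np

def fill_missing(matrices):
    """Initialize missing (NaN) entries of each matrix with the mean of the
    corresponding entries of the other matrices, falling back to zero when an
    entry is missing from every matrix (np.nanmean then yields NaN)."""
    stack = np.stack(matrices)  # shape (N_A, n_rows, n_cols)
    filled = []
    for k, a in enumerate(matrices):
        others = np.delete(stack, k, axis=0)            # drop matrix k
        guess = np.nan_to_num(np.nanmean(others, axis=0))
        a = a.copy()
        mask = np.isnan(a)
        a[mask] = guess[mask]                           # fill only missing entries
        filled.append(a)
    return filled
```

After this initialization, the outer loop would alternate between solving the Procrustes problem on the filled matrices and re-estimating the missing entries from the aligned results.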
A short-term enhancement is to generalize our implementation to permit orthogonal Procrustes, rotation Procrustes, and "generic" Procrustes. These all have easy "inside" loops. Or even every one-sided Procrustes method (using this algorithm). We (think we) have a use case for "generic Procrustes" with @ramirandaq .
I am a bit confused by the generalized Procrustes method. I understand that every matrix has a right-hand-side transformation, but does it need to be an orthogonal transformation (as implemented)? The documentation does not say anything about it and just uses T. See: https://github.com/theochem/procrustes/blob/master/procrustes/generalized.py#L87