theochem / procrustes

Python library for finding the optimal transformation(s) that makes two matrices as close as possible to each other.
https://procrustes.qcdevs.org/
GNU General Public License v3.0
109 stars 20 forks source link

Refactor permutation #97

Closed FarnazH closed 3 years ago

FarnazH commented 3 years ago

I built upon @Ali-Tehrani's refactoring of permutation_2sided in this PR. This is mostly adding more comments and minor refactoring to make the code even more clear. I am making a major refactoring of the permutation_2sided function, based on conversations with @PaulWAyers & @Ali-Tehrani, but I wanted to share this first.

FarnazH commented 3 years ago

@fwmeng88, @Ali-Tehrani, and @PaulWAyers: I worked on the (major) refactoring of permutation_2sided, and to make these changes be finalized faster, I am pushing those comments here for review (instead of waiting for this PR to be reviewed/merged before sharing the new refactoring).

FarnazH commented 3 years ago

@fwmeng88, @Ali-Tehrani, and @PaulWAyers: The refactored permutation_2sided is ready for review. The goal was to make this function more flexible. The current implementation allows one to compute approximate permutation methods as easily as more expensive methods. It also makes it easy to add new algorithms. Please be critical in your review :-)

The docstring needs to be updated (for both arguments & notes) to better describe the code functionality, but I didn't want to do it until we all agree on the API changes, etc. In the meanwhile, I am double-checking the implemented algorithms against the paper's appendix to fix any discrepancies.

Ali-Tehrani commented 3 years ago

I haven't found any logical bugs, and this improved code is really well-done.

Ali-Tehrani commented 3 years ago

For the notes, we could do something similar to SciPy. Below is a template.

CLICK HERE

Define the problem and `P_1` and `P_2` in the header. Place `guess_p2` before `guess_p1` due to their importance or swap their names. Examples ------------- Do an example using one of the umeyama, then using nmf. See Also ------------ kopt_heuristic_single Search all k-fold permutations to minimize any function for single transformation. kopt_heuristic_double Search all k-fold permutations to minimize any function for double transformation. softassign Minimize the generalized quadratic assignment problem. Notes -------- Explain single (one-sided) and double (two-sided) permutation Procrustes problem. **Single Transformation** The method "approx-normal1" found in [This paper] is @PaulWAyers @FarnazH . This is useful when one wants a decent initial guess, while being faster than `approx-umeyama` and `approx-umeyama-svd`. The method "approx-normal2" found in [This paper] is @PaulWAyers @FarnazH ... This is useful when one wants a decent initial guess, while being faster than `approx-umeyama` and `approx-umeyama-svd`. The method "approx-umeyama" found in [3] computes the closest permutation matrix to :math:`|U_A| |U_B|.T`, where U_A, U_B are the eigenvectors of A, B, respectively and |A| is the matrix with absolute value applied to each matrix entry. This is useful when one wants a decent initial guess. The method "approx-umeyama-svd" is a variate of "approx-umeyama" which computes the closest permutation matrix to :math:`|U| |V|.T`, where U,V are the left and right unitary matrices of the singular value decomposition of :math:`|U_A| |U_B|.T`. This is useful when one wants a better initial guess but has higher computational time than "approx-umeyama". The parameter `lapack_driver` is used for the singular value decomposition. The method "nmf" found in [1] is an iterative fixed-point, gradient-based method that updates the permutation starting with 'guess_p2'. This is the recommended method of choice while providing an initial guess from the methods above. The method "k-opt" is a greedy, local search that searches over all k-fold permutations of the initial guess `guess_p2`. This search is slow when kopt is large. All permutations are searched when kopt is equal to the rows of :math:`A`. This is useful to refine the results from the other algorithms. The method "soft-assign" in [4] is a gradient-based iterative method that solves the permutation problem over doubly stochastic matrices, then finds the closest permutation matrix. The initial guess `guess_p2` can be provided to this algorithm. This is useful when other methods provide poor results. **Double Transformation** The method "flip-flop" based from [2] is an iterative method that updates :math:`P_1, P_2` while keeping the other fixed, starting with the initial guesses `guess_p1` and `guess_p2`. This is done by solving the one-sided permutation problem. The method "k-opt" is a greedy, local search that searches over all k-fold permutations of the rows of `guess_p1` and all k-fold permutation of the rows of `guess_p2`. This search is slow when kopt is large. This is useful to refine the results from the other algorithm. References --------------- ... [4] Rangarajan, A., Yuille, A. L., Gold, S., & Mjolsness, E. (1997). A convergence proof for the softassign quadratic assignment algorithm. Advances in neural information processing systems, 620-626. [5] This paper

FarnazH commented 3 years ago

Thanks for the reviews, and thanks @Ali-Tehrani for the docstring template; that is very useful/helpful. Just not to hold things back any longer, I think we can go on and merge this PR, so @Ali-Tehrani can implement the soft-assign method. I will take care of the docstring in another PR. I will let you merge @fwmeng88 wherever you are ready.

FanwangM commented 3 years ago

Thanks for the big improvement! I am going to merge this PR.