theochem / procrustes

Python library for finding the optimal transformation(s) that makes two matrices as close as possible to each other.
https://procrustes.qcdevs.org/
GNU General Public License v3.0
109 stars 20 forks source link

Refactor permutation and permutation test #85

Closed Ali-Tehrani closed 3 years ago

Ali-Tehrani commented 3 years ago

This pull request is about Two-sided permutation and its tests.

Key changes to permutation.py file are 1) The function document of two-sided permutation were updated. 2) permutation_2sided_explicit was removed, from what I remember in the discussions was because it was equivalent to permutation_2sided when single=True and kopt=size of A.

Key changes to the tests are 1) Added a random permutation generator and generalize most tests to different matrix sizes. 2) Updated certain tests to correctly match what it is testing. In particular, testing the mode parameter requires the matrices to be symmetric. 3) Tests relating to permutation_2sided_explicit were converted to tests using the permutation_2sided when single=True and when kopt is equal to the size of the matrices. 3) Tests relating to Two-sided permutation with double transformation were untouched. I had no success in generalizing the flip-flop style algorithm from the thesis to different matrix sizes. It even failed for small matrices. I added in the docs (which is also found in the thesis) that it does not guarantee a global optima and an additional k-opt search might be needed. 4) Some tests were removed because they became redundant from the previous tests that were randomized.

I had to reduce the number of tests and size. Currently, it takes approximately one minute to complete all of the permutation tests. One or two additional tests were added to increase code coverage. The procrustes notes and the readthedocs may need to change, I haven't checked it.

PaulWAyers commented 3 years ago

I think it is better not to have a brute-strength permutation method as it can be terribly expensive and people might invoke it without understanding. When you force people to read the documentation and choose a large k-value to do brute-strength methods, at least you can force a tiny bit of awareness of what they are getting themselves into.....

So I'm in favor of documenting (I think this is already true enough) that when k > max(n_rows,n_cols) then brute-strength is in fact done. The method is there for those who want it, then, but the trap that people may stumble blindly into is more hidden.

FarnazH commented 3 years ago

@fwmeng88 can you please wait for all people assigned to review a PR to give their comments before merging? We are finalizing the code, so it is important for all of us to go through it and be meticulous.