Closed: TensorDuck closed this issue 7 years ago.
This appears to be the symmetrization problem again that was related to an mdtraj issue. No idea why it would pop up again in 2.3.1. @marscher, can you have a look? @TensorDuck, is this behavior specific to this example, or do you see the problem in other examples as well?
On 07/02/17 at 23:10, Justin Chen wrote:
I am getting a weird failure using TICA in v2.3.1 after upgrading; TICA was working on this data set previously.
The error is long, since it is a recursion error, but the last part of the traceback is this:
File "/home/jc49/miniconda2/lib/python2.7/site-packages/pyemma/coordinates/transform/tica.py", line 55, in _lazy_estimation
    tica_obj._diagonalize()
File "/home/jc49/miniconda2/lib/python2.7/site-packages/pyemma/coordinates/transform/tica.py", line 297, in _diagonalize
    timescales = self.timescales
File "", line 2, in timescales
File "/home/jc49/miniconda2/lib/python2.7/site-packages/pyemma/coordinates/transform/tica.py", line 55, in _lazy_estimation
    tica_obj._diagonalize()
File "/home/jc49/miniconda2/lib/python2.7/site-packages/pyemma/coordinates/transform/tica.py", line 291, in _diagonalize
    eigenvalues, eigenvectors = eig_corr(self._covar.cov, self._covar.cov_tau, self.epsilon, sign_maxelement=True)
File "/home/jc49/miniconda2/lib/python2.7/site-packages/pyemma/_ext/variational/solvers/direct.py", line 152, in eig_corr
    L = spd_inv_split(C0, epsilon=epsilon, method=method, canonical_signs=True)
File "/home/jc49/miniconda2/lib/python2.7/site-packages/pyemma/_ext/variational/solvers/direct.py", line 63, in spd_inv_split
    assert _np.allclose(W.T, W), 'C0 is not a symmetric matrix'
File "/home/jc49/miniconda2/lib/python2.7/site-packages/numpy/core/numeric.py", line 2372, in allclose
    res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
File "/home/jc49/miniconda2/lib/python2.7/site-packages/numpy/core/numeric.py", line 2453, in isclose
    return within_tol(x, y, atol, rtol)
File "/home/jc49/miniconda2/lib/python2.7/site-packages/numpy/core/numeric.py", line 2435, in within_tol
    with errstate(invalid='ignore'):
File "/home/jc49/miniconda2/lib/python2.7/site-packages/numpy/core/numeric.py", line 2968, in __enter__
    self.oldstate = seterr(**self.kwargs)
RuntimeError: maximum recursion depth exceeded

I have an example attached in a tarball, with a Python script run_msm.py that reproduces the error for me; all necessary data files are included. I have also attached my conda_list.txt.
test-ticafail.tar.gz: https://github.com/markovmodel/PyEMMA/files/759185/test-ticafail.tar.gz
conda_list.txt: https://github.com/markovmodel/PyEMMA/files/759186/conda_list.txt
Issue: https://github.com/markovmodel/PyEMMA/issues/1037
--
Prof. Dr. Frank Noe, Head of Computational Molecular Biology group, Freie Universitaet Berlin
I see this in a few similar trajectories, though all of them are coarse-grained models. I will test with some all-atom data as well and see if there is a difference.
@TensorDuck: thanks for taking the time to report this with a reproducing example.
@franknoe: maximum recursion depth reached... This has nothing to do with mdtraj. Please tell me which specific issue you are referring to.
It's an endless recursion caused by the decorator voodoo of lazy estimation. I introduced this code for partial_fit support, to avoid early diagonalization and perform it only when properties depending on the result are accessed. But in the recent version the decorator is used incorrectly. To clarify:
TICA gets estimated, which calls _diagonalize(). So far, so good. But within the new implementation of _diagonalize(), the lazy property self.timescales is accessed. This calls _diagonalize() again, because self._estimated is still not true (it becomes true only once the diagonalization has finished).
This is a total failure of our test suite. A total disaster 👎 The really funny thing is why this even works in most cases. The only condition under which it can work is this: if tica_obj._estimated is true before _diagonalize() is invoked, the decorator does not call it again. Apparently this condition is met most of the time.
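The failure mode can be illustrated with a hypothetical minimal sketch (my own toy class and decorator, not the actual PyEMMA code): the decorated lazy property triggers the estimation, and the estimation in turn touches the property before the completed flag is set.

```python
import functools

def lazy_estimation(meth):
    """Run the expensive estimation before serving the result,
    unless it has already completed (toy sketch, not PyEMMA's code)."""
    @functools.wraps(meth)
    def wrapper(self, *args, **kwargs):
        if not self._estimated:
            self._diagonalize()
        return meth(self, *args, **kwargs)
    return wrapper

class TicaLike:
    def __init__(self):
        self._estimated = False

    def _diagonalize(self):
        # BUG: touching the lazy property while self._estimated is still
        # False re-enters _diagonalize() via the decorator -> endless recursion
        _ = self.timescales
        self._estimated = True  # never reached

    @property
    @lazy_estimation
    def timescales(self):
        return [10.0, 5.0, 1.0]

try:
    TicaLike().timescales
except RecursionError:  # RuntimeError on Python 2, as in the report above
    print("maximum recursion depth exceeded")
```

If `_estimated` were already true when the property is first touched, the wrapper would skip `_diagonalize()` entirely, which is why the bug stays hidden in most code paths.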
A snippet like this succeeds:
inp = pyemma.coordinates.source(np.random.random((100, 3)))
pyemma.coordinates.tica(inp, 1).get_output()
but this one (with the same input) reproduces the error:
pyemma.coordinates.tica(inp, 1, commute_map=True, kinetic_map=False).get_output()
So this new feature is simply not well tested. The commute_map=True code branch accesses self.timescales -> recursive call to _diagonalize()... Just use what has already been computed in eig_corr, like:
if self.commute_map:
    timescales = -self.lag / np.log(np.abs(eigenvalues))
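The relation used in that fix, t_i = -lag / ln|lambda_i|, needs nothing beyond the eigenvalues that eig_corr already returns. A quick numpy check with made-up values (toy numbers, not from a real TICA run):

```python
import numpy as np

lag = 10                                 # lag time tau (toy value)
eigenvalues = np.array([0.9, 0.5, 0.1])  # toy eigenvalues, not from a real run

# implied timescales straight from the eigenvalues available inside
# _diagonalize(), with no detour through the lazy self.timescales property
timescales = -lag / np.log(np.abs(eigenvalues))
print(timescales)  # ≈ [94.9, 14.4, 4.3]
```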
BTW: this is the definition of the property self.eigenvalues, but it leads to NaN in the output:
import numpy as np
import pyemma
x = np.random.random((100, 3))
r = pyemma.coordinates.source(x)
########
In [10]: pyemma.coordinates.tica(r,1,commute_map=True, kinetic_map=False).eigenvalues
/home/marscher/workspace/pyemma/pyemma/coordinates/transform/tica.py:301: RuntimeWarning: invalid value encountered in sqrt
eigenvectors *= np.sqrt(regularized_timescales / 2)
Out[10]: array([ 0.2235185 , -0.20373609, -0.12104746])
In [11]: pyemma.coordinates.tica(r,1,commute_map=True, kinetic_map=False).timescales
/home/marscher/workspace/pyemma/pyemma/coordinates/transform/tica.py:301: RuntimeWarning: invalid value encountered in sqrt
eigenvectors *= np.sqrt(regularized_timescales / 2)
Out[11]: array([ 0.66744041, 0.62856324, 0.47358068])
So we have real eigenpairs. But how can the output then be invalid? I think a lot more things have been broken recently and are most likely also untested.
Sorry @TensorDuck that you ran into this problem! You must have wasted hours of CPU time to reach 1000 recursions (diagonalizing your matrix takes 13 seconds on my old box, so I would have wasted ~4 hours). Please don't use the commute_map=True flag for now - it is broken.
PS: untested code is broken code. Probably we should go back to having 3 release candidates (like for branch 2.1) before we hit the release button.
@franknoe sorry, we should have done a better job.
Ah, I see. So I wasn't going crazy. I think I will probably roll back a few versions for now then.
I just wanted to point out how serious the issue is. If people use this flag in analysis on a cluster, they will just hit the wall-time limit.
This was introduced during the stress of paper deadlines and PhD-thesis wrap-up. Please accept our apologies.
I recommend 2.2.7 at the moment.
PS: sorry if you felt offended by that comment [I have a toothache and it's 4 o'clock in the morning].
I will roll back to 2.2.7 then. Thank you to everyone for y'all's help.
Not sure I understand. I'm also not sure a decorator is a good way to do this: decorator code is hard to read for most developers, and program logic should be contained in easily accessible code, ideally in the class in which it acts.
In any case I agree on the general points: release candidates or devel versions (whichever is easier to do) that are only available on demand and have been used for a while within the group before the public release is made, and better code testing. Also, I hope to have more time for helping with the software soon; I think many recent errors are due to an overly complicated software design and the class dependencies in the base framework.
Practically, since TICA is a very basic feature, we need a functional version rather quickly. @fabian-paul, can you help with this?
I don't think you need my help. @marscher correctly identified the problem. commute_map simply wasn't tested.
Concerning @marscher's comment on the RuntimeWarning: there seems to be a general problem with @franknoe's regularization formula. Consider this example:
import numpy as np
lag = 1.0
timescales = np.array([1000, 1, 0.1])
print(0.5 * timescales * np.tanh(np.pi * ((timescales - lag) / lag) + 1))
# [ 5.00000000e+02  3.80797078e-01 -4.74787185e-02]
This produces negative numbers, which I think wasn't intended. Let me ponder on this...
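Presumably this is also where the RuntimeWarning and the NaN output reported above come from: the commute-map scaling takes a square root of the regularized timescales (the "eigenvectors *= np.sqrt(regularized_timescales / 2)" line in the warning), so any negative entry turns into NaN. A numpy sketch of that chain (variable names are mine):

```python
import numpy as np

lag = 1.0
timescales = np.array([1000.0, 1.0, 0.1])

# the tanh regularization discussed above; the last entry comes out negative
reg = 0.5 * timescales * np.tanh(np.pi * ((timescales - lag) / lag) + 1)

# the commute-map scaling then takes a square root, as in
# "eigenvectors *= np.sqrt(regularized_timescales / 2)";
# a negative regularized timescale becomes NaN here
scale = np.sqrt(reg / 2)
print(scale)  # last entry is nan ("invalid value encountered in sqrt")
```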
Is it only a commute map problem, or is it a general problem with partial fit?
Fabian's example is pure numpy ;-) How do you relate it to partial_fit?
Your explanation was that there is a problem that depends on whether the estimator has been estimated or not. This doesn't sound like it is specific to the commute map.
@franknoe: it's only a commute map problem. I've added a combined test #1039 that tests commute_map with partial fit. @marscher: the numpy code is a copy from https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/coordinates/transform/tica.py#L300. The point I tried to make is that the code is OK, but the math is wrong.
I think @marscher's explanation is partially (sic!) wrong: "The only condition in which it can work is, If tica_obj._estimated is true before diag() is invoked the decorator does not call it again. So this condition is met most of the time."
I don't agree. This condition is not met most of the time. When _diagonalize() is entered, self._estimated usually is False. (I checked this by adding assert not self._estimated as the first line of _diagonalize.)
It's a commute map problem.
For the regularization, a quick'n'dirty way of solving the problem of negative implied time scales would be to truncate the regularized timescales at zero:
0.5 * timescales * np.maximum(np.tanh(np.pi * ((timescales - lag) / lag) + 1), 0)
Talking about hyperbolic functions... why not regularize with the hyperbola directly, without any mental gymnastics?
lag * np.sqrt(np.maximum((timescales / lag) ** 2 - 1, 0))
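Side by side, the original tanh formula and the two proposed fixes behave like this on the example above (a sketch; variable names are mine):

```python
import numpy as np

lag = 1.0
timescales = np.array([1000.0, 1.0, 0.1])

# original tanh regularization: goes negative for timescales below the lag
tanh_reg = 0.5 * timescales * np.tanh(np.pi * ((timescales - lag) / lag) + 1)

# quick'n'dirty fix: truncate the tanh factor at zero
trunc_reg = 0.5 * timescales * np.maximum(
    np.tanh(np.pi * ((timescales - lag) / lag) + 1), 0)

# hyperbola: exactly zero at t = lag, asymptotically ~t for t >> lag
hyp_reg = lag * np.sqrt(np.maximum((timescales / lag) ** 2 - 1, 0))

print(tanh_reg)   # last entry negative
print(trunc_reg)  # all entries >= 0
print(hyp_reg)    # all entries >= 0
```

Both fixes keep the regularized timescales nonnegative, so the subsequent square root in the commute-map scaling stays real.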