markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

AssertionError: C0 is not a symmetric matrix #891

Closed (sonyahanson closed this issue 7 years ago)

sonyahanson commented 8 years ago

We are getting an AssertionError when trying to run pyemma.coordinates.pipeline:

    # do featurization + tICA by streaming over size-1000 "chunks"
    source = pyemma.coordinates.source(fnames, features = feat)
    tica = pyemma.coordinates.tica(lag = 10, kinetic_map = True, var_cutoff = 0.95)
    stages = [source, tica]
    pipeline = pyemma.coordinates.pipeline(stages, chunksize = 1000)

Have you seen this before? Let us know if you need more information. Tagging @maxentile, in case he can help clarify anything.

calculate mean+cov: 100% (1591/1591) [#############################] eta 00:00
Traceback (most recent call last):
  File "pipeline.py", line 190, in <module>
    run_pipeline(fnames, project_name = project_name)
  File "pipeline.py", line 101, in run_pipeline
    pipeline = pyemma.coordinates.pipeline(stages, chunksize = 1000)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/api.py", line 435, in pipeline
    p.parametrize()
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/pipelines.py", line 146, in parametrize
    element.estimate(element.data_producer, stride=self.param_stride)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/transform/tica.py", line 257, in estimate
    return super(TICA, self).estimate(X, **kwargs)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/transform/transformer.py", line 190, in estimate
    super(StreamingTransformer, self).estimate(X, **kwargs)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/_base/estimator.py", line 348, in estimate
    self._model = self._estimate(X)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/transform/tica.py", line 297, in _estimate
    self._diagonalize()
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/coordinates/transform/tica.py", line 308, in _diagonalize
    eigenvalues, eigenvectors = eig_corr(self.cov, self.cov_tau, self.epsilon)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/pyEMMA-2.2.2-py2.7-linux-x86_64.egg/pyemma/util/linalg.py", line 151, in eig_corr
    assert np.allclose(C0.T, C0), 'C0 is not a symmetric matrix'
AssertionError: C0 is not a symmetric matrix
marscher commented 8 years ago

Actually, I've never seen this behaviour before and it looks like something went horribly wrong... Are you also writing to the trajectories while pyemma reads them?

jchodera commented 8 years ago

@sonyahanson: Was this error still occurring on staged data, or only when the trajectories were still being synced from FAH?

sonyahanson commented 8 years ago

This error was still occurring on staged data.

marscher commented 8 years ago

@fnueske Have you ever observed that the unlagged covariance matrix is not symmetric? We also use covartools from variational to calculate this matrix.

@sonyahanson Can you please save the covariance matrix and send it to us or attach it to this issue? Maybe the check is too strict.

np.save('cov.npy', tica.cov)

fnueske commented 8 years ago

No, I can't recall having observed this. We need to look at the example to find out what's happening.

fabian-paul commented 8 years ago

I have a hypothesis: lately I've encountered some NaNs that came out of my tica transformations. I haven't been able to replicate this in a systematic way or narrow down the cause. It only affected a few frames, so I just threw away that data. Since NaN != NaN, every correlation matrix that contains a single NaN entry must be non-symmetric. The causes for NaNs in the data can be many: since we use np.empty all over the place to allocate new numpy arrays, and a binary string of ones is one representation of NaN (np.array([0xFFFFFFFF], dtype=np.int32).view('f')), it's not unlikely that some numpy array which was allocated too large or not filled completely, due to some off-by-n error in indexing, might contain a NaN. I suggest that we add some NaN checks to the code to catch errors early.
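
A quick standalone demonstration of the two claims above (made-up matrix, not PyEMMA code): a single NaN entry defeats the symmetry check, and the all-ones bit pattern reinterpreted as float32 is indeed NaN.

    import numpy as np

    C = np.eye(3)
    C[0, 1] = C[1, 0] = np.nan          # symmetric by construction...
    print(np.allclose(C.T, C))          # ...but False, since NaN != NaN
    # uint32 here to avoid integer overflow on newer numpy versions
    print(np.array([0xFFFFFFFF], dtype=np.uint32).view('f'))  # [ nan]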

gph82 commented 8 years ago

I can confirm this, but it is very hard to reproduce. I think the NaNs appear at integer multiples of the chunksize.

marscher commented 8 years ago

We could add this check in a very general interface of the streaming pipeline. However, it is not trivial what should happen if a NaN is detected. Should we just discard the frame and hope for the best, or raise an error?

One could also think of a NaN-handler class which can be set by the user to decide what should happen, e.g. replace the value with a sane one or omit the frame.

Discarding frames will most likely break assumptions about output lengths. Output arrays would then be allocated bigger than the available data, which could itself introduce new NaNs, as Fabian showed. So we should always initialize output arrays with np.zeros.
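
For illustration, a minimal sketch of what such a user-settable NaN handler could look like (hypothetical, not an existing PyEMMA class):

    import numpy as np

    class NaNHandler(object):
        """Decide what happens when a (n_frames, n_features) chunk has non-finite entries."""

        def __init__(self, strategy='raise', fill_value=0.0):
            self.strategy = strategy        # 'raise', 'fill' or 'omit'
            self.fill_value = fill_value

        def __call__(self, chunk):
            mask = ~np.isfinite(chunk)
            if not mask.any():
                return chunk
            if self.strategy == 'raise':
                raise ValueError('chunk contains %d non-finite values' % mask.sum())
            if self.strategy == 'fill':
                chunk = chunk.copy()
                chunk[mask] = self.fill_value
                return chunk
            # 'omit': drop every frame containing a non-finite value
            # (note: this changes the output length, as discussed above)
            return chunk[np.isfinite(chunk).all(axis=1)]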

fabian-paul commented 8 years ago

I would rather initialize arrays with NaN and try to fix all indexing errors.

fabian-paul commented 8 years ago

In the case that the user's input data contains NaNs: off the top of my head, I don't see any reason why we should allow this.

franknoe commented 8 years ago

It's fairly simple - First of all we need to find out why the symmetry check fails, and then it needs to be avoided, unless it's directly due to the user input.

Other remarks:

Initialization: C0 is just a sum of matrix products. So it's reasonable to initialize it with 0's as long as no data have been used.

Input checks: If the input data contains NaNs, that shouldn't be allowed, but that would be very strange input, and testing for every possible input weirdness before trying any computation with it is too costly. At the stage of eig_corr it's relatively economic to check though, so one could simply come up with a more informative error message, if the NaN hypothesis is even true.
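
A minimal sketch of that more informative check, assuming it lives where eig_corr currently asserts symmetry (wording hypothetical):

    import numpy as np

    def _check_C0(C0):
        # distinguish corrupted data from a genuine symmetry violation
        if np.any(np.isnan(C0)):
            raise RuntimeError('C0 contains NaN entries; the input data or an '
                               'intermediate buffer is probably corrupted.')
        assert np.allclose(C0.T, C0), 'C0 is not a symmetric matrix'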

Damn, I found myself online again...

marscher commented 8 years ago

> It's fairly simple - First of all we need to find out why the symmetry check fails, and then it needs to be avoided, unless it's directly due to the user input.

@sonyahanson we need some more info to investigate this. I kindly ask you to send us the cov attribute of TICA.

> Initialization: C0 is just a sum of matrix products. So it's reasonable to initialize it with 0's as long as no data have been used.

It is initialized with zeros in covartools. There is not a single call to numpy.empty.

> Input checks: [...] one could simply come up with a more informative error message, if the NaN hypothesis is even true.

If you say that this should not be allowed, how would we then disallow it if we don't check for it?

I would also question whether the time cost of the check (which can be switched on/off) isn't dominated by the time the user saves when given a hint about what is actually broken.

> Damn, I found myself online again...

Welcome back. But enjoy your holiday and go diving :)

marscher commented 8 years ago

Confirming @fabian-paul's NaN hypothesis:

    import numpy as np
    from pyemma.coordinates import tica

    def test_nan_input(self):
        data = np.random.random((1000, 3))
        data[-1, 0] = np.nan
        tica(data, lag=1)

fails with the exact same error message.

jchodera commented 8 years ago

Thanks, everybody!

Our trajectories should never have NaNs in them---the FAH machinery prevents this from happening---but I wonder if there might be some issues with incompletely-rsynced HDF5 trajectory data. I don't know if the HDF5 reader interprets incomplete final dataframes as containing some NaNs. We can check this with a simple Python script that scans the trajectories to see if there are any frames interpreted as NaNs, but it might be good to have some sort of check/filter in pyemma as well, as you suggest.
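
A minimal sketch of such a scan, assuming mdtraj and HDF5 trajectories (file names are placeholders):

    import numpy as np
    import mdtraj as md

    for fname in ['traj-0.h5', 'traj-1.h5']:  # hypothetical trajectory list
        for i, chunk in enumerate(md.iterload(fname, chunk=1000)):
            if not np.all(np.isfinite(chunk.xyz)):
                print('non-finite coordinates in %s, chunk %d' % (fname, i))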

marscher commented 8 years ago

The problem with re-syncing is exactly what leads to the error in #892, and most probably also in this case. You cannot assume that the data structure is in a sane state while it is being written to, even if the underlying format supports concurrent access. I found that HDF5 access needs to be set up differently if parallel read/write access is desired, and I'm not sure that pytables (the backend of mdtraj) will use this if not set.

sonyahanson commented 8 years ago

Thanks for your insights here, and sorry for the delay, but I only see this error when running a script that takes a while to run. I have now been able to reproduce the error while outputting the covariance matrix.

I'm not quite sure what to look for, but this is what the covariance matrix looks like:

array([[  2.67989663e-01,   1.76413777e-03,  -1.22455547e-04, ...,
         -4.53315917e-04,  -3.54789756e-03,   2.11741200e-04],
       [  1.76413777e-03,   7.31080394e-01,   2.27321636e-03, ...,
         -1.71696876e-03,   8.46309794e-03,  -3.75501541e-03],
       [ -1.22455547e-04,   2.27321636e-03,   3.24663820e-02, ...,
          3.78398266e-04,   3.66281740e-03,  -9.43233003e-04],
       ..., 
       [ -4.53315917e-04,  -1.71696876e-03,   3.78398266e-04, ...,
          7.18561970e-02,  -6.05389234e-03,   3.28629266e-05],
       [ -3.54789756e-03,   8.46309794e-03,   3.66281740e-03, ...,
         -6.05389234e-03,   6.87432327e-01,  -1.37805852e-01],
       [  2.11741200e-04,  -3.75501541e-03,  -9.43233003e-04, ...,
          3.28629266e-05,  -1.37805852e-01,   1.43177979e-01]])

I've also sent the full cov.npy file to @marscher.

Let me know if there's anything else I can provide!

jchodera commented 8 years ago

Are you able to tell if the matrix has NaNs in it? I think this is np.any(np.isnan(C0)) or similar.

sonyahanson commented 8 years ago

    np.any(np.isnan(covariance_matrix))
    False

sonyahanson commented 8 years ago

Actually just realized this might be the wrong file...

sonyahanson commented 8 years ago

Sorry about that - I have reason to believe that this was the covariance matrix generated while running the same script on a smaller subset of trajectories, for which I don't see this error. I'm redoing this and also outputting tica.cov_tau.

jchodera commented 8 years ago

Thanks, @sonyahanson!

sonyahanson commented 8 years ago

A few updates here: I was not able to actually output tica.cov, since the error occurs during the creation of the tica object.

   stages = [source, tica]
-> pipeline = pyemma.coordinates.pipeline(stages, chunksize = 1000)

Suggestions on an alternative way to figure out what's going on are welcome. We are currently looking into whether there is a problem with the trajectories themselves...

franknoe commented 8 years ago

Just wondering, does this only occur when you use the pipeline? You don't need to use the pipeline for this example.

jchodera commented 8 years ago

I don't see how this could be a problem with the trajectories. We've established that these are complete trajectories that are not being appended to and the resulting C0 has no NaNs. This should be within the valid domain of the code.

sonyahanson commented 8 years ago

At the moment the error has only occurred using the pipeline. Since it seems to occur only for the full set of trajectories and for the exact way we are picking our features, it is possible we won't be able to do this without the pipeline due to the amount of memory required, but we'll look into it.

marscher commented 8 years ago

The default chunk size of the pipeline is very small (100). It could be that with such a large data set numerical errors ruin the final matrices, since we chop the summation into many more operations than necessary. Actually, we want to change the chunksize parameter so that it uses a sane amount of memory, e.g. a fixed number of floats per chunk.

@sonyahanson Could you confirm this by increasing the chunksize to e.g. 10000?
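
As a standalone illustration of how chunked accumulation in single precision can drift (not PyEMMA code), compare a running float32 sum over tiny chunks against a float64 reference:

    import numpy as np

    x = np.random.random(10**7).astype(np.float32)
    total = np.float32(0.0)
    for start in range(0, x.size, 100):       # many tiny partial sums
        total += x[start:start + 100].sum()
    print(total, x.astype(np.float64).sum())  # typically differ noticeably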

sonyahanson commented 8 years ago

Running this now - sorry for the delay. Just to note (as mentioned up top) that the chunksize was already slightly higher than the default (chunksize = 1000).

marscher commented 8 years ago

Then this is probably not the issue...

marscher commented 8 years ago

@sonyahanson Could you please install from the debug branch like this to get a dump of the correlation matrices?

conda create -n debug_tica pyemma-dev
source activate debug_tica
pip install git+https://github.com/markovmodel/pyemma@debug_tica

This will create a new environment with the devel version of pyemma (so all dependencies are dragged in) and then install the debug branch via pip.

sonyahanson commented 8 years ago

Will do! Thanks!

franknoe commented 8 years ago

Apparently this issue still persists, and @TensorDuck ran into it. Justin - can you provide a minimal example to reproduce it?

TensorDuck commented 8 years ago

I have an example (D. E. Shaw's ubiquitin) that can reproduce this. However, the example takes 20 minutes to run. Some notes about how I got this error to be repeatable:

  1. Trajectories must be long: if you keep pre-striding the data externally, the error goes away if you stride too much. I have strided the data as much as I can before the error goes away in my example.
  2. It only happens when I have 2 or more input trajectories.
  3. It only happens if I use coor.source() and not when I use coor.load().
  4. I have to use all C-alpha distances. Using some subset of distances causes the error to go away.

I have the example files (100 MB), and I'll send them over to FU.
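
For reference, a hedged sketch of the kind of setup described above (topology and trajectory names are hypothetical):

    import pyemma.coordinates as coor

    feat = coor.featurizer('ubiquitin.pdb')          # hypothetical topology
    ca = feat.select_Ca()
    feat.add_distances(feat.pairs(ca))               # all C-alpha distances
    src = coor.source(['traj-0.xtc', 'traj-1.xtc'],  # two or more trajectories
                      features=feat)
    tic = coor.tica(src, lag=10)                     # reported to hit the assertion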

franknoe commented 8 years ago

/group/ag_cmb/scratch/cclementi/ubiquitin-test-symmetry.tar.gz

The text in the notebook debug_test.ipynb describes what the issue is about.

marscher commented 8 years ago

Unfortunately, I cannot reproduce the error with the notebook on a recent version (2.2.6+12.gd939b39.dirty). I use OpenBLAS for NumPy operations. I'll retry with MKL...

TensorDuck commented 8 years ago

My version is pyemma 2.2.6 (build np110py27_0, omnia channel) according to conda. I use the Intel MKL library: imkl/11.2.2.164 (I think).

If the issue is not with pyemma, is it possible it's with the libraries I use?

marscher commented 8 years ago

Still no luck in reproducing the error. I re-tried with the MKL backend, but I also used the recent numpy release 1.11. Can you please upload the output of conda list > env.txt?

marscher commented 8 years ago

Okay, with numpy=1.10 it fails.

marscher commented 8 years ago

So the culprit is the RunningMoments/Moments.combine method, which introduces a non-finite number into the running estimate of the covariance. I will try to get more information...

17-10-16 16:38:16 pyemma.coordinates.transform.tica.TICA[1] DEBUG    will use [122763 123520] total frames for pyemma.coordinates.transform.tica.TICA[1]
17-10-16 16:38:16 pyemma.coordinates.transform.tica.TICA[1] DEBUG    using 11 moments for 2464 chunks
17-10-16 17:28:11 pyemma.coordinates.transform.tica.TICA[1] ERROR    itraj=1, pos=108600
Traceback (most recent call last):
  File "/home/mi/marscher/workspace/pyemma/pyemma/coordinates/transform/tica.py", line 294, in _estimate
    self._covar.add(X, Y)
  File "/home/mi/marscher/workspace/pyemma/pyemma/coordinates/estimators/covar/running_moments.py", line 249, in add
    self.storage_XX.store(Moments(w, s_X, s_X, C_XX))
  File "/home/mi/marscher/workspace/pyemma/pyemma/coordinates/estimators/covar/running_moments.py", line 151, in store
    self.storage[-1].combine(M, mean_free=self.remove_mean)
  File "/home/mi/marscher/workspace/pyemma/pyemma/coordinates/estimators/covar/running_moments.py", line 58, in combine
    assert np.all(np.isfinite(self.Mxy))
AssertionError
marscher commented 8 years ago

> /home/mi/marscher/workspace/pyemma/pyemma/coordinates/estimators/covar/running_moments.py(58)combine()
-> assert np.all(np.isfinite(self.Mxy))
(Pdb) self.sx
array([ 185.16027921,  227.03151453,  278.4212929 , ...,  183.27903527,
                 nan,           nan])
(Pdb) self.sy
array([ 185.16027921,  227.03151453,  278.4212929 , ...,  183.27903527,
                 nan,           nan])
(Pdb) dir(self)
['Mxy', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'combine', 'copy', 'covar', 'mean_x', 'mean_y', 'sx', 'sy', 'w']
(Pdb) self.w
200.0
(Pdb) self.mean_x
array([ 0.9258014 ,  1.13515757,  1.39210646, ...,  0.91639518,
               nan,         nan])
(Pdb) self.mean_y
array([ 0.9258014 ,  1.13515757,  1.39210646, ...,  0.91639518,
               nan,         nan])
(Pdb) self.covar
array([[ 0.00805858,  0.01135199,  0.01151957, ...,  0.00157904,
                nan,         nan],
       [ 0.01135199,  0.02745306,  0.029485  , ...,  0.00418204,
                nan,         nan],
       [ 0.01151957,  0.029485  ,  0.04410689, ...,  0.00834095,
                nan,         nan],
       ..., 
       [ 0.00157904,  0.00418204,  0.00834095, ...,  0.01866587,
                nan,         nan],
       [        nan,         nan,         nan, ...,         nan,
                nan,         nan],
       [        nan,         nan,         nan, ...,         nan,
                nan,         nan]])
TensorDuck commented 8 years ago

So updating to NumPy 1.11 fixes this issue for me as well. I was previously on NumPy 1.10.

franknoe commented 8 years ago

how weird.

marscher commented 7 years ago

Should we encourage upgrading via a version constraint in the conda recipe? It makes me really nervous that users could get unpredictable results just because of an old numpy version.

franknoe commented 7 years ago

Are there any potential problems with requiring >=1.11? If I remember correctly, some omnia packages were tested against different numpy versions, but I don't remember why. Maybe this is an omnia question?

Am 18/10/16 um 12:13 schrieb Martin K. Scherer:

Should we encourage to upgrade via a version constraint in the conda-recipe? This makes me really nervous, that users will get unpredictable results just because of an old numpy version.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/891#issuecomment-254465483, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQm46FqmmgQ66rRwVqADaYXoCYbn0ks5q1Ju6gaJpZM4JfXY8.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

marscher commented 7 years ago

Yes, for omnia we build against numpy 1.10 and 1.11. But we can omit 1.10 for pyemma; that is no problem.

It turns out that the input data is already not finite:

/home/mi/marscher/workspace/pyemma/pyemma/coordinates/transform/tica.pyc in _estimate(self, iterable, **kw)
    292             for itraj, X, Y in it:
    293                 try:
--> 294                     assert np.all(np.isfinite(X))
    295                     assert np.all(np.isfinite(Y))
    296                     self._covar.add(X, Y)

The last two elements are NaN. So it seems to be either a problem with the trajectory or with the mdtraj.compute_distances function.
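
A hedged sketch for locating the offending frames directly, assuming mdtraj (file names and atom pairs are placeholders):

    import numpy as np
    import mdtraj as md

    traj = md.load('traj-1.xtc', top='ubiquitin.pdb')   # hypothetical files
    pairs = np.array([[0, 10], [0, 20]])                # placeholder atom pairs
    d = md.compute_distances(traj, pairs)
    bad = np.unique(np.where(~np.isfinite(d))[0])
    print('frames with non-finite distances:', bad)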

franknoe commented 7 years ago

Ah, but then this is not a numpy problem. So maybe we can isolate the frame that has the problem and nail it down?

marscher commented 7 years ago

You're right. I'm extracting this right now. However, it only occurs if we use numpy=1.10.

marscher commented 7 years ago

It is not reproducible without also loading the first trajectory. None of these workflows produces the NaNs:

  1. loading the whole trajectory into memory and computing the distances;
  2. iterating with pyemma and computing distances using a single iterator;
  3. iterating with pyemma and computing distances with a lagged iterator (which TICA uses);
  4. loading in chunks with mdtraj.XTCTrajectoryFile.read_as_traj, mimicking the chunksize used in pyemma;
  5. seeking to a frame index prior to the suspected frame, loading a range with read_as_traj, and calling compute_distances.

At the moment I have no further idea how to pin down the culprit...
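
For concreteness, a sketch of workflow 3 (lagged iteration, as TICA uses it), assuming the iterator yields (itraj, X, Y) triples as in the snippet quoted below; the feature setup and file names are placeholders:

    import numpy as np
    import pyemma.coordinates as coor

    feat = coor.featurizer('ubiquitin.pdb')          # hypothetical topology
    feat.add_distances(feat.pairs(feat.select_Ca()))
    src = coor.source(['traj-0.xtc', 'traj-1.xtc'], features=feat)
    for itraj, X, Y in src.iterator(lag=10, chunk=1000):
        assert np.all(np.isfinite(X)) and np.all(np.isfinite(Y)), \
            'non-finite chunk in trajectory %d' % itraj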

franknoe commented 7 years ago

Am I understanding this correctly: if you just do this in isolation (your point 3)

    292             for itraj, X, Y in it:
    293                 try:
    294                     assert np.all(np.isfinite(X))
    295                     assert np.all(np.isfinite(Y))
    296                     self._covar.add(X, Y)

and construct the iterator in the same way that TICA does, you cannot reproduce the problem, while you do run into it when doing TICA, which uses the same code? So what's different between these two use cases?

If the trajectory doesn't have any NaNs itself, then the NaNs must be produced by our readers. I think we'll need to find the problem then.

jchodera commented 7 years ago

> are there any potential problems by requiring >=1.11 ?

I'd be a bit careful here. If you require numpy >=1.11, you are effectively forcing everyone's conda environments to numpy 1.11. Since some other packages may use pyemma under the hood, this may have surprising effects.

marscher commented 7 years ago

One has to use covartools.running_moments in the loop to trigger the error. We've also checked that the input to the DistanceFeature is finite, but the output of mdtraj contains NaNs. Unfortunately, the single dumped Trajectory object does not reproduce the error when loaded again and passed to compute_distances. So it is a very, very weird side effect in conjunction with compute_distances and running_moments. We've also disabled the release of the GIL in mdtraj's _geometry.pyx, but with no luck.