markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

tica memory error #1364

Closed: euhruska closed this issue 6 years ago

euhruska commented 6 years ago

I'm getting a memory error with tica. I have 2401 files (700 MB total); when I use only 1000 of them, tica works, but when I use all of them it fails as shown below. Is there a way to improve this? get_out_arr is a list of numpy arrays. PyEMMA 2.5.4+4.g73f3c06d, Python 3.5.5.

```
Traceback (most recent call last):
  File "run-tica-msm5.py", line 141, in run
    tica_obj = pyemma.coordinates.tica(get_out_arr, lag=tica_lag, dim=tica_dim, kinetic_map=True, stride=tica_stride, weights='koopman')
  File "x/vpy8/lib/python3.5/site-packages/pyemma/coordinates/api.py", line 1233, in tica
    koop.estimate(data, chunksize=cs)
  File "x/vpy8/lib/python3.5/site-packages/pyemma/coordinates/data/_base/streaming_estimator.py", line 39, in estimate
    X = DataInMemory(array_list, chunksize=chunksize)
  File "x/vpy8/lib/python3.5/site-packages/pyemma/coordinates/data/data_in_memory.py", line 79, in __init__
    self._set_dimensions_and_lenghts()
  File "x/vpy8/lib/python3.5/site-packages/pyemma/coordinates/data/data_in_memory.py", line 125, in _set_dimensions_and_lenghts
    "Dimensions are = %s" % ndims)
```
marscher commented 6 years ago

Maybe related to #1365?

euhruska commented 6 years ago

No, this issue happens even without the koopman option, while #1365 happens only with the koopman option.

In #1365 the tica_obj can still be solved, but not in this issue.

marscher commented 6 years ago

I think I know what is going on there. Stand by.

marscher commented 6 years ago

What is the shape of these arrays?

marscher commented 6 years ago

The stack trace shows the MemoryError being raised while the exception message is built, which seems highly unlikely. Please post as much information as possible about the input to tica, so we can investigate this.

Note that the exception about to be raised there concerns non-matching dimensions of your input data.
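
To help pin down the offending trajectory, here is a small diagnostic sketch (not part of the PyEMMA API; `get_out_arr` refers to the list from the original report, and the function name is hypothetical) that compares the feature dimension of every array in the list:

```python
import numpy as np

def find_dimension_mismatches(trajs):
    """Return the reference feature dimension (taken from the first array)
    and the indices/shapes of arrays whose second axis does not match it."""
    ref_dim = np.atleast_2d(trajs[0]).shape[1]
    offenders = [(i, np.atleast_2d(x).shape)
                 for i, x in enumerate(trajs)
                 if np.atleast_2d(x).shape[1] != ref_dim]
    return ref_dim, offenders

# Usage on the list that is passed to pyemma.coordinates.tica:
# ref_dim, offenders = find_dimension_mismatches(get_out_arr)
# print('expected feature dimension:', ref_dim)
# print('arrays with a different dimension:', offenders)
```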

euhruska commented 6 years ago

My bad, there was a single numpy array with the wrong dimension.

marscher commented 6 years ago

Thanks for the feedback. In any case, there shouldn't be a MemoryError in these cases. Was there high memory pressure on the system while executing this?

euhruska commented 6 years ago

No, it had the whole memory of a node; less than 1 GB shouldn't cause memory pressure.