Make msm functions robust to non-ndarray inputs

franknoe commented 9 years ago

For example, msm.estimation.cmatrix cannot deal with the following (reasonable) input:

dtraj = [1,1,1,1,1,0,0,0,0,0,1,1,1,1,1] C = msmest.cmatrix(dtraj, 2).toarray()

because it assumes that the input must be of type ndarray:

/Users/noe/anaconda/lib/python2.7/site-packages/pyEMMA-1.1.1-py2.7-macosx-10.5-x86_64.egg/pyemma/msm/estimation/sparse/count_matrix.pyc in count_matrix_mult(dtrajs, lag, sliding, sparse, nstates) 39 nmax=0 40 for dtraj in dtrajs: ---> 41 nmax=max(nmax, dtraj.max()) 42 43 """Default is nstates = number of observed states at lagtime=1"""

AttributeError: 'int' object has no attribute 'max'

It would be better to be tolerant and accept array-like inputs. I suggest the solution to copy the input first inside the function:

dtraj = np.array(dtraj, dtype = int)

this generally has two beneficial effects:

We can continue processing the information without the danger of someone modifying it outside (not relevant for this function but for persistent objects like class instances)
We can work with ndarrays and lists.

Remember that people that are not very familiar with python will not be able to interpret the error above, thus we should be very tolerant rather than restrictive to input parameters - as long as they are reasonable.

marscher commented 9 years ago

unfortunately it is not that easy to simply convert everything to a 2d int array, since dtrajs may be of different lengths. The provided fix first looks for nested lists and converts each sublist to an int array. Otherwise if its a single list, its being converted to a single array, wrapped in a list.

Added some tests, which should cover most cases

d6b9152

franknoe commented 9 years ago

Not sure I understand, let's discuss.

Am 25/02/15 um 10:51 schrieb Martin K. Scherer:

unfortunately it is not that easy to simply convert everything to a 2d int array, since dtrajs may be of different lengths. The provided fix first looks for nested lists and converts each sublist to an int array. Otherwise if its a single list, its being converted to a single array, wrapped in a list.

Added some tests, which should cover most cases

d6b9152 https://github.com/markovmodel/PyEMMA/commit/d6b9152ba1862802afb1325462e0540d7104f6e3

— Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/86#issuecomment-75933076.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

marscher commented 9 years ago

ef9c3f6 updated conversion and moved it to estimation api to make it available for all callers of cmatrix. If the code is unclear I'm happy to answer questions.

markovmodel / PyEMMA

Make msm functions robust to non-ndarray inputs #86

Mail: Arnimallee 6, 14195 Berlin, Germany