markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

Got ValueError associated with dimensionality when using a tica object #1585

Closed: ryankzhu closed this issue 1 year ago

ryankzhu commented 1 year ago

Hi,

I am trying to project some featurized trajectories onto a set of TICA coordinates as follows:

import pyemma as pm

tica = pm.coordinates.tica(ftrajs, lag=32, dim=19, kinetic_map=True)
tica.get_output()

where ftrajs is a list of ndarrays of shape (2000, 28). However, I got the following error:

ValueError                                Traceback (most recent call last)
Input In [56], in <cell line: 1>()
----> 1 tica.get_output()

File ~/miniconda3/envs/msm_opt/lib/python3.10/site-packages/pyemma/coordinates/data/_base/transformer.py:226, in StreamingEstimationTransformer.get_output(self, dimensions, stride, skip, chunk)
    223 if not self._estimated:
    224     self.estimate(self.data_producer, stride=stride)
--> 226 return super(StreamingTransformer, self).get_output(dimensions, stride, skip, chunk)

File ~/miniconda3/envs/msm_opt/lib/python3.10/site-packages/pyemma/coordinates/data/_base/datasource.py:409, in DataSource.get_output(self, dimensions, stride, skip, chunk)
    407             i = slice(it.pos, it.pos + len(chunk))
    408             assert i.stop - i.start > 0
--> 409             trajs[itraj][i, :] = chunk[:, dimensions]
    410             pg.update(1)
    412 if config.coordinates_check_output:

ValueError: could not broadcast input array from shape (2000,15) into shape (2000,19)

It seems that the transformed chunk has shape (2000, 15), whereas the output array expects (2000, 19). I don't understand why this happens. Could someone help?
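The error itself is a plain NumPy broadcasting failure. The traceback shows get_output copying each transformed chunk into an output array with room for 19 columns (trajs[itraj][i, :] = chunk[:, dimensions]). A minimal sketch of that single step, assuming (hypothetically) that the transformer emitted only 15 columns:

```python
import numpy as np

n_frames, requested_dim, produced_dim = 2000, 19, 15

# Output buffer sized for the requested number of TICA components.
out = np.empty((n_frames, requested_dim))

# Hypothetical chunk as returned by the transformer: only 15 columns.
chunk = np.zeros((n_frames, produced_dim))

message = ""
try:
    # Same kind of assignment as in the traceback: shapes must match exactly.
    out[0:n_frames, :] = chunk
except ValueError as err:
    message = str(err)
    print(message)
```

So the question is not about the input features but about why the transform produced fewer columns than the requested dim.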

thempel commented 1 year ago

Hi, can you check whether the dimensions of all elements of ftrajs are indeed identical? E.g., does set([arr.shape[1] for arr in ftrajs]) contain only 28?
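Extending that one-liner slightly, a small helper that also reports which trajectories deviate. The name feature_counts is hypothetical and not part of PyEMMA; the data below is synthetic:

```python
import numpy as np

def feature_counts(ftrajs):
    """Map each distinct number of features (columns) to the indices of trajectories having it."""
    counts = {}
    for i, arr in enumerate(ftrajs):
        if arr.ndim != 2:
            raise ValueError(f"trajectory {i} is not 2-D (shape {arr.shape})")
        counts.setdefault(arr.shape[1], []).append(i)
    return counts

# Synthetic example: the third trajectory has one feature too few.
rng = np.random.default_rng(0)
ftrajs = [rng.normal(size=(2000, 28)), rng.normal(size=(1500, 28)), rng.normal(size=(2000, 27))]
counts = feature_counts(ftrajs)
print(counts)  # {28: [0, 1], 27: [2]}
```

A single key in the result means every trajectory has the same number of features; trajectory lengths (the first axis) are allowed to differ.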

ryankzhu commented 1 year ago

Hi Tim, thanks for getting back to me. Yes: set([arr.shape[1] for arr in ftrajs]) returns {28}. The trajectory lengths (the first dimensions of the elements of ftrajs) do differ, but I don't think that matters.

thempel commented 1 year ago

Thanks for checking. I currently can't reproduce the issue, and I'm fairly sure it's not the variance cutoff that is interfering here. As the PyEMMA version of TICA will be deprecated in the future, I'd suggest trying deeptime instead, e.g.,

from deeptime.decomposition import TICA

tica = TICA(lagtime=32, dim=19, scaling='kinetic_map').fit_fetch(ftrajs)
out = [tica.transform(traj) for traj in ftrajs]

Please let us know if it works. If the issue persists, a minimal example that we can run (maybe a specific trajectory triggers it?) would help us find the problem.
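For intuition on how the number of output components can fall short of the requested dim, here is a minimal NumPy sketch of TICA. It is not PyEMMA's or deeptime's actual code. It estimates symmetrized instantaneous and time-lagged covariances (C0, Ctau) pooled over trajectories of different lengths, whitens with the eigenvectors of C0 while discarding eigenvalues at or below a cutoff epsilon (PyEMMA and deeptime expose a similar epsilon parameter; 1e-6 here is an assumption), and then diagonalizes the whitened Ctau. Zero-variance directions in C0, such as those spanned by constant features, are dropped, so fewer components than features can come out. The synthetic data (15 varying plus 13 constant columns) is made up for illustration.

```python
import numpy as np

def tica_sketch(trajs, lag, epsilon=1e-6, kinetic_map=True):
    """Minimal symmetrized TICA; returns (eigenvalues, projection matrix, mean)."""
    if lag < 1 or any(len(t) <= lag for t in trajs):
        raise ValueError("need lag >= 1 and every trajectory longer than lag")
    # Time-lagged pairs are formed within each trajectory, so lengths may differ.
    X = np.concatenate([t[:-lag] for t in trajs])
    Y = np.concatenate([t[lag:] for t in trajs])
    mean = np.concatenate([X, Y]).mean(axis=0)
    X, Y = X - mean, Y - mean
    n = len(X)

    # Symmetrized instantaneous (C0) and time-lagged (Ctau) covariances.
    C0 = (X.T @ X + Y.T @ Y) / (2 * n)
    Ctau = (X.T @ Y + Y.T @ X) / (2 * n)

    # Whitening: keep only eigen-directions of C0 with eigenvalue above epsilon.
    # Constant features span zero-variance directions and are discarded here.
    s, V = np.linalg.eigh(C0)
    keep = s > epsilon
    W = V[:, keep] / np.sqrt(s[keep])

    # Diagonalize the whitened time-lagged covariance, largest eigenvalue first.
    lam, U = np.linalg.eigh(W.T @ Ctau @ W)
    order = np.argsort(lam)[::-1]
    lam, proj = lam[order], W @ U[:, order]
    if kinetic_map:
        # Kinetic-map scaling: weight each component by its eigenvalue.
        proj = proj * lam
    return lam, proj, mean


rng = np.random.default_rng(1)

def make_traj(n_frames):
    """Synthetic features: 15 fluctuating columns plus 13 constant ones (28 total)."""
    varying = rng.normal(size=(n_frames, 15))
    constant = np.full((n_frames, 13), 0.5)
    return np.hstack([varying, constant])

trajs = [make_traj(2000), make_traj(1500)]
lam, proj, mean = tica_sketch(trajs, lag=32)
# Asking for 19 components only yields as many as survived the cutoff.
projected = [(t - mean) @ proj[:, :19] for t in trajs]
print(proj.shape, projected[0].shape)  # (28, 15) (2000, 15)
```

In this sketch, slicing with [:, :19] silently returns 15 columns; a caller that preallocated 19 columns would hit the same broadcasting error as in the traceback.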

ryankzhu commented 1 year ago

I think I've figured out why this happens. You are right: my feature trajectories are problematic.

My ftrajs are logistic-transformed distances, but I used slightly inappropriate values for the centre and steepness of the transformation, and the protein I'm looking at happens to be small. As a result, some dimensions of ftrajs are constant, so it makes sense that TICA finds fewer dimensions than I expected.

Many thanks :)
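To illustrate that diagnosis, here is a sketch assuming a logistic switching function of the form 1 / (1 + exp(steepness * (d - centre))). The thread does not give the exact form or the parameter values used, so the function, the distances, and the values below are all hypothetical. With a steep switch and distances that stay well below the centre, every frame saturates to exactly 1.0 in float64. A per-feature variance check (the tolerance is an arbitrary assumption) flags those columns, so you can either re-tune the transform or drop them before running TICA.

```python
import numpy as np

def logistic(d, centre, steepness):
    """Hypothetical switching function: ~1 for d << centre, ~0 for d >> centre."""
    return 1.0 / (1.0 + np.exp(steepness * (d - centre)))

def constant_columns(ftrajs, tol=1e-8):
    """Indices of features whose variance over all trajectories is at most tol."""
    stacked = np.concatenate(ftrajs)
    return np.flatnonzero(stacked.var(axis=0) <= tol)

rng = np.random.default_rng(2)
# Made-up distances (nm): two pairs stay short, one pair fluctuates around the centre.
dists = [np.column_stack([rng.uniform(0.3, 0.8, n),
                          rng.uniform(0.4, 0.9, n),
                          rng.uniform(1.8, 2.2, n)]) for n in (2000, 1500)]

# Steep switch centred far above the short distances: those columns saturate to 1.0.
ftrajs = [logistic(d, centre=2.0, steepness=50.0) for d in dists]

const = constant_columns(ftrajs)
print(const)  # [0 1]

# Keep only the features that actually vary.
keep = np.setdiff1d(np.arange(ftrajs[0].shape[1]), const)
ftrajs_clean = [f[:, keep] for f in ftrajs]
```

This check only catches features that are individually (near-)constant. Linear combinations of features with vanishing variance are still removed by TICA's own eigenvalue cutoff.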