markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0
Missing dimensions when performing get_output() method #1595

Open meh47336 opened 1 year ago

meh47336 commented 1 year ago

I'm using a Jupyter notebook. The system is CentOS Linux 7 and the PyEMMA version is 2.5.12 (conda list is attached).

I'm following the tutorial listed here (http://www.emma-project.org/latest/tutorials/notebooks/00-pentapeptide-showcase.html) using my own trajectory (50,000 frames).

I'm performing different featurizations to compare results, so I'll just mention one: contact features giving 666 dimensions. And a quick note: the following problem occurs whether I source() or load() the data.

```
contact_data = coor.load(xtc_file, contact_feat)
contact_tica = coor.tica(contact_data, lag = 1)

print(len(contact_data))
print(len(contact_data[0]))

print(contact_tica.describe())
```

Gives the output:

contact_data length = 50,000

contact_data[0] length = 666

contact_tica: (TICA, lag = 1, max. output. dim = 12)

I'm unsure why the dimensions were reduced from 666 to 12 here. However, I can still work with 12.

Next, I want to use the TICA output for the VAMP-2 scoring in the tutorial. So, I use the .get_output() method, which should extract all features by default (though I've also tried messing with the Slice).

```
contact_out = contact_tica.get_output()

print(len(contact_out))
print(len(contact_out[0]))
```

This gives the output:

contact_out length = 1

contact_out[0] length = 50,000

Why am I only getting 1 dimension here? This is causing problems with the VAMP-2 scoring in the tutorial.

Thank you so much! condalist.txt

thempel commented 1 year ago

TICA reduces dimensions according to a variance cutoff, which defaults to 95%. That means 95% of the kinetic variance is kept in the transformed data, which should explain the reduction from 666 to 12 dimensions. Compare this page.
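To make the cutoff idea concrete, here is a minimal pure-Python sketch of how a variance cutoff can translate into a number of retained dimensions. It is not PyEMMA's own code: the helper `dims_for_variance_cutoff` and the toy eigenvalues are made up for illustration, and it assumes the variance share of each component is proportional to its squared eigenvalue.

```python
from itertools import accumulate


def dims_for_variance_cutoff(eigenvalues, var_cutoff=0.95):
    """Smallest number of leading components whose cumulative share of
    squared eigenvalues reaches var_cutoff (hypothetical helper)."""
    # Sort by magnitude, largest first, and square each eigenvalue.
    squared = [ev ** 2 for ev in sorted(eigenvalues, key=abs, reverse=True)]
    total = sum(squared)
    for n_dims, running in enumerate(accumulate(squared), start=1):
        if running / total >= var_cutoff:
            return n_dims
    # Fall back to all components (guards against rounding at cutoff 1.0).
    return len(squared)


# Toy spectrum (invented numbers, not from the trajectory in this issue).
toy_eigenvalues = [0.9, 0.8, 0.5, 0.1, 0.05, 0.01]
n_kept = dims_for_variance_cutoff(toy_eigenvalues)
print(n_kept)
```

If you would rather fix the number of dimensions yourself, `coor.tica` should accept a `dim` argument (and a `var_cutoff` argument to change the threshold); please check the signature in the documentation for your installed version.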

To your second question: Could it be that tica.get_output() returns a list of arrays? Since you only have one trajectory, you'd need to take the zero-th element of that list, like contact_out = contact_tica.get_output()[0].
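A small mock-up of that layout may help. It uses nested Python lists as a stand-in for the list of NumPy arrays, so `mock_output` is an assumption about the shape of the result and not actual PyEMMA output; the sizes are taken from the numbers reported above.

```python
N_FRAMES, N_DIMS = 50_000, 12

# Stand-in for get_output(): one entry per trajectory, each entry holding
# one row per frame with N_DIMS values (assumed layout, not real output).
mock_output = [[[0.0] * N_DIMS for _ in range(N_FRAMES)]]

n_trajectories = len(mock_output)   # outer list: number of trajectories
first_traj = mock_output[0]         # data of the single trajectory
n_frames = len(first_traj)          # rows: frames
n_dims = len(first_traj[0])         # columns: retained TICA dimensions

print(n_trajectories, n_frames, n_dims)
```

Read this way, `len(contact_out) == 1` counts trajectories, not dimensions, and `len(contact_out[0]) == 50,000` counts frames. With a real NumPy array, `contact_out[0].shape` would show both the frame count and the dimension count at once.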