markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

how adjusting simulation timestep affects lag times #1580

Closed cgseitz closed 1 year ago

cgseitz commented 1 year ago

Hello,

I am trying to compare two datasets. Dataset A is a set of simulations run with a 2 fs timestep; dataset B was run with a 6 fs timestep. I have mapped them to the same TICA space, but how should I adjust the TICA lag time and the MSM lag time? Do the lag times for dataset A need to be 3x those for dataset B to make the two directly comparable? Do I only change the TICA lag time and keep the MSM lag time the same across datasets? Or do I change neither? Thanks for your help!

Best, Christian

clonker commented 1 year ago

IMO you should stride out the 2fs dataset (with a stride of 3). Changing the tica lagtime is almost surely going to give you wrong results.

EDIT: Perhaps you can adjust the lag time in TICA instead (that is, if the 6 fs dataset uses a lag time of 1, the 2 fs dataset should use a lag time of 3, i.e., the corresponding multiple). The same goes for the MSM lag time. But this all depends on how you set up your data pipeline. Perhaps the easiest way of dealing with this is:

  1. Choose a stride of 3 for the 2fs data source, choose a stride of 1 for the 6fs data source
  2. Estimate a joint TICA projection (or: estimate it on either of the two datasets and then compare against the other, depends on how and what exactly you want to compare)
  3. Project both with the same TICA instance and same clustering into dtrajs
  4. Build MSMs based on strided 2fs data source and non-strided 6fs data source
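
The four steps above could be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: it assumes each dataset is already loaded as a list of `(n_frames, n_features)` NumPy arrays, that the saving interval of the 6 fs data is an integer multiple of that of the 2 fs data, and that k-means is an acceptable clustering choice. The helper names `stride_for` and `build_comparable_msms` and all default values (lag times in frames, number of clusters) are hypothetical and need to be chosen for your system.

```python
def stride_for(save_interval, target_interval):
    """Return the integer stride that maps one saving interval onto another.

    Both arguments must be given in the same physical unit.
    """
    stride, remainder = divmod(target_interval, save_interval)
    if remainder != 0 or stride < 1:
        raise ValueError(
            "target interval must be an integer multiple of the saving interval"
        )
    return int(stride)


def build_comparable_msms(trajs_a, trajs_b, stride_a=3, stride_b=1,
                          tica_lag=1, msm_lag=1, n_clusters=100):
    """Sketch of steps 1-4; all default values are placeholders."""
    # Imported lazily so that stride_for can be used without PyEMMA installed.
    import pyemma

    # Step 1: stride so that both datasets share the same frame interval.
    a = [traj[::stride_a] for traj in trajs_a]
    b = [traj[::stride_b] for traj in trajs_b]

    # Step 2: estimate one joint TICA projection on the strided data.
    tica = pyemma.coordinates.tica(a + b, lag=tica_lag)

    # Step 3: project both datasets with the same TICA instance and
    # discretize them with the same clustering.
    proj_a = [tica.transform(traj) for traj in a]
    proj_b = [tica.transform(traj) for traj in b]
    clustering = pyemma.coordinates.cluster_kmeans(proj_a + proj_b, k=n_clusters)
    dtrajs_a = [clustering.assign(y) for y in proj_a]
    dtrajs_b = [clustering.assign(y) for y in proj_b]

    # Step 4: one MSM per dataset, with the same lag time in frames.
    msm_a = pyemma.msm.estimate_markov_model(dtrajs_a, lag=msm_lag)
    msm_b = pyemma.msm.estimate_markov_model(dtrajs_b, lag=msm_lag)
    return msm_a, msm_b
```

Because both datasets have the same frame interval after striding, the same lag time in frames corresponds to the same physical lag time in both MSMs.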

thempel commented 1 year ago

Are you speaking about the integrator time step or the interval that the data was saved at?

cgseitz commented 1 year ago

I am speaking about the integrator time step

thempel commented 1 year ago

I can't think of an immediate reason to increase the lag time of the transition operator as a function of the integrator time step. These are two different things (assuming that you're saving the data at the same frame rate).

cgseitz commented 1 year ago

Okay. My original question was just about the integrator timestep, but I can also mention that the simulations were saved at different frame rates.

Dataset A: integrator timestep of 2 fs, frame saved every 20 ps

Dataset B: integrator timestep of 6 fs, frame saved every 60 ps

Does this change anything? Should I still just stride dataset A as @clonker mentioned?

thempel commented 1 year ago

You want to make sure that whatever you compare between these datasets has the same lag time, e.g., a transition matrix or anything derived from it. The lag time is given in physical units, not steps, so in your case you either have to convert it to steps for each dataset or stride the input data to make that conversion unnecessary, as @clonker suggests.
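
If you go the conversion route, the arithmetic could look like the sketch below. It uses the saving intervals stated above (20 ps and 60 ps); the physical lag time of 600 ps and the helper name `lag_in_frames` are hypothetical, chosen only for illustration.

```python
def lag_in_frames(lag_ps, frame_interval_ps):
    """Convert a physical lag time (ps) into a lag in saved frames."""
    frames, remainder = divmod(lag_ps, frame_interval_ps)
    if remainder != 0 or frames < 1:
        raise ValueError(
            "lag time must be a positive integer multiple of the frame interval"
        )
    return int(frames)


# Saving intervals from the thread; the lag time itself is an assumed example.
FRAME_INTERVAL_PS = {"A": 20, "B": 60}
physical_lag_ps = 600

lags = {
    name: lag_in_frames(physical_lag_ps, interval)
    for name, interval in FRAME_INTERVAL_PS.items()
}
print(lags)  # {'A': 30, 'B': 10}
```

The two lags differ by the same factor of 3 as the saving intervals, so passing them as the `lag` argument for the respective dataset keeps the physical lag time identical.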

cgseitz commented 1 year ago

got it, thanks to both of you for clearing up my confusion!