Closed niklastoe closed 6 years ago
Hi, you are right, this is an issue. And averaging over it is definitely not the right solution. Instead you should just not include the datapoints crossing the trajectories into your estimation. We will adapt our function for this case as soon as possible.
The negative value issue: The Koopman operator can have negative values. You cannot interpret it as a probability transition matrix as for a real Markov State Model. The row sum is still 1 though. But usually this is not really an issue. You can still interpret the output of your network as a fuzzy clustering and use the eigenvalues of the Koopman matrix for calculating the implied timescales. For non-reversible and non-stationary systems the last point breaks down though, because the eigenvalues will become complex. But the Koopman model still holds and is able to describe the dynamical system (via the singular values/functions).
If you want to have a valid transition matrix, have a look at our paper: Deep Generative Markov State Models This solves this issue and also shows you how you can construct a generative model which approximates the conditional probability distribution in configuration space. By this you can simulate a new trajectory with unseen configurations based on the conditional distribution your model learned.
Hope this helps! Otherwise just contact us!
Best Andreas
Thank you very much for the quick reply and the explanation! I'll read the paper.
Best, Niklas
Hey, we now added the multiple trajectories option for the Koopman matrix estimation. It needs them as a list though. See our Alanine-Dipeptide example with multiple files.
If you need clarification, just ask!
Best Andreas
Hi,
Great, thank you!
I think it should be
for l, length_i in enumerate(traj_lengths - tau):
instead of
for l, length_i in enumerate(traj_lengths):
to avoid mixing trajectories since pred_ord
is of length sum(traj_lengths) - len(traj_lengths) * tau
. In the example, traj_list[0].shape[0] == 250000
but traj_list[-1].shape[0] == 249976
.
Best, Niklas
Hey, yes you are absolutely right! Thank you! I changed the code a bit different though: With your solution you would for each trajectory loose tau datapoints for your estimation (And in your example from above this could be many). In the example we reorganize the prediction of the network to have a state classification for each data point (3 * 250 000). Then you don't have to changed the code you were pointing at and you don't loose that many data points.
Best,
Andreas
Right, that's much better!
Hi,
in the Alanine dipeptide example with multiple files ,
pred_ord
is calculated fortotal_data_source
, i.e. for all trajectories.pred_ord
is passed on tovamp.get_its
andvamp.get_ck_test
, both of which callvamp.estimate_koopman_op
.pred_ord
does not contain any information about the trajectories sovamp.estimate_koopman_op
will consider pairs across trajectories when calculatingc_tau
e.g.traj1[-1]
withtraj2[tau-1]
. These pairs are random since there's no connection between the trajectories.In the example it's not a big issue since tau is much smaller than the length of each individual trajectory. Unless the random pairs correspond to transitions that are extremely rare, the Koopman operator won't differ much.
If that's the case though, what should one do? Estimate the Koopman operator for each individual trajectory and average all of them? I did that for the example, the transition probabilities involving the sparsely populated state 3 change a little bit.
Out of curiosity, is there any explanation for the negative values in the matrix or is that a numerical issue?
In other scenarios where tau might be in the range of 10^1 ns and trajectory lengths are in the range of 10^2 ns there would be a considerable number of random pairs and a large deviation in the Koopman operator.
Please let me know what you think. I assume I'll be working with trajectories that are 5-10 tau long so I'd like a convenience function that takes care of this. If you agree with the overall thought and think that averaging over all the Koopman operators from different trajectories is fine, I'll go ahead and write it.
Niklas