markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0
311 stars 119 forks source link

issue plotting metastable states #1430

Closed wemd closed 5 years ago

wemd commented 5 years ago

Hello

I need to plot metastable states. I have used the following code:

metastable_traj = msm.metastable_assignments[dtrajsconcatenated] fig, ax = plt.subplots(figsize=(10, 4)) , _, misc = pyemma.plots.plot_state_map( *tica_concatenated[:, :2].T, metastable_traj, ax=ax) ax.set_xlabel('IC 1') ax.set_ylabel('IC 2') misc['cbar'].setticklabels([r'$\mathcal{S}%d$' % (i + 1) for i in range(nstates)]) fig.tight_layout()

However, I'm getting the following error:

IndexError Traceback (most recent call last)

in ----> 1 metastable_traj = msm.metastable_assignments[dtrajs_concatenated] IndexError: index 94 is out of bounds for axis 0 with size 44

I'll appreciate any help! Cheers W

clonker commented 5 years ago

This is because your discrete trajectory contains at least 95 states but the MSM you built contains just 44. Based on your information I can only guess: Either the discrete trajectory does not match with your msm and you built your msm based on different data or your data is (partially) disconnected and the active set is smaller than the total number of states discovered by clustering. Can you check the property msm.active_set? This would give you all the discrete states contained in your model.

wemd commented 5 years ago

Many thanks. Actually, this is what I get:

len(msm.active_set) 29

I also tried:

msm = pyemma.msm.bayesian_markov_model(cluster.dtrajs, lag=21, dt_traj='0.1 ns') print('fraction of states used = {:.2f}'.format(msm.active_state_fraction)) print('fraction of counts used = {:.2f}'.format(msm.active_count_fraction))

fraction of states used = 0.39 fraction of counts used = 0.62

How could I consider all the states? Regards W

clonker commented 5 years ago

If you want to consider all the active set states you can stride your dtraj with the active set. If you want to enlarge your active set you could try something like a different / coarser clustering, smaller stride, more data. The active set by default is just the largest connected set in your discrete trajectory.

wemd commented 5 years ago

Hello Thsnk for your reply. Shouldn't 'cluster.dtrajs' contain all the active set states? I also tried:

msm = pyemma.msm.bayesian_markov_model(dtrajs_concatenated, lag=21, dt_traj='0.24 ns')

where dtrajs_concatenated = np.concatenate(cluster.dtrajs)

However, the fraction of states used is still 39. I'll appreciate any advice about how to consider all the active set states. Cheers W

wemd commented 5 years ago

Sorry, I meant 0.39

wemd commented 5 years ago

Could you please help me understand what do you mean by "stride your dtraj with the active set"? Thanks

wemd commented 5 years ago

Ok, I got the active set by using msm.dtrajs_active and the plot worked! How could I use the total number of states? Many thanks

thempel commented 5 years ago

MSMs are mathematically only defined on a connected set of states. This means that transition probabilities from / into a state S can only be estimated if at least one transition out of state S into any other state and a transition into state S has been observed. You likely have not sampled all possible states in this way. The way pyemma deals with this is to estimate the MSM on the largest connected set of states. It cannot estimate an MSM on the full set of states if the full set of states is not connected.

There are a couple of things that you can do to solve this. Ultimately, it's a sampling problem, i.e. you need to sample the missing transitions. However, sometimes it is only an effect of an "unfortunate" projection or discretization and the missing transitions are not needed. E.g. you can try to tweak the clustering, maybe take less cluster centers, and have a look at the projection (e.g. TICA). If you observe that for example in the first TICs you only see all trajectories drifting into the same direction, that might already be the problem. You might want to make sure that the processes that you want to model are reversibly sampled.

marscher commented 5 years ago

please feel free to reopen, if you got further questions.