markovmodel / ivampnets


How to sample trajectories from a specific state of the global model instead of the local model #7

Closed Toolxx closed 8 months ago

Toolxx commented 11 months ago

Hi,

Based on the advice Dr. Mardt gave earlier, I obtained what looks like a usable iVAMPnet model in some runs. I'd like to take a closer look at it now. As the title says, I want to extract information about the dynamical states, and for that I need to generate the corresponding trajectories from specific states.

Let me describe my workflow specifically (the code here is based on pyemma; a rough sketch of steps 2-6 follows the list):

  1. Run iVAMPnet and obtain a model that seems to converge well to the true values of the timescales. Get the projected trajectories from the model.
  2. Discretize the projections to obtain discrete trajectories, specifically dtrajs = np.argmax(Y[i], axis=1).
  3. Use its_1 = msm.its(dtrajs_1) to check the implied timescales of the MSMs of the different subsystems (3 in total here). Then choose the lag time and estimate an MSM for each subsystem with M_1 = msm.bayesian_markov_model(dtrajs_1, msm_lag) (of course checking the CK test, active_state_fraction, and active_count_fraction afterwards).
  4. I built 3 MSMs corresponding to the 3 subsystems and took the transition matrix from each: M1_matrix = M_1.transition_matrix, M2_matrix = M_2.transition_matrix, M3_matrix = M_3.transition_matrix. Then I used np.kron to combine them into a global matrix.
  5. I estimated the global MSM with M_global_matrix = np.kron(M1_matrix, np.kron(M2_matrix, M3_matrix)), M = pyemma.msm.markov_model(M_global_matrix). Since each subsystem has 9 states, there are 9*9*9 = 729 states in the global transition matrix.
  6. I used PCCA to generate the coarse-grained model: M.pcca(nstates)
  7. Do MSM analyses such as relaxation timescales, stationary probabilities/free energies, transition pathways/rates, ...
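Here is the rough sketch of steps 2-6 mentioned above. It is not runnable as-is: Y (the list of per-subsystem iVAMPnet soft assignments for a single trajectory), the lag times, and nstates are placeholders from my own setup.

```python
import numpy as np
import pyemma

# Y[k]: iVAMPnet soft state assignment of subsystem k, shape (n_frames, 9).
dtrajs_1 = np.argmax(Y[0], axis=1)   # step 2: hard-discretize each subsystem
dtrajs_2 = np.argmax(Y[1], axis=1)
dtrajs_3 = np.argmax(Y[2], axis=1)

# step 3: implied timescales and Bayesian MSM per subsystem
its_1 = pyemma.msm.its(dtrajs_1, lags=50, errors='bayes')
msm_lag = 10                         # chosen after inspecting the ITS plots
M_1 = pyemma.msm.bayesian_markov_model(dtrajs_1, msm_lag)
M_2 = pyemma.msm.bayesian_markov_model(dtrajs_2, msm_lag)
M_3 = pyemma.msm.bayesian_markov_model(dtrajs_3, msm_lag)

# steps 4-5: Kronecker product of the three transition matrices -> global MSM
M_global_matrix = np.kron(M_1.transition_matrix,
                          np.kron(M_2.transition_matrix, M_3.transition_matrix))
M = pyemma.msm.markov_model(M_global_matrix)   # 9*9*9 = 729 global states

# step 6: coarse-grain with PCCA+
nstates = 4                          # placeholder number of metastable states
pcca = M.pcca(nstates)
```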

Here are the questions:

  1. I would like to take a closer look at the coarse-grained model of the global system. However, it is not possible to use the function sample_by_distributions like this: reader = pyemma.coordinates.source(xtc_files, feat), pcca_samples = M_1.sample_by_distributions(M_1.metastable_distributions, 10000), pyemma.coordinates.save_trajs(reader, pcca_samples, outfiles=['./pcca{}_10000samples.xtc'.format(n) for n in range(M_1.n_metastable)]). This seems to stem from the fact that I estimated my model directly from a transition matrix. Is there a sensible way to use the pyemma/deeptime packages so that sampling from the dynamical states of the coarse-grained model of the global system becomes feasible?
  2. It also seems that I can't use an HMM to coarse-grain the model here (instead of what step 6 does), which means I cannot use pyemma.plots.plot_markov_model. Are there any suggestions for visualizing the coarse-grained model of the global system?

Thanks for any help.

amardt commented 9 months ago

Hi,

I am very happy to hear that you are able to get some acceptable models. Nice work! What I do not understand is your need to investigate the global model rather than the independent ones. Since they are independent, it should be sufficient to study each state of the individual models, and given the individual states there should be no need to coarse-grain them further. But perhaps I simply do not understand your goal here. Are you trying to compare the model to a coarse-grained global model?

But let's say you really want to do that. I think the reason the functions cannot sample these states is that you have not provided a discrete state assignment for the global states. Through your Kronecker product you have only defined the global states indirectly, as unique combinations of the states of the three subsystems. So you can estimate a new discrete trajectory for your global model. Given the PCCA probabilities and the discrete global trajectory, you can, if necessary, sample the states by hand. Perhaps if you tell me more specifically what your ultimate goal is, I can give you better advice and, if necessary, help you sample by hand.
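A minimal sketch of what I mean by sampling by hand, assuming dtrajs_1/2/3 are your per-subsystem discrete trajectories from step 2 (a single trajectory, index 0), M is your 729-state global MSM after calling M.pcca(nstates), and reader is your pyemma.coordinates.source; the global index formula simply mirrors the ordering of np.kron:

```python
import numpy as np
import pyemma

# One global microstate index per frame, consistent with
# np.kron(M1, np.kron(M2, M3)): index = s1*81 + s2*9 + s3
n2, n3 = 9, 9
dtraj_global = dtrajs_1 * (n2 * n3) + dtrajs_2 * n3 + dtrajs_3

chi = M.metastable_memberships        # (729, n_metastable) PCCA+ memberships

rng = np.random.default_rng(0)
samples = []
for k in range(chi.shape[1]):
    w = chi[dtraj_global, k]          # weight each frame by its membership in coarse state k
    w = w / w.sum()
    frames = rng.choice(len(dtraj_global), size=1000, p=w)
    samples.append(np.array([(0, f) for f in frames]))   # (traj_index, frame_index) pairs

pyemma.coordinates.save_trajs(
    reader, samples,
    outfiles=['./global_state{}_samples.xtc'.format(k) for k in range(chi.shape[1])])
```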

But in principle I would simply sample the states of each subsystem individually using the soft assignment of the iVAMPnet; see the sketch below. That way you also do not need steps 2-7.
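For example (again just a sketch; Y_sub stands for the soft assignment of one subsystem on a single trajectory, and reader is the same pyemma.coordinates.source as before):

```python
import numpy as np
import pyemma

rng = np.random.default_rng(0)
samples = []
for k in range(Y_sub.shape[1]):
    w = Y_sub[:, k] / Y_sub[:, k].sum()    # frame weights = soft state assignment
    frames = rng.choice(Y_sub.shape[0], size=1000, p=w)
    samples.append(np.array([(0, f) for f in frames]))

pyemma.coordinates.save_trajs(
    reader, samples,
    outfiles=['./subsystem_state{}_samples.xtc'.format(k) for k in range(Y_sub.shape[1])])
```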

Sorry for the late response. Let me know what your status is.

Best,

Andreas

Toolxx commented 9 months ago

Hi Andreas,

Nice to have your feedback again. Would it be convenient for you to share your email address so that we can communicate directly? There are a couple of other things about our research using (i)VAMPnets that concern me, and I'd like to provide more specific information to discuss with you, if you don't mind.

Thanks for the suggestion on implementing this feature. I will follow this idea if necessary.

A brief explanation of the need: for this iVAMPnet case, the reason I want to investigate the global model rather than the independent ones is that I would like to get an idea of the global metastable states. I am studying a potassium channel, and I am wondering what the gating mechanism is when the metastable states are viewed from a global perspective. Investigating the local models is useful; however, the physically isolated domains (independent Markov domains) detected by the model are not necessarily similar between runs (the differences can even be quite large). That is also why I wanted to see whether these different models would show similar dynamics in a global coarse-grained model across the different runs.

My guess now is that the main problem might lie in the choice of features: the dimension I was using was quite high (2000+ residues, dim ~15000) while the amount of sampling was relatively small (~15 μs without enhanced sampling), so the variation of iVAMPnet between different runs would be large. Does this idea make sense for this situation?

Thanks

amardt commented 9 months ago

Hi,

sure, we can proceed more directly. Perhaps we can connect via LinkedIn; I think you should be able to find me under my full name.

Yeah, I think having a lot of input features can lead to some redundancy: you might be able to construct the same eigenfunctions with different sets of features. When you do get these different results, are the timescales still the same? Also, do you check the structures at the extreme values of these eigenfunctions? I know that one of the nice parts of the method is that it is supposed to extract the important information even from a huge number of input features, but you will certainly reduce the complexity if you can reduce the number of input features. But you are already doing that, right? Otherwise, I would not expect you to get away with 15k features for 2k residues. This means you have already adapted the mask to your problem, right?

Let me know if LinkedIn works for you.

Best,

Toolxx commented 9 months ago

Hi Andreas,

I have already sent you a connection request on LinkedIn. I'll share a few more details about our research and look forward to following up with you!

Many thanks!