hl2500 closed this issue 2 years ago.
Hi,
As far as I know, there are no notebooks specifically dealing with adaptive sampling. Generally, a low fraction of states used means that regions of your state space are disconnected, i.e. your sampling is not yet good enough to observe transitions between them. You could define a reaction coordinate that steers the adaptive sampling process and, based on it, pick frames from which to start new simulations.
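A minimal numpy-only sketch of these two steps, under stated assumptions. `dtrajs` stands for the discrete trajectories of a clustering (in PyEMMA, `cluster.dtrajs`). The hypothetical helper `largest_connected_set` finds the largest strongly connected set of the transition count matrix; as far as I know, the fraction of states it contains is what PyEMMA reports as `msm.active_state_fraction`. For choosing restart frames, the sketch swaps in a simple "least counts" heuristic (restart from the least-visited states) instead of a reaction coordinate; this is one common adaptive sampling strategy, not a PyEMMA function. All function names below are hypothetical.

```python
import numpy as np


def count_matrix(dtrajs, n_states, lag=1):
    """Sliding-window transition counts: C[i, j] = number of i -> j jumps at `lag`."""
    C = np.zeros((n_states, n_states), dtype=int)
    for dtraj in dtrajs:
        np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1)
    return C


def largest_connected_set(C):
    """States of the largest strongly connected component of the count graph."""
    n = C.shape[0]
    reach = (C > 0) | np.eye(n, dtype=bool)  # i can reach j in at most one step
    while True:
        # squaring the reachability matrix doubles the path length it covers
        new = reach | ((reach.astype(int) @ reach.astype(int)) > 0)
        if np.array_equal(new, reach):
            break
        reach = new
    mutual = reach & reach.T  # row i marks the component that contains state i
    return np.flatnonzero(mutual[np.argmax(mutual.sum(axis=1))])


def least_counts_seeds(dtrajs, n_states, n_seeds, rng):
    """Pick one (trajectory, frame) pair from each of the n_seeds least-visited states."""
    counts = np.bincount(np.concatenate(dtrajs), minlength=n_states)
    visited = np.flatnonzero(counts > 0)
    targets = visited[np.argsort(counts[visited], kind="stable")][:n_seeds]
    seeds = []
    for s in targets:
        hits = [(i, int(t)) for i, d in enumerate(dtrajs) for t in np.flatnonzero(d == s)]
        seeds.append(hits[rng.integers(len(hits))])
    return seeds


# Toy example: regions {0, 1} and {2, 3} with no transitions between them.
dtrajs = [np.array([0, 0, 1, 0, 1, 1]), np.array([2, 3, 2, 3, 3])]
n_states = 4
active = largest_connected_set(count_matrix(dtrajs, n_states))
fraction = len(active) / n_states
seeds = least_counts_seeds(dtrajs, n_states, n_seeds=2, rng=np.random.default_rng(0))
print(active, fraction)  # [0 1] 0.5
print(seeds)
```

In the toy example the two regions are never connected, so only half of the states end up in the largest connected set. Restarting simulations from rarely visited states is meant to raise the chance of observing the missing transitions.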
Regarding the projections: I do not recommend using PCA for this. For TICA, you can have a look at the free energy surface, which should give you some clues about the states the system prefers (in the projected space). It is quite possible, though, that two components are not enough to adequately describe the energy wells and their proximity to one another.
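A minimal sketch of such a free energy surface, assuming `x` and `y` are the first two TICA components concatenated over all trajectories: the population histogram is converted to F = -ln p in units of kT, with the deepest well set to zero. PyEMMA's `pyemma.plots.plot_free_energy` does this conversion and the plotting for you; the hypothetical helper `free_energy_2d` below, run on synthetic two-well data, only shows what is being computed.

```python
import numpy as np


def free_energy_2d(x, y, bins=50):
    """F = -ln p (in units of kT) on a 2D histogram; empty bins get F = inf."""
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins)
    p = hist / hist.sum()
    with np.errstate(divide="ignore"):
        F = -np.log(p)
    F -= F[np.isfinite(F)].min()  # put the deepest well at F = 0
    return F, xedges, yedges


# Synthetic "IC1 vs. IC2" data: a deep well at IC1 = -2 and a shallower one at IC1 = +2.
rng = np.random.default_rng(42)
xy = np.vstack([
    rng.normal(loc=(-2.0, 0.0), scale=0.5, size=(4000, 2)),
    rng.normal(loc=(2.0, 0.0), scale=0.5, size=(1000, 2)),
])
F, xedges, yedges = free_energy_2d(xy[:, 0], xy[:, 1], bins=30)
xcenters = 0.5 * (xedges[:-1] + xedges[1:])
i, j = np.unravel_index(np.argmin(F), F.shape)
print(f"deepest well near IC1 = {xcenters[i]:.2f}")
```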
Best, Moritz
For extracting the most probable structure (or, more precisely, the one that you've observed most frequently), you can conduct a histogram analysis in your transformed space. In a 2D space that could be done with numpy, but in general you can cluster the data, e.g. with k-means, and count the occurrences of each state, e.g. with `np.bincount(np.concatenate(cluster.dtrajs))`. From these histogram counts, you can select the state with the highest number of counts and draw frames from your simulation that were assigned to this state (e.g. with `cluster.sample_indexes_by_cluster`). You can use `pyemma.coordinates.save_traj` to write out frames.
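Both counting approaches can be sketched with numpy alone. The helpers below are hypothetical: `most_populated_bin_frames` bins a 2D projection (e.g. IC1 vs. IC2, one array per trajectory) and returns the frames in the fullest bin, and `most_populated_state_frames` does the same with cluster assignments via `np.bincount`. Both return an (n, 2) array of (trajectory index, frame index) pairs, which I assume matches the per-state format of `cluster.sample_indexes_by_cluster`; check the PyEMMA docs before passing such an array to `pyemma.coordinates.save_traj`.

```python
import numpy as np


def most_populated_bin_frames(projections, bins=30):
    """Most populated bin of a 2D histogram over all trajectories.

    Returns an (n, 2) array of (trajectory index, frame index) pairs in that bin
    and the bin's count.
    """
    allxy = np.concatenate(projections)
    hist, xedges, yedges = np.histogram2d(allxy[:, 0], allxy[:, 1], bins=bins)
    bx, by = np.unravel_index(np.argmax(hist), hist.shape)
    pairs = []
    for itraj, xy in enumerate(projections):
        # digitize against the interior edges reproduces histogram2d's bin indices
        ix = np.digitize(xy[:, 0], xedges[1:-1])
        iy = np.digitize(xy[:, 1], yedges[1:-1])
        pairs += [(itraj, int(t)) for t in np.flatnonzero((ix == bx) & (iy == by))]
    return np.array(pairs, dtype=int).reshape(-1, 2), int(hist[bx, by])


def most_populated_state_frames(dtrajs):
    """Cluster with the most frames, plus its (trajectory index, frame index) pairs."""
    counts = np.bincount(np.concatenate(dtrajs))
    state = int(np.argmax(counts))
    pairs = [(i, int(t)) for i, d in enumerate(dtrajs) for t in np.flatnonzero(d == state)]
    return state, np.array(pairs, dtype=int).reshape(-1, 2)


rng = np.random.default_rng(1)
projections = [rng.normal(size=(500, 2)), rng.normal(loc=1.0, size=(300, 2))]
frames, count = most_populated_bin_frames(projections, bins=20)
print(count, len(frames))

state, frames_of_state = most_populated_state_frames([np.array([0, 2, 2, 1]), np.array([2, 1, 0])])
print(state, frames_of_state.tolist())  # 2 [[0, 1], [0, 2], [1, 0]]
```

The bin-based version depends on the bin width: a very fine grid spreads one basin over many sparsely filled bins, which is one reason clustering is the more general route.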
Be aware that the states that you get here are purely based on your observation data and may be biased by limited sampling.
Thank you for the suggestions!
Hello,
Can I ask if there are any tutorials for adaptive sampling with PyEMMA? I built a model, but the fraction of states used is only 0.22. Is this because of poor sampling, i.e. some of the states are disconnected? How can I tell which trajectory frames I need to extract to start new simulations?
Also, how can I extract the most probable structure of the different states seen on TICA or PCA plots (e.g. IC1 vs. IC2) without building an MSM?
Thank you!