Closed roadhouse23 closed 5 years ago
Hi,
Take a look at the following script as a starting point
https://github.com/msultan/SML_CV/blob/master/alanine_example/train_data/feat.py
In the script you should swap out the trj_list line with the following code:
import mdtraj as md
trj_list = [md.load("./traj1.xtc",top="prot.pdb"), md.load("./traj2.xtc",top="prot.pdb")]
where traj1.xtc is the trajectory in the starting state, and traj2.xtc is the trajectory in the end state, and what we want is a collective variable that will drive the transition between them.
The script will then dump out the dihedral features(raw_features.pkl), sin-cosine transformed dihedral features(features.pkl), and a description of the features (feature_descriptor.pkl).
The svm_models are generated in the ipython notebooks, and learn what combinition of features separate traj1.xtc from traj2.xtc. This then becomes the collective variable.
I can do a more complete tutorial if you need once I get back but that would likely be in the evening or tomorrow.
Thanks for the reply and for all your help.
After your suggestions I should be able to apply the procedure for a general MD trajectory.
All the best Gianvito
Il giorno gio 9 ago 2018 alle ore 17:21 msultan notifications@github.com ha scritto:
Hi,
Take a look at the following script as a starting point
https://github.com/msultan/SML_CV/blob/master/alanine_example/train_data/feat.py
In the script you should swap out the trj_list line with the following code:
import mdtraj as md trj_list = [md.load("./traj1.xtc",top="prot.pdb"), md.load("./traj2.xtc",top="prot.pdb")]
where traj1.xtc is the trajectory in the starting state, and traj2.xtc is the trajectory in the end state, and what we want is a collective variable that will drive the transition between them.
The script will then dump out the dihedral features(raw_features.pkl), sin-cosine transformed dihedral features(features.pkl), and a description of the features (feature_descriptor.pkl).
The svm_models are generated in the ipython notebooks, and learn what combinition of features separate traj1.xtc from traj2.xtc. This then becomes the collective variable.
I can do a more complete tutorial if you need once I get back but that would likely be in the evening or tomorrow.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/msultan/SML_CV/issues/3#issuecomment-411795962, or mute the thread https://github.com/notifications/unsubscribe-auth/AoQpz14-SjVaATpnKfOlhPCpOXqQM13-ks5uPFOLgaJpZM4V102R .
Also, you can try different featurizers
instead of plain back bone dihedrals or even combinations of backbone, alpha carbon sidechain dihedrals and contacts of various kinds. The starting feature space is really system dependent but dihedrals + side chain dihedrals and contacts seem to work for most systems that i have seen.
However if you combine features like that, its best to normalize the features and the code should work with that too. However, I haven't documented that properly so I will try and writing a tutorial for it. For now, i would recommend starting with the MSMBuilder DihedralFeaturizer and seeing how the results look.
I don't know if the standard CVs are enough in my case, considering that I'm working with a relatively big protein. However, I just trying to familizrize myself with your code before starting the production experiments.
It seems that feat.py has produced the pkl files needed for the calculation, however when I try to run the code (01-svm.py), I obtain the following error:
ValueError Traceback (most recent call last)
/media/gianvito/ongoing/CV/01jd/01-svm.py in
/opt/python/3/lib/python3.6/site-packages/sklearn/linear_model/base.py in predict(self, X) 322 Predicted class label per sample. 323 """ --> 324 scores = self.decision_function(X) 325 if len(scores.shape) == 1: 326 indices = (scores > 0).astype(np.int)
/opt/python/3/lib/python3.6/site-packages/sklearn/linear_model/base.py in decision_function(self, X) 303 if X.shape[1] != n_features: 304 raise ValueError("X has %d features per sample; expecting %d" --> 305 % (X.shape[1], n_features)) 306 307 scores = safe_sparsedot(X, self.coef.T,
ValueError: X has 4 features per sample; expecting 680"
X_val is only for plotting the dihedrals of alanine dipeptide. It wont apply to your system. Take a look at how I visualized the chignolin example to see what dihedrals it picks out as being important for the transition.
closing now, feel free to reopen this issue as needed and Good luck with the simulations!
Hello, I succesfully executed the alanine example (01-svm_example.py). However, it's not clear to me how I can obtain the following pkl files from a general MD trajectory.
raw_features.pkl features.pkl feature_descriptor.pkl svm_model_2.pkl
Can you help me or can you indicate a step-by-step documentation to apply this very interesting procedure in a general case starting from a MD simulation trajectory?
Thank you very much
GG