How dense must the sampling be to reach < 2.5 Å?

ignatovmg commented 5 years ago

Use the sampler class from here to study how many samples we need to reach a proper sampling density. Some useful functionality for RMSD calculation is here. Ideally, we want 5% of the conformations in the dataset to have < 2.5 Å heavy-atom RMSD (all atoms minus hydrogens) to ensure proper training. Also study the resulting backbone RMSD; our target is around 1.2 Å backbone RMSD (the GradDock result).
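The two criteria above (fraction of conformations under a heavy-atom RMSD threshold, plus backbone RMSD) can be sketched roughly as follows. This is only an illustration and not the linked RMSD code: the atom tuple layout, the function names, the choice of N, CA, C, O as backbone atoms, and computing RMSD without superposition (peptide kept in the receptor frame) are all assumptions.

```python
import math

# Assumption: backbone is taken as N, CA, C, O; the thread does not say which atoms it uses.
BACKBONE_NAMES = {"N", "CA", "C", "O"}


def select_coords(atoms, backbone_only=False):
    """Return coordinates of non-hydrogen atoms, optionally backbone atoms only.

    `atoms` is a hypothetical list of (atom_name, element, (x, y, z)) tuples.
    """
    return [
        xyz
        for name, element, xyz in atoms
        if element != "H" and (not backbone_only or name in BACKBONE_NAMES)
    ]


def rmsd(reference, model):
    """RMSD between two equally ordered coordinate lists, without superposition."""
    if not reference or len(reference) != len(model):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(reference, model)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(reference))


def fraction_below(rmsds, threshold):
    """Fraction of sampled conformations with RMSD strictly below `threshold`."""
    if not rmsds:
        raise ValueError("no RMSD values given")
    return sum(r < threshold for r in rmsds) / len(rmsds)
```

With per-sample RMSD lists in hand, the target would then read as something like `fraction_below(heavy_atom_rmsds, 2.5) >= 0.05`, and the same helper can be reused with a 1.2 threshold on backbone RMSDs.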

cubazis commented 5 years ago

Simple class for processing train/test datasets in your format here
ipynb with calculations here

I need your feedback on whether I've understood the conformation calculations correctly. It would be great if you could write it here.

ignatovmg commented 5 years ago

Looks good, but several things:
1) Can you change "<= 9" to "== 9"? We are interested in 2 subsets: a) all peptides, b) 9-mers only.
2) Can you change the threshold for bb RMSD to 1.2 instead of 2.5?
3) The train set should contain 105 complexes, but I saw only ~20. Why?
4) It's good to know the average values, but judging from your tables, sampling quality varies strongly from case to case. Is there any way to plot how the number of structures < 2.5 depends on the bulkiness of the amino acid side chains in the peptide? Separately for each length group, since the longer the peptide is, the worse the sampling is.
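One possible way to prepare the data for the plot requested in point 4 is sketched below. The thread does not define "bulkiness", so the total number of side-chain heavy atoms is used here purely as an assumed proxy; the function names and the `(peptide, near_native_count)` record layout are hypothetical.

```python
# Side-chain heavy-atom counts (atoms beyond CA) for the 20 standard amino acids.
# Using their sum as "bulkiness" is an assumption made for this sketch.
SIDE_CHAIN_HEAVY_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "V": 3, "T": 3, "P": 3,
    "I": 4, "L": 4, "D": 4, "N": 4, "M": 4, "E": 5, "Q": 5,
    "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}


def bulkiness(peptide):
    """Total side-chain heavy-atom count of a one-letter peptide sequence."""
    return sum(SIDE_CHAIN_HEAVY_ATOMS[aa] for aa in peptide)


def scatter_points(records, length=None):
    """Return sorted (bulkiness, near_native_count) pairs ready for plotting.

    `records` is an iterable of (peptide_sequence, near_native_count).
    If `length` is given, only peptides of exactly that length are kept.
    """
    return sorted(
        (bulkiness(seq), count)
        for seq, count in records
        if length is None or len(seq) == length
    )
```

Calling `scatter_points(records, length=9)` would give the 9-mer subset and `scatter_points(records)` the all-peptide subset; each list of pairs can be passed to any plotting library as x/y values.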

Also, this analysis would fit better in issue #4. For this one, I was thinking that you could run sampling for the worst/medium peptide cases (more than 4000 samples) and see how many samples we need to start producing good conformations at all. There are a number of cases where we have zero conformers < 2.5; it would be nice to fix that.

You can try the following: a) Produce > 4k samples and see if we start getting good conformers. b) Remove the receptor during brikard sampling and see if it helps. I produced the whole dataset with the receptor included, and maybe this was a bad idea. Brikard filters out conformations that clash with surrounding atoms; therefore, if the peptide was initially not placed very well, brikard may be throwing out good conformations because of that. We need to check whether this is the case by removing the receptor during sampling.
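To judge both experiments (more than 4k samples, and sampling with versus without the receptor), it may help to track how the number of good conformers grows with the sample count. The sketch below assumes that the RMSD values are listed in the order the samples were produced; the function names are hypothetical and nothing here calls brikard itself.

```python
def samples_until_first_hit(rmsds, threshold=2.5):
    """1-based index of the first sample with RMSD below `threshold`, or None."""
    for index, value in enumerate(rmsds, start=1):
        if value < threshold:
            return index
    return None


def hits_at(rmsds, sizes, threshold=2.5):
    """Count conformers below `threshold` among the first n samples, for each n in `sizes`."""
    return {n: sum(value < threshold for value in rmsds[:n]) for n in sizes}
```

Running `hits_at` on the same peptide for a run with the receptor and a run without it, at sizes such as 1000, 4000 and 8000, would show whether dropping the receptor or simply sampling longer is what produces the first near-native conformers.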

cubazis commented 5 years ago

Because I used your fixed.csv. Your last tar contains only ~20.

ignatovmg commented 5 years ago

I didn't see that, I'll send you a correct set.

cubazis commented 5 years ago

[x] change "<= 9" to "== 9". We are interested in 2 subsets: a) all peptides b) 9mers only.
[x] make threshold for different columns variable (bb rmsd to 1.2 instead of 2.5 for instance)
[ ] Separately for each length group, show how the number of structures < 2.5 depends on the bulkiness of amino acid side chains in the peptide
[ ] Produce > 4k samples and see if we start getting good conformers.
[ ] Remove receptor during brikard sampling and see if it helps.

cubazis commented 5 years ago

Questions

[x] What does "worst/medium peptide cases" mean?
[x] How does the number of structures < 2.5 depend on the bulkiness of amino acid side chains in the peptide, separately for each length group? Do we focus on only two groups, length = 9 and length != 9, or what is the scenario?

cubazis commented 5 years ago

"I didn't see that, I'll send you a correct set"

Is everything okay, man? :)

ignatovmg commented 5 years ago

Ahh, I missed another error, fixing that...

ignatovmg commented 5 years ago

https://github.com/ignatovmg/mhc-adventures/issues/2#issuecomment-532428437
1) Those which currently have no, or almost no, near-natives at all
2) length = 9 and length = any

cubazis commented 5 years ago

#2 (comment)

Those which currently have no, or almost no, near-natives at all

length = 9 and length = any

Got that.
What does "near-native" mean?

ignatovmg commented 5 years ago

#2 (comment)

Those which currently have no, or almost no, near-natives at all

length = 9 and length = any

Got that.

What does "near-native" mean?

Native structure = crystal structure = the one from PDB = the answer. Near-native = a structure close to the native one.
