Open ignatovmg opened 5 years ago
Looks good, but several things:
1) Can you change "<= 9" to "== 9"? We are interested in 2 subsets: a) all peptides, b) 9-mers only.
2) Can you change the threshold for backbone RMSD to 1.2 instead of 2.5?
3) The train set should contain 105 complexes, but I saw only ~20. Why?
4) It's good to know the average values, but judging from your tables, sampling quality varies strongly from case to case. Is there any way to plot how the number of structures < 2.5 depends on the bulkiness of the amino acid side chains in the peptide? Separately for each length group, since the longer the peptide is, the worse the sampling is.
Also, this analysis would fit better under issue #4. For this one, I was thinking that you could run sampling for the worst/medium peptide cases (more than 4000 samples) and see how many samples we need to start producing good conformations at all. There are a number of cases where we have zero conformers < 2.5; it would be nice to fix that.
You can try the following: a) Produce > 4k samples and see if we start getting good conformers. b) Remove the receptor during Brikard sampling and see if it helps. I produced the whole dataset with the receptor included, and maybe this was a bad idea. Brikard filters out conformations that clash with surrounding atoms, so if the peptide was initially placed poorly, Brikard may be throwing out good conformations because of that. We need to check whether this is the case by removing the receptor during sampling.
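A minimal sketch of how experiment a) could be summarized, assuming the RMSD of each sampled conformer to the native structure is already available as a plain list in sampling order. The function names `near_native_counts` and `first_near_native` are hypothetical and not part of the repository; the 2.5 cutoff is the one discussed above.

```python
from typing import Dict, Optional, Sequence


def near_native_counts(rmsds: Sequence[float],
                       budgets: Sequence[int],
                       threshold: float = 2.5) -> Dict[int, int]:
    """For each sample budget n, count conformers with RMSD < threshold
    among the first n samples (hypothetical helper, not from the repo)."""
    return {n: sum(1 for r in rmsds[:n] if r < threshold) for n in budgets}


def first_near_native(rmsds: Sequence[float],
                      threshold: float = 2.5) -> Optional[int]:
    """Return how many samples were drawn when the first conformer with
    RMSD < threshold appeared (1-based), or None if there is none."""
    for i, r in enumerate(rmsds, start=1):
        if r < threshold:
            return i
    return None
```

Running both helpers on the same peptide with and without the receptor would give a direct comparison for experiment b): if the first near-native conformer appears much earlier without the receptor, the clash filter is likely discarding good conformations.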
I didn't see that, I'll send you a correct set.
[x] What does "worst/medium peptide cases" mean?
[x] "How the number of structures < 2.5 depends on the bulkiness of amino acid side chains in the peptide? Separately for each length group." Do we focus on only two groups, length = 9 and length != 9, or what is the scenario?
Everything okay, man?)
Ahh, I didn't spot another error, fixing that...
https://github.com/ignatovmg/mhc-adventures/issues/2#issuecomment-532428437 1) Those which currently have no, or almost no, near natives at all. 2) length = 9 and length = any.
- Got that
- What does "near natives" mean?
Use the sampler class from here to study how many samples we need to reach a proper sampling density. Some useful functionality for RMSD calculation is here. Ideally, we want 5% of the conformations in the dataset to have < 2.5 Å heavy-atom RMSD (all atoms minus hydrogens) to ensure proper training. Also study the resulting backbone RMSD; our target is around 1.2 Å backbone RMSD (the GradDock result).
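The two criteria above can be checked with a short sketch like the following. It assumes the heavy-atom and backbone RMSDs of every sampled conformer have already been computed and are passed in as plain lists; it does not use the repository's sampler class or RMSD helpers, and `sampling_report` is a hypothetical name chosen for illustration.

```python
from typing import Dict, Sequence, Union


def sampling_report(heavy_rmsds: Sequence[float],
                    backbone_rmsds: Sequence[float],
                    heavy_cutoff: float = 2.5,      # Å, heavy-atom RMSD cutoff
                    target_fraction: float = 0.05,  # want 5% below the cutoff
                    backbone_target: float = 1.2    # Å, backbone RMSD target
                    ) -> Dict[str, Union[float, bool]]:
    """Summarize sampling quality for one peptide (hypothetical helper)."""
    if not heavy_rmsds or not backbone_rmsds:
        raise ValueError("RMSD lists must not be empty")
    # Fraction of conformers below the heavy-atom cutoff.
    n_good = sum(1 for r in heavy_rmsds if r < heavy_cutoff)
    fraction = n_good / len(heavy_rmsds)
    # Best (lowest) backbone RMSD seen among the samples.
    best_backbone = min(backbone_rmsds)
    return {
        "fraction_below_cutoff": fraction,
        "meets_fraction_target": fraction >= target_fraction,
        "best_backbone_rmsd": best_backbone,
        "meets_backbone_target": best_backbone <= backbone_target,
    }
```

Calling this on the first n samples for increasing n would show at which sample count each criterion is first met for a given peptide.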