maxscheurer / pycontact

Analysis of non-covalent interactions in MD trajectories
https://pycontact.github.io/
GNU General Public License v3.0
51 stars 12 forks source link

Edge Case in Definition of "Hydrogen Bond" vs "Other" Contact Type #85

Open RMCrean opened 2 years ago

RMCrean commented 2 years ago

First, thanks for this very nice tool, it's been very useful for my work. I think I have found an "edge case" though with the "determine_ctype" function inside Biochemisty.py.

I have being running PyContact on my replicas separately before later combining them.

In the image below are two contacts identified from a merged trajectory (1500 frames each). The only difference between the contact type is that one is labelled "Other" and in the other as a "hydrogen bond". (I have made my own script to process the PyContact output for how I would like the labels to look, which is why the labelling format is non-standard).

image

Clearly these are the same contact but in one case PyContact labels it a Hbond, and in the other as "Other". When I visualise the trajectory the interactions are clearly very largely "Other" in both trajectories but in a very small number of frames of one trajectory the two residues come into hydrogen bonding distance.

From looking at the determine_ctype function and then the hbondFramesScan function it appears that you need a single frame to have a hydrogen bond to consider the contact type as a hydrogen bond. Can you confirm if this is the case?

To be clear, I'm not requesting you fix this because I appreciate this is kind of a rare edge case, I just want to make sure I understand correctly so I can correctly handle it properly in my analysis.

Thanks!

maxscheurer commented 2 years ago

Hi @RMCrean, thanks for reporting this.

Can you confirm if this is the case?

Yes, I just looked at the code in question again and you're absolutely right.

I have not worked on PyContact over the past years because my own research has "moved away" from MD simulations... If you'd like to propose fixes, please feel free to open a PR any time!

RMCrean commented 2 years ago

Hi @maxscheurer, thanks for confirming this is the case and no worries at all that you have moved on to other work.

And thanks for the invite to make a PR for this, although I'm not entirely sure what the optimal (if there is one) solution would be here.

For me, the issue was inconsistency across runs, which I can solve (just tested now) by merging all my runs together into one trajectory and running PyContact calculations on small sections of the protein (and then merge these all back together again later).

We could update the code to change the cutoff from 1 frame to some fraction of frames with a Hbond but that wouldn't solve the consistency issue (I can imagine it might make the situation worse), and of course whatever fraction chosen would be arbitrary. So perhaps it is better to just be aware of it - unless you can think of a way to handle it?

(Sorry to go semi-off topic) Regarding PRs though, I do have a python script (run_pycontact.zip, attached as zip as GitHub wouldn't accept it otherwise) which other users may find useful which I can submit as a PR if you think it is worthwhile. I had issues with writing the contacts directly to file with the writeContactDataToFile() command so this script lets me run PyContact and process the data without having to save a session file and then load the GUI to write the contacts out.

Thanks!