rlabduke / probe

Evaluate and visualize protein interatomic packing
http://kinemage.biochem.duke.edu/software/probe.php
10 stars 4 forks source link

Crash on residues containing "O" atoms with alternate locations #3

Open sbstnk opened 6 years ago

sbstnk commented 6 years ago

Probe crashes when processing residues that contain at least two oxygen atoms named "O" if the first residue of the chain does not have any "N" nitrogen atoms named "N". This can be seen for example in "1ob6" (chain B, residue 0). The crash is due to a presumably erroneous check in ProcessResInfo. When creating the list of "O" oxygen atoms in a residue (ambigO), there is a check to see if the currently added atom is in the same residue as the previous atom in a list of "N" nitrogen atoms (ambigN). Checking ambigN does not seem to make much sense here. I think the intended behavior is to check the previous atom in ambigO which would make sense to ensure that all oxygen atoms in the list belong to the same residue, as stated in the comment above the code. If the check against ambigN was intended, then there would only ever be 1 oxygen atom added to the list, except on the n-terminal residue, because the first 4 atoms in ambigN all belong to the n-terminal residue if they are not NULL. The ambigO list however is used to determine the c-terminal mainchain oxygen atoms. So if my understanding of the code is correct, the erroneous check could in some cases also lead to undetected c-terminal mainchain oxygen atoms.

Instead of changing the check from ambigN to ambigO, it could probably also be removed entirely. This is because ProcessResInfo is only ever called after a call to NtermCheck if the residue changes, which sets all values in ambigO to NULL. There is a small difference though. The way residue changes are detected differs between the code calling NtermCheck and the check in ProcessResInfo. If the residue name of a residue changes between two alternate locations of the same atom, that will result in a different "r" pointer in the atom, leading the check in ProcessResInfo to consider this a different residue and not add the atom to the list, despite the residues having the same ID. The code that calls NtermCheck only looks at the residue ID and chain, meaning it would consider both atoms to be part of the same residue. The latter seems to be the intended behavior, so removing the check entirely would likely fix another small bug. The same argument could be made about ambigN[4-7], but not ambigN[0-3]. For ambigN[0-3] the check in ProcessResInfo would need to be adjusted to look at the residue ID + chain instead of the "r" pointer.

russell-taylor commented 2 years ago

This should be fixed in version 2.17, which did the first fix and checked against O, when the problem was encountered again recently. Your diagnosis of the problem indicates that there is a potentially better fix, so I'm leaving it open so that we can fix it in the better way later.