nglviewer / ngl

WebGL protein viewer
http://nglviewer.org/ngl/
MIT License

Fix PDBQT parser for wrong H elements interpolation #957

Closed papillot closed 1 year ago

papillot commented 1 year ago

PDBQT files (used by the AutoDock programs) are derived from PDB files, notably by adding partial charges (Q) and specific atom types (T). The atom types are written in the last columns of the PDBQT file, in place of the element symbol where applicable. To detect elements despite this, the previous code inferred the element symbol from the first letter(s) of the atom name. This works well in most cases, but in at least some files from the Webina project, some H atoms have names beginning with a digit. Such a name yields an unrecognized element, which is assigned a default radius larger than the one expected for a hydrogen. This in turn causes wrong bond detection, as can be seen in the following screenshot.

(Screenshot: pdbqt-connectivity-bug)
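
To make the failure mode concrete, here is a minimal sketch of a name-based heuristic of the kind described above. It is not NGL's actual code: the function name `naiveElementFromName` and the small `KNOWN_ELEMENTS` set are hypothetical and only serve to illustrate why a name starting with a digit ends up unrecognized.

```typescript
// Hypothetical illustration, NOT the actual NGL implementation.
// A deliberately small set of element symbols (assumption for this sketch).
const KNOWN_ELEMENTS = new Set<string>([
  'H', 'C', 'N', 'O', 'S', 'P', 'F', 'CL', 'BR', 'I'
])

// Infer the element from the first letter(s) of the atom name.
// Returns '' when no known element symbol matches.
function naiveElementFromName (atomName: string): string {
  const name = atomName.trim().toUpperCase()
  const firstTwo = name.slice(0, 2)
  const firstOne = name.slice(0, 1)
  if (KNOWN_ELEMENTS.has(firstTwo)) return firstTwo
  if (KNOWN_ELEMENTS.has(firstOne)) return firstOne
  return '' // unrecognized: the caller would fall back to a default radius
}

const fromLetterFirst = naiveElementFromName('HB2') // starts with a letter
const fromDigitFirst = naiveElementFromName('1HB') // starts with a digit
```

With this sketch, `HB2` resolves to hydrogen, while `1HB` matches nothing because both `1H` and `1` are tested against the symbol set and neither is an element.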

This PR fixes the bug by mapping PDBQT atom types to elements (the mapping is derived from the meeko library). A PDBQT file excerpt has been added to the test dataset to check the fix. It can also be tested using the PDB file from the Webina project mentioned above.
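
The idea of the fix can be sketched roughly as follows. This is an illustration only, not the code in this PR: `PDBQT_TYPE_TO_ELEMENT` and `elementFromPdbqtType` are hypothetical names, and the table is a partial subset of common AutoDock atom types rather than the full mapping derived from meeko.

```typescript
// Hypothetical sketch, NOT the code from this PR.
// Partial table of AutoDock atom types and the element each one denotes.
const PDBQT_TYPE_TO_ELEMENT = new Map<string, string>([
  ['H', 'H'], ['HD', 'H'], ['HS', 'H'], // hydrogens (HD: polar, donor)
  ['C', 'C'], ['A', 'C'], // A: aromatic carbon
  ['N', 'N'], ['NA', 'N'], ['NS', 'N'], // NA: acceptor nitrogen, not sodium
  ['OA', 'O'], ['OS', 'O'],
  ['S', 'S'], ['SA', 'S'],
  ['P', 'P'], ['F', 'F'], ['CL', 'CL'], ['BR', 'BR'], ['I', 'I']
])

// Look up the element for a PDBQT atom type.
// Returns '' for types missing from this partial table.
function elementFromPdbqtType (atomType: string): string {
  const key = atomType.trim().toUpperCase()
  return PDBQT_TYPE_TO_ELEMENT.get(key) || ''
}

const polarHydrogen = elementFromPdbqtType('HD')
const aromaticCarbon = elementFromPdbqtType('A')
```

Because the lookup uses the atom type and not the atom name, a hydrogen named `1HB` with type `HD` is still resolved to `H`. Types such as `A` and `NA` also show why the type column cannot be read directly as an element symbol.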