speleo3 opened this issue 7 years ago
This is a known problem that we tracked some time ago in our internal issue tracker. I think it's worth describing it here again:
The full description of all compounds, including their protonation states, lives in a separate Protonation Variants Companion Dictionary. That dictionary explains the issue we've seen: many hydrogen atoms are missing from the standard CCD files (see http://www.wwpdb.org/data/ccd).
This means that to get a complete and accurate set of bonds we'd need to use those files too, which presents some challenges, for instance with the identifiers. Quoting the docs:
The dictionary of protonation variants provides additional nomenclature information for the protonation states of standard amino acids in N-terminal, C-terminal, and free forms, and includes common side chain protonation states. The identifiers used in this extension dictionary [use] longer identifier codes to distinguish the various protonation forms of the standard amino acids. For instance, an identifier code ARG_LFOH_DHH12 is used to identify the arginine variant with a neutral peptide unit and side chain protonated at NH1. The extended identifier codes are not compatible with the 3-character format restrictions for the residue identifier in the PDB format, so these codes do not currently appear in PDB files. In PDB entries, protonated residues are identified by the 3-character code of their parent amino acid; however, the atom nomenclature for protonated forms will be taken from the variant dictionary definitions.
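Because files only ever carry the parent's 3-character code, any lookup into the companion dictionary has to go from an extended ID back to its parent. A minimal sketch of that split, assuming the extended IDs always have the shape "parent code, then underscore-separated suffix fields" as in the ARG_LFOH_DHH12 example; the function names are hypothetical and the suffix fields are not decoded, since the quoted docs don't spell out their grammar:

```python
import re

# Assumed shape: a 1-3 character parent code followed by one or more
# underscore-separated suffix fields, e.g. "ARG_LFOH_DHH12".
_VARIANT_RE = re.compile(r"^([A-Z0-9]{1,3})((?:_[A-Z0-9]+)+)$")


def split_variant_id(comp_id: str) -> tuple[str, list[str]]:
    """Split a component ID into (parent_code, suffix_fields).

    Plain CCD IDs such as "ARG" come back with an empty suffix list.
    The meaning of each suffix field is deliberately not interpreted.
    """
    m = _VARIANT_RE.match(comp_id)
    if m is None:
        return comp_id, []
    parent, rest = m.groups()
    # rest looks like "_LFOH_DHH12"; drop the leading underscore and split.
    return parent, rest[1:].split("_")


def parent_code(comp_id: str) -> str:
    """Return the code that would appear in a PDB/mmCIF residue name."""
    return split_variant_id(comp_id)[0]


if __name__ == "__main__":
    print(split_variant_id("ARG_LFOH_DHH12"))  # ('ARG', ['LFOH', 'DHH12'])
    print(parent_code("ARG"))  # ARG
```

Going the other way (from "ARG" in a file to the right variant) is the hard part, since several variants share one parent code and the file doesn't say which one applies.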
I checked one case (1a23, with H1, H2 and H3 on the first ALA of chain A) and the ALA_xxxx_xxxx identifiers are not present in the mmCIF file. So it looks like we can't really deal with this properly at the moment.
The problem in @speleo3's example is the same: H1 and H3 are not in the standard CCD entry for GLY but only in one of the companion entries, so there are no bonds for them.
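The effect can be reproduced with a simplified model of name-based bond assignment: a template bond is only applied when both of its atom names occur in the residue, so atoms the template never mentions end up unbonded. A sketch under that assumption; `GLY_TEMPLATE_BONDS` is an illustrative subset written out by hand (check it against the real CCD entry before relying on it) and the residue's atom list is hypothetical:

```python
# Illustrative subset of bonds for the standard CCD GLY entry.
# Assumption: verify against the actual entry before relying on it.
GLY_TEMPLATE_BONDS = [
    ("N", "CA"), ("N", "H"), ("N", "H2"),
    ("CA", "C"), ("CA", "HA2"), ("CA", "HA3"),
    ("C", "O"), ("C", "OXT"), ("OXT", "HXT"),
]


def unbonded_atoms(atom_names: list[str],
                   template_bonds: list[tuple[str, str]]) -> list[str]:
    """Return the residue's atoms that no applicable template bond touches.

    A template bond applies only if both of its atoms are present,
    which is a simplified model of dictionary-based bond assignment.
    """
    present = set(atom_names)
    covered: set[str] = set()
    for a, b in template_bonds:
        if a in present and b in present:
            covered.update((a, b))
    # Keep the residue's own atom order in the result.
    return [name for name in atom_names if name not in covered]


# Hypothetical N-terminal GLY carrying H1, H2 and H3 on the amino group.
atoms = ["N", "CA", "C", "O", "H1", "H2", "H3", "HA2", "HA3"]
print(unbonded_atoms(atoms, GLY_TEMPLATE_BONDS))  # ['H1', 'H3']
```

H2 survives only because the standard entry happens to use that name for one of its amide hydrogens; H1 and H3 exist only in the companion entries.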
N-H1 and N-H3 bonds are missing from current MMTF files.
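Until the companion dictionary can be used, one possible client-side workaround (a swapped-in heuristic, not something MMTF or the CCD does) is to attach each unbonded hydrogen to its nearest heavy atom within a distance cutoff. The sketch below assumes coordinates and element symbols are available per atom name; the 1.3 Å cutoff and the toy coordinates are assumptions chosen for illustration:

```python
import math

Coord = tuple[float, float, float]


def fallback_hydrogen_bonds(
    coords: dict[str, Coord],
    elements: dict[str, str],
    unbonded: list[str],
    max_dist: float = 1.3,  # assumed cutoff in Angstrom; X-H bonds are ~1.0
) -> list[tuple[str, str]]:
    """Bond each unbonded hydrogen to its nearest heavy atom within max_dist."""
    heavy = [name for name, el in elements.items() if el != "H"]
    bonds = []
    for h in unbonded:
        if elements.get(h) != "H":
            continue  # only patch hydrogens; other orphans need manual review
        nearest = min(heavy, key=lambda n: math.dist(coords[h], coords[n]),
                      default=None)
        if nearest is not None and math.dist(coords[h], coords[nearest]) <= max_dist:
            bonds.append((nearest, h))
    return bonds


# Toy coordinates (hypothetical, not taken from any PDB entry).
coords = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.46, 0.0, 0.0),
    "H1": (-0.33, 0.94, 0.0),
    "H3": (-0.33, -0.47, 0.82),
}
elements = {"N": "N", "CA": "C", "H1": "H", "H3": "H"}
print(fallback_hydrogen_bonds(coords, elements, ["H1", "H3"]))
# [('N', 'H1'), ('N', 'H3')]
```

A distance rule like this can mis-assign hydrogens in crowded or poorly refined regions, so it is a stopgap rather than a replacement for proper dictionary-based bonds.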
Example (/1NMR/A/A/GLY`1):