t7morgen / misato-dataset

GNU Lesser General Public License v2.1

Error when converting from h5 to PDB #8

Closed pujaltes closed 9 months ago

pujaltes commented 10 months ago

The h5_to_pdb.py script incorrectly splits some GLN and ASN residues when converting to PDB format. See this example from converting 4KNB.pdb:

image

In the script, the residue index is incremented (a new residue begins) whenever an O-N pair appears in the atom sequence. However, as pointed out here, GLN and ASN contain an O-N pair within the amino acid itself. The script accounts for this by ignoring indices 12 and 9 in GLN and ASN respectively, but it misses the fact that the O-N pair can sit at another position within the residue. From atoms_name_map_for_pdb.pickle we can see that it can also occur at indices 14 (GLN) and 11 (ASN).

image image
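
To make the failure mode concrete, here is a minimal sketch of the O-N splitting heuristic described above. It is not the code from h5_to_pdb.py: the function name `assign_residue_numbers`, the table `SIDE_CHAIN_O_N`, and the `(element, residue_name)` input format are all hypothetical, and I am assuming the in-residue index is counted from 1. The toy N-terminal ASN assumes the usual AMBER atom order with three amine hydrogens, which would put the side-chain O at position 11.

```python
# Hypothetical exception table: 1-based in-residue positions where an O is
# directly followed by an N inside the same residue (side-chain amide).
# 12 and 9 are the positions already handled; 14 and 11 are the missing ones.
SIDE_CHAIN_O_N = {"GLN": {12, 14}, "ASN": {9, 11}}


def assign_residue_numbers(atoms, exceptions=SIDE_CHAIN_O_N):
    """Assign a residue number to each atom using the O-N heuristic.

    atoms: list of (element, residue_name) tuples in file order.
    exceptions: residue name -> set of in-residue positions where an O-N
        pair must NOT start a new residue.
    """
    numbers = []
    residue_number = 1
    position = 0  # 1-based position of the current atom within its residue
    for i, (element, resname) in enumerate(atoms):
        position += 1
        numbers.append(residue_number)
        if i + 1 < len(atoms):
            next_element = atoms[i + 1][0]
            if element == "O" and next_element == "N":
                # Only start a new residue if this O-N pair is not a known
                # side-chain pair for this residue type.
                if position not in exceptions.get(resname, ()):
                    residue_number += 1
                    position = 0
    return numbers


# Toy input: an N-terminal ASN (assumed atom order) followed by a GLY.
nasn = [(e, "ASN") for e in "N H H H C H C H H C O N H H C O".split()]
gly = [(e, "GLY") for e in "N H C H H C O".split()]
atoms = nasn + gly

# With only the original exceptions, the ASN is split into two residues.
old = assign_residue_numbers(atoms, {"GLN": {12}, "ASN": {9}})
# With the extra positions, the ASN stays in one piece.
new = assign_residue_numbers(atoms)
```

With only positions 12 and 9 excluded, the toy ASN is cut at its side-chain O and the GLY ends up as residue 3. Adding positions 14 and 11 keeps the ASN intact, so the GLY is residue 2.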

t7morgen commented 5 months ago

I have now added https://github.com/t7morgen/misato-dataset/blob/master/src/data/processing/h5_to_traj.py, which should in general be more robust because it preserves the AMBER topology format (with all the atom names, TERs, etc.).