Dear team,
Thank you for open-sourcing this dataset. I have been exploring the .hdf5 files, using PDB ID 10GS as an example. The number of coordinates per frame appears to be 6593, but the PDB file lists only 3521 atoms (including waters). Do the coordinates also include water positions? I checked the residue map to see whether "HOH" is part of the residue list, but found only this:
{0: 'MOL', 1: 'ACE', 2: 'ALA', 3: 'ARG', 4: 'ASN', 5: 'ASP', 6: 'CYS', 7: 'CYX', 8: 'GLN', 9: 'GLU', 10: 'GLY', 11: 'HIE', 12: 'ILE', 13: 'LEU', 14: 'LYS', 15: 'MET', 16: 'PHE', 17: 'PRO', 18: 'SER', 19: 'THR', 20: 'TRP', 21: 'TYR', 22: 'VAL'}
Here, what do MOL and ACE correspond to? Also, why does only the HIE protonation state of histidine appear, and not the others? It would be really helpful if you could explain these points, and also how to obtain the water positions from the trajectory coordinates.
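To make the check concrete, here is a minimal sketch of how the residue map above could be tallied against per-atom residue indices to look for water. It assumes the file stores one residue index per atom; the names `atom_residue_ids`, `count_residue_names`, and `water_atom_count` are hypothetical and not taken from the dataset, and the toy indices are made up for illustration.

```python
from collections import Counter

# Residue map exactly as printed above (index -> residue name).
RESIDUE_MAP = {
    0: 'MOL', 1: 'ACE', 2: 'ALA', 3: 'ARG', 4: 'ASN', 5: 'ASP', 6: 'CYS',
    7: 'CYX', 8: 'GLN', 9: 'GLU', 10: 'GLY', 11: 'HIE', 12: 'ILE', 13: 'LEU',
    14: 'LYS', 15: 'MET', 16: 'PHE', 17: 'PRO', 18: 'SER', 19: 'THR',
    20: 'TRP', 21: 'TYR', 22: 'VAL',
}

# Water residue names commonly seen in PDB files (HOH) and Amber files (WAT).
WATER_NAMES = {"HOH", "WAT"}


def count_residue_names(atom_residue_ids, residue_map=RESIDUE_MAP):
    """Tally how many atoms fall under each residue name.

    `atom_residue_ids` is assumed to be a sequence with one residue index
    per atom; indices missing from the map are counted as "UNKNOWN".
    """
    return Counter(residue_map.get(int(i), "UNKNOWN") for i in atom_residue_ids)


def water_atom_count(counts):
    """Sum the atoms whose residue name is a known water name."""
    return sum(n for name, n in counts.items() if name in WATER_NAMES)


# Toy per-atom indices (not real data from 10GS).
atom_residue_ids = [0, 0, 1, 2, 2, 2, 11]
counts = count_residue_names(atom_residue_ids)
print(counts)
print("water atoms:", water_atom_count(counts))
```

With the map as printed, no index resolves to a water name, so the water count for any index array would be zero, which is what prompted the question.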
With regards, Sowmya