t7morgen / misato-dataset

GNU Lesser General Public License v2.1

Ligand bonding information is lost. #2

Closed Georgefwt closed 1 year ago

Georgefwt commented 1 year ago

With the provided HDF5 file, the protein topology, including chemical bonds, can be restored (standard residues have templates), but the bonding information for the non-standard ligands is lost.

So we can't retrieve the original ligand structure, e.g., whether two atoms are connected by a single, double, or aromatic bond. Could you provide the mol2 or sdf files so we can export mol2-formatted ligands with bonding information from the dataset? Presumably these files were the input to the MD preparation.

Thank you in advance!

t7morgen commented 1 year ago

The bonds can be found in the QM h5 file. See e.g. for 10GS:

    import h5py
    qmh5 = h5py.File('QM.hdf5')
    qmh5['10GS']['atom_properties']['bonds'][()]

Does that help?
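The lookup in the comment above can be wrapped in a small helper. This is only a minimal sketch: the file name `QM.hdf5` and the key path are taken from the comment, while the function names `bonds_from_file` and `load_bonds` are hypothetical. The layout of the returned array is not described in this thread, so inspect its shape and dtype before relying on it.

```python
def bonds_from_file(qm, pdb_id):
    """Return the stored bond data for one structure.

    `qm` is an open h5py.File (or any nested mapping with the same keys).
    The key path follows the maintainer's comment; the shape and meaning of
    the returned data are not documented in this thread.
    """
    # Indexing a dataset with an empty tuple reads it fully into memory.
    return qm[pdb_id]["atom_properties"]["bonds"][()]


def load_bonds(path, pdb_id):
    """Open the QM HDF5 file read-only and return the bond data for `pdb_id`."""
    import h5py  # third-party; imported here so bonds_from_file has no dependencies

    with h5py.File(path, "r") as qm:
        return bonds_from_file(qm, pdb_id)
```

For example, `bonds = load_bonds("QM.hdf5", "10GS")` followed by `print(bonds.shape, bonds.dtype)` shows what the dataset actually contains.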

Georgefwt commented 1 year ago

Is the bonding information in the QM file identical to that in the MD dataset? Can we apply it directly to the MD dataset?

t7morgen commented 1 year ago

Yes, the ligands are the same and the atom ordering is also the same, so the bonding information is correct. We found an issue for peptides (~1400 cases): there, the ordering is not an exact match in most cases. But you should be able to apply the standard residue templates for these cases. We will fix the peptide ordering issue around next week and update the QM file.
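Before transferring bonds from the QM file to an MD entry, a quick sanity check on the atom ordering can catch the peptide cases mentioned above. The sketch below is an assumption-laden illustration: `first_ordering_mismatch` is a hypothetical helper, and how to obtain per-atom identifiers (for example, atomic numbers) from each file is not covered in this thread, so the two sequences must be supplied by the caller. Matching elements at every position is a necessary condition for identical ordering, not a proof of it.

```python
def first_ordering_mismatch(qm_atoms, md_atoms):
    """Return the first index at which the two atom sequences differ.

    Returns None when both sequences have the same length and the same
    entry at every position. The inputs are plain sequences of per-atom
    identifiers (e.g. atomic numbers) extracted by the caller.
    """
    for index, (qm_atom, md_atom) in enumerate(zip(qm_atoms, md_atoms)):
        if qm_atom != md_atom:
            return index
    if len(qm_atoms) != len(md_atoms):
        # One sequence is a strict prefix of the other.
        return min(len(qm_atoms), len(md_atoms))
    return None
```

A return value of `None` suggests the bond indices can be reused as-is; any other value marks the first atom where the two files disagree.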

Georgefwt commented 1 year ago

Thank you very much for your prompt response! If that's the case, could you please provide a list of the peptides? Alternatively, if it's not possible to provide the list, I will wait for you to address the ordering issue in the dataset.

t7morgen commented 1 year ago

You can find the list of peptide ids in data/peptides.txt. I will close the issue now.
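Until the ordering fix lands, the peptide list can be used to separate entries whose QM bonds are safe to reuse from those that need residue templates. This is a sketch under stated assumptions: it assumes `data/peptides.txt` holds one ID per line (the format is not stated in this thread), it normalizes IDs to upper case as a guess, and the function names are hypothetical.

```python
def parse_peptide_ids(lines):
    """Build a set of IDs from an iterable of text lines.

    Assumes one ID per line; blank lines are skipped and IDs are
    upper-cased (an assumption, matching the '10GS' style used above).
    """
    return {line.strip().upper() for line in lines if line.strip()}


def read_peptide_ids(path="data/peptides.txt"):
    """Read the peptide ID list from the path given by the maintainer."""
    with open(path) as handle:
        return parse_peptide_ids(handle)


def split_by_peptide(ids, peptide_ids):
    """Split `ids` into (non_peptides, peptides), preserving input order."""
    non_peptides = [i for i in ids if i.upper() not in peptide_ids]
    peptides = [i for i in ids if i.upper() in peptide_ids]
    return non_peptides, peptides
```

Entries in the first list can take their ligand bonds from the QM file directly; entries in the second list are the ones where, per the maintainer, the standard residue templates should be applied instead.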