Some questions

Thank you for open-sourcing your code, and for the fantastic work.

I have a few questions I’d like to ask.

First, the protein-ligand complexes in MISATO appear to come primarily from the PDBbind database. However, the Ligand Binding Affinity (LBA) dataset contains fewer experimentally measured binding affinities than misato-affinity/data/affinity_data.csv provides. How were the additional binding affinities obtained? Were they calculated or model-predicted?
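To pin down exactly which entries are the additional ones, the two ID sets can be compared directly. The following is only a minimal sketch: the column name `pdbid`, the helper name `extra_ids`, and the toy rows are all hypothetical placeholders, since the actual layout of affinity_data.csv is not shown here.

```python
import csv
import io


def extra_ids(affinity_csv, lba_ids, id_column="pdbid"):
    """Return IDs found in the affinity CSV but absent from the LBA ID set.

    `id_column` is an assumed column name; adjust it to the real header.
    """
    reader = csv.DictReader(affinity_csv)
    # Normalise case so that "1ABC" and "1abc" count as the same entry.
    csv_ids = {row[id_column].strip().lower() for row in reader}
    return sorted(csv_ids - {i.lower() for i in lba_ids})


# Made-up rows standing in for affinity_data.csv (not real data).
sample = io.StringIO("pdbid,affinity\n1ABC,6.2\n2XYZ,7.9\n3DEF,5.1\n")
print(extra_ids(sample, {"1abc", "3def"}))
```

With the real file, `open("misato-affinity/data/affinity_data.csv", newline="")` would take the place of the `io.StringIO` object, and `lba_ids` would hold the IDs listed in the LBA dataset.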

Secondly, the protein-ligand complexes in MISATO-MD are represented in PDB format, which omits bond information for the small molecules. How can the ligands be restored as valid SDF files? I found the corresponding molecules in the QM dataset, but their bonding was incorrect, so I could not convert them to SDF. Could you provide an official script that converts the MISATO-MD files into separate protein PDB and ligand SDF files? Additionally, I noticed that some hydrogen atoms are missing from the protein PDB structures, leaving unsaturated valences that prevent further calculations. Is there a way to refine these structures so that all valences are saturated and subsequent quantum calculations become feasible?
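A minimal sketch of the first half of the requested conversion, separating the ligand records from the protein records in a complex PDB file. It assumes that the ligand carries its own residue name (the default `"MOL"` is a guess, not confirmed by the repository), and `split_complex` is a hypothetical helper, not an official MISATO script.

```python
def split_complex(pdb_text, ligand_resname="MOL"):
    """Split PDB text into (protein_block, ligand_block) by residue name.

    Only ATOM/HETATM records are kept; `ligand_resname` is an assumption.
    """
    protein, ligand = [], []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, TER, CONECT and other records
        resname = line[17:20].strip()  # PDB columns 18-20 hold the residue name
        (ligand if resname == ligand_resname else protein).append(line)
    return "\n".join(protein + ["END"]), "\n".join(ligand + ["END"])


# Toy two-atom complex (made-up records, coordinates omitted for brevity).
toy = "ATOM      1  N   ALA A   1\nHETATM    2  C1  MOL A   2\n"
protein_block, ligand_block = split_complex(toy)
print(ligand_block)
```

The resulting ligand block still lacks bond orders. One possible route, again only a suggestion, is to load it with a cheminformatics toolkit and transfer bond orders from a reference molecule built from a known SMILES string, for example with RDKit's `AllChem.AssignBondOrdersFromTemplate`, before writing the SDF file.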

kierandidi / misato-affinity

Some questions #8