jwohlwend / boltz

Official repository for the Boltz-1 biomolecular interaction model
MIT License

Ligand bond order in the mmcif #49

Open Bae-SungHan opened 5 days ago

Bae-SungHan commented 5 days ago

Hello, I'm trying to measure RMSD of ligand docking pose predicted by Boltz.

However, in some cases, when I read the resulting mmCIF file, extract the ligand, and compare it with the ground-truth ligand using Open Babel and RDKit, an error is raised indicating that no substructure match was found for the RDKit Mol. Even when this error does not occur, I still have to assign the correct bond orders to the predicted ligand from the ground-truth ligand. I think the main reason for both problems is that mmCIF files can record bond orders, as SDF files do, but the Boltz output does not contain that information. This is true even when hydrogens and bonds are written explicitly in the SMILES provided as the ligand input to Boltz.

If the user specifies the bond orders in the input SMILES, could they also be written to the Boltz output mmCIF file?
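To illustrate what such output could look like: the PDBx/mmCIF dictionary has a `_chem_comp_bond` category whose `value_order` item holds the bond order (written as `SING`, `DOUB`, `TRIP` in Chemical Component Dictionary files). The sketch below only formats such a loop from a list of bonds. The function name, the `LIG` component ID, and the atom names are made up for illustration, and this is not the Boltz writer.

```python
# Bond-order codes as they appear in Chemical Component Dictionary files.
ORDER_CODES = {1: "SING", 2: "DOUB", 3: "TRIP"}

# Items of the mmCIF _chem_comp_bond category used in this sketch.
BOND_ITEMS = ("comp_id", "atom_id_1", "atom_id_2",
              "value_order", "pdbx_aromatic_flag")


def chem_comp_bond_loop(comp_id, bonds):
    """Hypothetical helper: format a _chem_comp_bond loop.

    `bonds` is an iterable of (atom_id_1, atom_id_2, order, is_aromatic).
    """
    lines = ["loop_"]
    lines += [f"_chem_comp_bond.{item}" for item in BOND_ITEMS]
    for atom_1, atom_2, order, is_aromatic in bonds:
        flag = "Y" if is_aromatic else "N"
        lines.append(" ".join([comp_id, atom_1, atom_2,
                               ORDER_CODES[order], flag]))
    return "\n".join(lines) + "\n"


# Made-up fragment: a carbonyl carbon bonded to an oxygen and a methyl carbon.
example = chem_comp_bond_loop(
    "LIG",
    [("C1", "O1", 2, False), ("C1", "C2", 1, False)],
)
print(example)
```

A reader such as RDKit or Open Babel would still need to be pointed at this category, so whether it solves the parsing problem depends on the downstream tool.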

Jnelen commented 4 days ago

In my case, the aromaticity is displayed, but "regular" double bonds are not. Fortunately, I can interpret the correct bond orders using Maestro, which works in most cases, though it's still tedious to do manually for a large number of complexes.

Additionally, I believe it would be great to have an option to include explicit hydrogens in the output ligand. This would be particularly useful for downstream applications where explicit hydrogens are required, such as when using tools like PLIP to predict potential hydrogen bond interactions.
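Until such an option exists, one possible stopgap is to add hydrogens after the fact with RDKit, which can place them geometrically when `addCoords=True` is passed to `Chem.AddHs`. The sketch below assumes the ligand already has correct bond orders and uses an embedded acetaminophen conformer as a stand-in for a predicted pose. The resulting hydrogens reflect RDKit's default valence model, not any protonation state chosen by Boltz.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Stand-in for a predicted heavy-atom ligand pose (assumption: in practice
# this would be parsed from the output file and given correct bond orders).
mol_h = Chem.AddHs(Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1"))
AllChem.EmbedMolecule(mol_h, randomSeed=7)
heavy = Chem.RemoveHs(mol_h)

# Add hydrogens and compute their 3D positions from the heavy-atom geometry.
with_h = Chem.AddHs(heavy, addCoords=True)

conf = with_h.GetConformer()
h_bond_lengths = []
for atom in with_h.GetAtoms():
    if atom.GetAtomicNum() == 1:
        parent = atom.GetNeighbors()[0]
        h_pos = conf.GetAtomPosition(atom.GetIdx())
        p_pos = conf.GetAtomPosition(parent.GetIdx())
        h_bond_lengths.append(h_pos.Distance(p_pos))

print(heavy.GetNumAtoms(), with_h.GetNumAtoms())
```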