molinfo-vienna / CDPKit

The Chemical Data Processing Toolkit
https://cdpkit.org
GNU Lesser General Public License v2.1
75 stars 10 forks source link

Energetic issues of conformer generated by ConfGen #11

Closed Hong-Rui closed 11 months ago

Hong-Rui commented 11 months ago

Hi Thomas, In this issue I want to report some weird conformations generated from CDPKit confgen. test.zip

In the uploaded zip file, there are a few molecules with problematic conformations. In test.sdf, I show some cases where confgen cannot rotate one of the torsion angles and then the later energy minimization failed to tune the H orientation so that the molecule is in a super high energy state.

1701714097175 1701714212904

For another molecule with SMILES: [H]Oc1c([H])c([H])c([H])c(/C(=N\\N([H])C(=O)c2nn([H])c(-c3c([H])c([H])c(F)c([H])c3[H])c2[H])C([H])([H])[H])c1[H], the confgen failed with a FORCEFIELD_SETUP_FAILED:

- Molecule 1/1:
Found 1 molecular graph component
Force field setup failed: MMFF94InteractionParameterizer: could not determine MMFF94 type of atom #12
Conformer generation finished with return code FORCEFIELD_SETUP_FAILED
Processing time: 0.002s

And I met several of this kind of cases before, and wondering what cause this to happen.

In ligand_confs_generated.sdf, the molecule contain a saturated 5 member ring, with one chiral cabon. The carboxyl group connect to this chiral C normally forms an equatorial bond, as shown in the PDB cocrystal structure:

1701714465998

However, none of the confgen generated conformers have this equatorial bond, they are all axial bond instead.

1701715179102

A later MMFF94s energy minimization performed in RDKit can solve this. However, considering CDPKit confgen has a built-in MMFF94 minimization step, I'm curious why this issue cannot be solved inside the confgen itself. Is this related to the lack of rules in the torsion library or the fragment library? Or is it because of some wrong records in these two libs?

Hope this helps and hope these issues can be solved soon!! Thank you.

seidelt commented 11 months ago

Hi Hong-Rui,

Your SMILES string is not valid because is contains two consecutive backslashes. The SMILES reader is very tolerant and reads the second backslash as an atom with unknown type. If you remove the second backslash the molecule can be processed without problems. Please check if other SMILES strings in your data (for which the force field setup fails) suffer from the same problem.

It seems you expect something for which the CDPKit conformer generation method has not been designed for. The goal is to produce structurally diverse conformer ensembles which, on average, contain one or more conformers that are structurally highly similar to a potential bioactive conformation. In the systematic sampling mode (which is used for the given example molecule) NO final force field (FF) optimization of the conformers is performed! The FF is only used for an energy-based ranking of the conformers. This is done for a good reason: i) The FF optimization will be done for a molecule in vacuum, any binding site or crystal environment is not considered here. ii) As a consequence the optimized structure would (on average) deviate significantly from the binding conformation - which works against the intended goal. You can easily verify this yourself: perform a force field optimization on 10 randomly selected flexible PDB ligands. In most cases the opt. 3D structure will deviate considerably from the initial structure. The torsion angles of the generated conformers solely come from matching torsion library entries. The angles in the library represent prominent peaks of torsion angle angle histograms obtained from an analysis of a large set of X-ray structures (which covered all PDB ligands). Although your example PDB ligand shows an equatorial orientation of the carboxylate group and MMFF94 also prefers it, this orientation seems not be preferred in most of the experimental structures. Otherwise this orientation would be represented by corresponding torsion angles in the matching library entry.

Since the equatorial orientation can be observed in an experimental structure (your example PDB ligand) I have added torsion angles leading to the desired orientation to the corresponding torsion library entry. The generated conformer ensemble now also features conformers with an equatorial carboxylate group orientation. Please pull the latest main branch changes into your local copy and rebuild...

Hong-Rui commented 11 months ago

Hi Thomas,

Thank you for your reply! For the axial and equatorial bond problem, I've installed the new master branch, and it works, with a mixed population of axial (70%) and equatorial (30%) bond for this carboxyl group.

For the FORCEFIELD_SETUP_FAILED error, I tried to remove one back slash in the SMILES, and it successfully generated conformers. However, this conformer looks very impossible, as the same problem as stated above:

1701807177840

With the lack of energy minimization in systematic sampling mode, for some cases the conformers contain clashes. I totally agree with that bioactive conformations are not at a lowest energy state for single molecule conformer energy, and thus a fragment and torsion drive based confgen method should be employed, and that's why I choose CDPKit Conforge as our main confgen engine over rdkit's EDKTG embedding, due to the much more abundant fragment library and a useful torsion library.

However, for these cases presented above, it seems that for the confgen to work well, the torsion library needs to be supplemented with some additional rules to deal with this kind of weird case. But then we may need to collect more crystal structure examples for these torsions....

I'm not sure if what I'm saying is correct or not. Thanks for the help!

seidelt commented 11 months ago

Hi Hong-Rui,

hint: try to add the -A flag to your command line. This should help to avoid VdW clashes for such molecules....

Hong-Rui commented 11 months ago

Hi Thomas, I tried the -A flag, and it does prevent the clashes to some degree, but seems not enough for this case. please see the attach 2 sdf files: test.zip

The ligand_confs_generated.sdf are generated with the -A flag, and the ligand_batch_prepared.sdf are subsequently energy minimized conformer from the confgen generated conformation (with -A flag).

It seems that the conformer from confgen with -A flag still has the problem that the aromatic C is too close to the N-H, while the minimized conformer tunes this torsion a little bit and avoid this issue.

Is there any way that we can fix this, so that we don't need to do MMFF minimization after confgen, at least for this case?

seidelt commented 11 months ago

Hi Hong-Rui,

indeed, the tweaks done by the -A option are not sufficient here. I added a few more angles to the corresponding torsion library entry (master) that now lead to more reasonable conformers of the test molecule. I recommend to keep the -A flag which in general produces better structures (at the cost of longer processing times, though)...

Hong-Rui commented 11 months ago

Hi Thomas

test_new.zip

The updated code solves the above one case. However, we need you to add more torsion rules as some other cases are still probmatic. In the attached test_new.sdf, like what's mentioned above, the second and third conformation are still weird and one corresponding torsion rule is needed.

1701937401917 1701937415381

And we may face more other cases later on....

So thanks for your help!

seidelt commented 11 months ago

Hi Hong-Rui,

I have fixed the torsion library entries in the master branch, give it a try! One suggestion: send more problematic cases at once instead of 1-2 at a time. You also don't need to create 3D renderings, I have to look at the molecules in a 3D viewer anyway.

Hong-Rui commented 11 months ago

Okay, thanks a lot! Currently we only see these. Will send you more after we test that in a high throughput VS library.

seidelt commented 11 months ago

I will close this issue now. If you identify more problematic molecules, just open a new issue for them....