Closed brigreens closed 9 months ago
Hey, I tried to visualize your SMILES using molview. The SMILES seems to encode an invalid molecule: many atoms are too close to each other, so it would probably be impossible for RDKit to build even an initial 3D structure from it.
Thank you for your response. It seems like my other molecules with errors were either similarly crowded or had an underscore in the name.
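A quick way to spot the naming problem mentioned above is to scan the input file for IDs with unusual characters before running Auto3D. The sketch below assumes each line is "<SMILES> <ID>", as in the example input in this thread. The set of "safe" characters (letters, digits, hyphen) is a conservative guess for illustration, not a rule taken from Auto3D, and `find_risky_ids` is a hypothetical helper name.

```python
import re

# Assumption: letters, digits and hyphens are treated as "safe" ID characters.
# This is a conservative choice for illustration, not Auto3D's actual rule.
SAFE_ID = re.compile(r"^[A-Za-z0-9-]+$")


def find_risky_ids(smi_text: str) -> list[tuple[int, str]]:
    """Return (line_number, id) pairs whose ID contains characters outside SAFE_ID."""
    risky = []
    for lineno, line in enumerate(smi_text.splitlines(), start=1):
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line, or a SMILES with no ID column
        mol_id = parts[1]
        if not SAFE_ID.match(mol_id):
            risky.append((lineno, mol_id))
    return risky


if __name__ == "__main__":
    example = "CCO ethanol\nCCC my_propane\n"
    for lineno, mol_id in find_risky_ids(example):
        print(f"line {lineno}: ID {mol_id!r} may need renaming")
```

Renaming the flagged IDs (for example, replacing the underscore with a hyphen) is then a manual or scripted follow-up.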
Hi, I have some other molecules (oligomers so they are pretty large) that are resulting in the same error. Would you know why it is happening for these 2 examples?
CCOc1c(F)c(F)c(-c2nc3c(-c4cccs4)ccc(-c4ccc(-c5cc6c(-c7ccc(CC)s7)c7sc(-c8ccc(-c9ccc(-c%10ccc(-c%11cc%12c(-c%13ccc(CC)s%13)c%13sc(-c%14ccc(-c%15ccc(-c%16ccc(-c%17cc%18c(-c%19ccc(CC)s%19)c%19sccc%19c(-c%19ccc(CC)s%19)c%18s%17)s%16)c%16nc(-c%17c(F)c(F)c(OCC)c(F)c%17F)c(-c%17c(F)c(F)c(OCC)c(F)c%17F)nc%15%16)s%14)cc%13c(-c%13ccc(CC)s%13)c%12s%11)s%10)c%10nc(-c%11c(F)c(F)c(OCC)c(F)c%11F)c(-c%11c(F)c(F)c(OCC)c(F)c%11F)nc9%10)s8)cc7c(-c7ccc(CC)s7)c6s5)s4)c3nc2-c2c(F)c(F)c(OCC)c(F)c2F)c(F)c1F
CCn1nc2c(-c3cccs3)c(Cl)c(Cl)c(-c3ccc(-c4cc5c(-c6ccc(Si(CC)CC)s6)c6sc(-c7ccc(-c8c(Cl)c(Cl)c(-c9ccc(-c%10cc%11c(-c%12ccc(Si(CC)CC)s%12)c%12sc(-c%13ccc(-c%14c(Cl)c(Cl)c(-c%15ccc(-c%16cc%17c(-c%18ccc(Si(CC)CC)s%18)c%18sccc%18c(-c%18ccc(Si(CC)CC)s%18)c%17s%16)s%15)c%15nn(CC)nc%14%15)s%13)cc%12c(-c%12ccc(Si(CC)CC)s%12)c%11s%10)s9)c9nn(CC)nc89)s7)cc6c(-c6ccc(Si(CC)CC)s6)c5s4)s3)c2n1
Hey, I guess the optimization steps and patience are too small for these molecules. Can you set opt_step=20000, patience=20000, max_confs=10 to see if you can get any results? It will probably take much longer for Auto3D to run molecules of this size.
@brigreens: those are very big molecules indeed! My guess is that you are hitting the limits of the 3D conformer generator. For example, OpenEye has a practical limit of ~25 rotatable bonds; beyond that, the molecule is too big and it is almost impossible to find a reliable 3D conformer. Please try loading your SMILES into, e.g., RDKit directly and generating at least one 3D structure with AllChem.EmbedMolecule.
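The check suggested above can be sketched as follows. It counts rotatable bonds and attempts a single embedding with RDKit's default settings (no random starting coordinates). `embedding_report` is a hypothetical helper name, the seed is arbitrary, and the ~25 rotatable bond figure quoted above is a rule of thumb for OpenEye, not a threshold RDKit enforces.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolDescriptors


def embedding_report(smiles: str, seed: int = 42) -> dict:
    """Summarize molecule size and whether RDKit can embed one 3D conformer."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("RDKit could not parse the SMILES")
    n_rot = rdMolDescriptors.CalcNumRotatableBonds(mol)
    mol_h = Chem.AddHs(mol)
    # EmbedMolecule returns the new conformer ID, or -1 if embedding failed.
    conf_id = AllChem.EmbedMolecule(mol_h, randomSeed=seed)
    return {
        "heavy_atoms": mol.GetNumHeavyAtoms(),
        "rotatable_bonds": n_rot,
        "embedded": conf_id != -1,
    }


if __name__ == "__main__":
    # Small stand-in molecule; substitute the oligomer SMILES to test it.
    print(embedding_report("CCOc1ccccc1"))
```

If "embedded" comes back False with default settings, retrying with useRandomCoords=True is a common next step, at the cost of less reliable starting geometries for large, rigid systems.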
Hi, thank you both for the suggestions.
I am still running into the same issue (attached is the output file). I have tried increasing the opt_step, patience, and max_confs but I get the same error. I have also tried generating a 3D structure with RDKit and it was able to generate it, so I don't think that's where the issue is.
from rdkit import Chem
from rdkit.Chem import AllChem
smi = 'CCn1nc2c(-c3cccs3)c(Cl)c(Cl)c(-c3ccc(-c4cc5c(-c6ccc([Si](CC)(CC)CC)s6)c6sc(-c7ccc(-c8c(Cl)c(Cl)c(-c9ccc(-c%10cc%11c(-c%12ccc([Si](CC)(CC)CC)s%12)c%12sc(-c%13ccc(-c%14c(Cl)c(Cl)c(-c%15ccc(-c%16cc%17c(-c%18ccc([Si](CC)(CC)CC)s%18)c%18sccc%18c(-c%18ccc([Si](CC)(CC)CC)s%18)c%17s%16)s%15)c%15nn(CC)nc%14%15)s%13)cc%12c(-c%12ccc([Si](CC)(CC)CC)s%12)c%11s%10)s9)c9nn(CC)nc89)s7)cc6c(-c6ccc([Si](CC)(CC)CC)s6)c5s4)s3)c2n1'
mol = Chem.MolFromSmiles(smi)
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, useRandomCoords=True, randomSeed=0xf00d)
AllChem.UFFOptimizeMolecule(mol)
I have shortened the oligomer to a tetramer (previously it was a hexamer) and it ran with no issue so I am assuming it is the size of the molecule like you both suggested. Is there anything else I can try to get Auto3D to run for these large hexamers?
Also side note, I think there is a typo in the error message. Reason 2 doesn't make much sense, unless it was meant to say invalid:
The optimization engine did not run, or no 3D structure converged.
The reason might be one of the following:
1. Allocated memory is not enough;
2. The input SMILES encodes **valid** chemical structures;
3. Patience is too small
Thank you for the feedback! I appreciate it.
For the following discussion, I used this molecule:
CCOc1c(F)c(F)c(-c2nc3c(-c4cccs4)ccc(-c4ccc(-c5cc6c(-c7ccc(CC)s7)c7sc(-c8ccc(-c9ccc(-c%10ccc(-c%11cc%12c(-c%13ccc(CC)s%13)c%13sc(-c%14ccc(-c%15ccc(-c%16ccc(-c%17cc%18c(-c%19ccc(CC)s%19)c%19sccc%19c(-c%19ccc(CC)s%19)c%18s%17)s%16)c%16nc(-c%17c(F)c(F)c(OCC)c(F)c%17F)c(-c%17c(F)c(F)c(OCC)c(F)c%17F)nc%15%16)s%14)cc%13c(-c%13ccc(CC)s%13)c%12s%11)s%10)c%10nc(-c%11c(F)c(F)c(OCC)c(F)c%11F)c(-c%11c(F)c(F)c(OCC)c(F)c%11F)nc9%10)s8)cc7c(-c7ccc(CC)s7)c6s5)s4)c3nc2-c2c(F)c(F)c(OCC)c(F)c2F)c(F)c1F
I was able to reproduce the bug. The issue is related to RDKit. In Auto3D's conformer embedding function, we set useRandomCoords=False. As a result, no initial conformer can be generated for the above molecule. If I set useRandomCoords=True, I do get conformers, but they don't look realistic: the aromatic system, which should be flat, is bent. An example is attached (oligomer.sdf).
oligomer.txt
If you have access to OpenEye Omega software, you can try that. It's another backend for generating initial conformers in Auto3D. An example output was attached.
sample_out.txt
If you want to stick with RDKit, you could try adding useRandomCoords=True to the following line in your local installation of Auto3D:
https://github.com/isayevlab/Auto3D_pkg/blob/f463e4fd072d3e219b709cfb2b127146db05339c/src/Auto3D/isomer_engine.py#L166
Thank you for all your help. Unfortunately changing that line in my local installation did not work. I also do not have access to OpenEye. I will just shorten the oligomers for the ones causing the errors to avoid the issue.
One last question. Do you have a script where I can start with an initial geometry (for example an xyz file) instead of a SMILES to run conformer searching with?
For now, conformer searching in Auto3D starts from SMILES. One workaround is to convert the XYZ or SDF file into the corresponding SMILES and start from there. The SMILES keeps the tautomer and/or stereoisomer information during the conversion, as long as it is specified in the original geometry. Auto3D also obeys the restrictions specified in the SMILES during the conformer search.
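A minimal sketch of the SDF part of this workaround, using RDKit, is shown below. It assumes the SDF contains 3D coordinates, and it writes one "<SMILES> <ID>" line per molecule, matching the .smi layout used earlier in this thread. `sdf_to_smi` and the fallback ID `mol<index>` are hypothetical names chosen for illustration. XYZ files carry no bond information, so they need a bond-perception step first and are not covered here.

```python
from rdkit import Chem


def sdf_to_smi(sdf_path: str, smi_path: str) -> int:
    """Write one '<SMILES> <ID>' line per molecule; return the number written."""
    written = 0
    supplier = Chem.SDMolSupplier(sdf_path)  # explicit Hs are removed by default
    with open(smi_path, "w") as out:
        for idx, mol in enumerate(supplier):
            if mol is None:
                continue  # RDKit could not parse this record
            # Perceive stereocenters and double-bond stereo from the 3D coordinates.
            Chem.AssignStereochemistryFrom3D(mol)
            name = mol.GetProp("_Name").strip() if mol.HasProp("_Name") else ""
            if not name:
                name = f"mol{idx}"  # hypothetical fallback ID for untitled records
            out.write(f"{Chem.MolToSmiles(mol)} {name}\n")
            written += 1
    return written
```

The resulting file can then be passed to Auto3D in place of a hand-written .smi file. Note that the 3D geometry itself is discarded; only connectivity and stereo labels survive the round trip.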
Not sure if it might be relevant, but Auto3D also provides a wrapper function for geometry optimization:
import Auto3D
from Auto3D.ASE.geometry import opt_geometry
sdf = "some_path.sdf"
out = opt_geometry(sdf, model_name="AIMNET")
The `opt_geometry` function accepts an SDF file and runs geometry optimization with AIMNET, ANI2x, or ANI-2xt. The optimization is driven by ASE and is not parallelized.
Auto3D would be much more useful to us if we could start with an initial 3D conformer (e.g., from Avogadro) in SDF format. The stereo information is embedded in the 3D geometry, and while tautomer searching is sometimes useful, starting from 3D would prevent problems with bad SMILES -> 3D embedding, which crop up periodically with both RDKit and Open Babel.
We can optimize ourselves, but @brigreens would really like to do the conformer searching with Auto3D.
For what it's worth, the molview example above looks like a 2D depiction. It's not too hard to build a valid 3D geometry (see attached).
@ghutchis Thanks for the comments and information!
Hello, just an update: Auto3D can now start from an initial 3D conformer in an SDF file. The usage is the same as before; just replace the smi file with an SDF file. The molecule IDs can also contain any characters. I will close this issue for now; please let us know if there are any other issues.
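Before handing an SDF file to Auto3D, it can help to confirm that each record really holds 3D coordinates rather than a flat 2D depiction. The sketch below is a standard-library-only check that assumes V2000 mol blocks with fixed-width columns; V3000 records are skipped. `sdf_records` is a hypothetical helper, not part of Auto3D or RDKit.

```python
from typing import Iterator


def sdf_records(text: str) -> Iterator[tuple[str, int, bool]]:
    """Yield (title, n_atoms, is_3d) for each V2000 record in an SDF string."""
    for idx, block in enumerate(text.split("$$$$")):
        lines = block.splitlines()
        # Every block after the first begins with the newline that followed "$$$$".
        if idx > 0 and lines and lines[0] == "":
            lines = lines[1:]
        if len(lines) < 4 or "V2000" not in lines[3]:
            continue  # trailing fragment, or a format this sketch does not handle
        n_atoms = int(lines[3][0:3])  # atom count sits in columns 1-3 of the counts line
        atom_lines = lines[4:4 + n_atoms]
        # The z coordinate occupies columns 21-30 of each atom line.
        z_values = [float(line[20:30]) for line in atom_lines]
        is_3d = any(abs(z) > 1e-4 for z in z_values)
        yield lines[0].strip(), n_atoms, is_3d


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for title, n_atoms, is_3d in sdf_records(handle.read()):
            print(f"{title or '<untitled>'}: {n_atoms} atoms, 3D={is_3d}")
```

A record whose z coordinates are all zero is most likely a 2D drawing (a genuinely planar molecule lying exactly in the xy plane would also be flagged) and should be embedded or rebuilt before use as a starting geometry.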
Hi, I am looking for advice. I am trying to optimize some molecules and some of them are resulting in errors, where the optimization engine does not run. This is the error I received:
The optimization engine did not run, or no 3D structure converged. The reason might be one of the following:
I tried increasing the patience to 5000 but it did not help. The log files show I have 40 GB of memory available so I don't think memory is the issue. I don't know what suggestion 2 means: "The input SMILES encodes valid chemical structures".
This is the input command I use:
python auto3D.py AOIC.smi --k=1 --patience=5000
Below is an example of one of the SMILES I am working with that is causing this error. Any advice would be appreciated.
CCc1ccc(C2(c3ccc(CC)cc3)c3cc4c(cc3-c3sc(/C=C5\C(=O)c6cc(F)c(F)cc6C5=C(C#N)C#N)cc32)C(c2ccc(CC)cc2)(c2ccc(CC)cc2)c2c-4sc3c2C(c2ccc(CC)cc2)(c2ccc(CC)cc2)c2c-3sc3cc(/C=C4\C(=O)c5cc(F)c(F)cc5C4=C(C#N)C#N)sc23)cc1 AOIC