Closed okbckim closed 9 months ago
Thanks for the report. I'm working on reproducing this. The report didn't include the input file; could you add that if possible? I'm assuming it's https://zinc.docking.org/substances/ZINC000156786881/ but this page doesn't offer an SDF file download, so I may need to know how you created it to help you here.
Something seems off here. Both SMILES strings reuse the 1 ring-closing index several times, which I've never seen before. And assuming the input is the ZINC molecule from my last post, there shouldn't be any 4-membered rings. This may be related to the ring-closing index reuse. I don't think I can move this forward without more details about the input.
Thank you so much for looking into it. Initially I couldn't upload the SDF file to this forum, so I compressed it with gzip to comply with the accepted upload formats. This compound is from Enamine; here is the link: https://enaminestore.com/catalog/Z1559982889
Z1559982889.sdf.gz
Hard to isolate it to the toolkit's default behavior, but then again I can't get anything to pass isomorphism
In [1]: from openff.toolkit import Molecule
In [2]: smiles1 = "O=C(Nc1c[nH]nc1-c1ccccn1)c1ccn(C2CCCC2)n1"
In [3]: smiles2 = "O=C(Nc1cnc1-c1ccccn1)c1ccn(C2CCCC2)n1"
In [4]: Molecule.from_file("Z1559982889.sdf").is_isomorphic_with(Molecule.from_smiles(smiles1))
Out[4]: False
In [5]: Molecule.from_file("Z1559982889.sdf").is_isomorphic_with(Molecule.from_smiles(smiles2))
Out[5]: False
In [6]: Molecule.from_file("Z1559982889.sdf").to_smiles(explicit_hydrogens=False)
Out[6]: 'c1ccnc(c1)C2=NNC=C2NC(=O)C3=C4C=CC=CN4N=C3' # or O=C(Nc1c[nH]nc1-c1ccccn1)c1cnn2ccccc12 with RDKit
The molecule in @okbckim's second post (Z1559982889) is different than the one in the first (Z1559977118). I've run the second molecule and verified that it does have the same problem, where the molecule in the table output is clearly different than the input.
(bespokefit) jw@mba$ BEFLOW_OPTIMIZER_KEEP_FILES=True openff-bespoke executor run --n-fragmenter-workers 2 --n-optimizer-workers 2 --n-qc-compute-workers 2 --qc-compute-n-cores 1 --file Z1559982889.sdf --output 1.json --output-force-field 1.offxml --workflow "default" --default-qc-spec xtb gfn2xtb none
──────────────────────────────── OpenFF Bespoke ────────────────────────────────
[✓] bespoke executor launched
1. preparing the bespoke workflow
[✓] 1 molecules found
[✓] fitting schemas generated
2. submitting the workflow
[✓] the following workflows were submitted
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ ID ┃ SMILES                             ┃ NAME        ┃ FILE            ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ 1  │ O=C(Nc1cnc1-c1ccccn1)c1cnn2ccccc12 │ Z1559982889 │ Z1559982889.sdf │
└────┴────────────────────────────────────┴─────────────┴─────────────────┘
3. running the fitting pipeline
[✓] fragmentation successful
[✓] qc-generation successful
[✓] optimization successful
outputs have been saved to 1.json
the bespoke force field has been saved to 1.offxml
worker: Warm shutdown (MainProcess)
worker: Warm shutdown (MainProcess)
worker: Warm shutdown (MainProcess)
The SMILES in the output has a 4-membered ring, which isn't present in the input mol. It also shows the reuse of the 1 ring-closing index, which I suspect is related.
My next two questions are:
This sure is a strange one. The code to print the SMILES is here; I'll dig back some more to see if we do anything to the molecule before printing the SMILES.
I've confirmed that the output FF does have all its bespokefit-created torsion parameters applied to the input molecule. So it's unlikely that the actual parameters are being fit to mangled chemistry. The new torsion parameters in the file are t169-t184, and I see them all appearing when I inspect parameter application using the output FF.
from openff.toolkit import Molecule, ForceField
ff = ForceField('1.offxml')
mol = Molecule.from_file("Z1559982889.sdf")
interchange = ff.create_interchange(mol.to_topology())
seen_params = set()
for key in interchange.collections["ProperTorsions"].get_mapping().keys():
    seen_params.add(ff["ProperTorsions"][key.id].id)
print(sorted(list(seen_params)))
['t138', 't169', 't170', 't171', 't172', 't173', 't174', 't175', 't176', 't177', 't178', 't179', 't180', 't181', 't182', 't183', 't184', 't43', 't44', 't45', 't47', 't73', 't80', 't84', 't85', 't86']
My files are here: bespokefit-319.tar.gz
So I'm going to move forward assuming that this closes the book on question 1. I'll look into how to fix 2 next.
Thank you very much for thoroughly investigating this issue. Very good to hear it is just a cosmetic issue.
Ha, rich's formatting modifies the string before printing it out to the table.
import rich
import rich.table
table = rich.table.Table()
table.add_column("SMILES")
table.add_row('O=C(Nc1c[nH]nc1-c1ccccn1)c1cnn2ccccc12') # Original
table.add_row('O=C(Nc1cnHnc1-c1ccccn1)c1cnn2ccccc12') # Without brackets around nH
table.add_row(r'O=C(Nc1c\[nH]nc1-c1ccccn1)c1cnn2ccccc12') # With escaped brackets around nH (raw string keeps the backslash)
console = rich.get_console()
console.print(table)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ SMILES                                 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ O=C(Nc1cnc1-c1ccccn1)c1cnn2ccccc12     │
│ O=C(Nc1cnHnc1-c1ccccn1)c1cnn2ccccc12   │
│ O=C(Nc1c[nH]nc1-c1ccccn1)c1cnn2ccccc12 │
└────────────────────────────────────────┘
So next I need to figure out how to get rich to print the string exactly as it's given.
rich.markup.escape(string) seems to do the trick. I'll open a PR.
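For anyone following along, here is a minimal sketch of the behaviour and the fix, assuming only that rich is installed. The variable names are illustrative and this is not the BespokeFit patch itself. It parses the SMILES as rich markup with and without escaping, so no terminal is needed.

```python
from rich.markup import escape
from rich.text import Text

smiles = "O=C(Nc1c[nH]nc1-c1ccccn1)c1cnn2ccccc12"

# Without escaping, rich reads "[nH]" as a markup tag and drops it from the text.
mangled = Text.from_markup(smiles).plain

# escape() puts a backslash before the opening bracket so the tag is kept literally.
escaped = escape(smiles)
restored = Text.from_markup(escaped).plain

print(mangled)
print(escaped)
print(restored)
```

The first printed line should match the mangled SMILES from the table in the CLI output, and the last should match the original string.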
Any tips for the easiest way to test for this change?
Something in here could probably be frankensteined with mocking to capture the console after submitting a molecule: https://github.com/openforcefield/openff-bespokefit/blob/3f5e73edeaebd2361a76e8a0b4dcd0641e5a7342/openff/bespokefit/tests/cli/executor/test_submit.py#L324-L360
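One possible shape for such a test, as a rough sketch rather than something taken from the BespokeFit test suite: render the table into an in-memory buffer and assert that the SMILES comes out intact. The helper render_smiles_table and the test name are hypothetical; only the rich calls (Console with a file argument, Table, escape) are real API.

```python
import io

from rich.console import Console
from rich.markup import escape
from rich.table import Table


def render_smiles_table(smiles: str) -> str:
    """Hypothetical helper: render a one-row SMILES table to a string."""
    buffer = io.StringIO()
    # Writing to a StringIO instead of stdout lets the test inspect the output.
    console = Console(file=buffer, width=120, color_system=None)
    table = Table()
    table.add_column("SMILES")
    table.add_row(escape(smiles))  # escape so "[nH]" is not parsed as markup
    console.print(table)
    return buffer.getvalue()


def test_smiles_survives_table_rendering():
    smiles = "O=C(Nc1c[nH]nc1-c1ccccn1)c1cnn2ccccc12"
    assert smiles in render_smiles_table(smiles)
```

In the real test the string would come from the CLI's captured console output instead of a local helper, but the assertion would look the same.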
Description
I ran openff-bespoke from the command line with the compound shown in the right panel, but the openff-bespoke output shows the wrong SMILES, as shown in the left panel. The correct SMILES is O=C(Nc1c[nH]nc1-c1ccccn1)c1ccn(C2CCCC2)n1 and the output SMILES is O=C(Nc1cnc1-c1ccccn1)c1ccn(C2CCCC2)n1. Note that [nH] is missing in the wrong SMILES. I expected the output to use the correct SMILES.
Reproduction
Output
Software versions
openff-bespokefit 0.2.3
Ubuntu
as in document
Are you using Apple Silicon? If so, are you running BespokeFit in Rosetta (osx-64) or natively (osx-arm64)? No
What is the output of running conda list?
Output of conda list