Open wutobias opened 3 years ago
Found this PR that seems to target the same issue: https://github.com/openforcefield/cmiles/pull/37 Not sure about the details of this particular solution but it seems that the error message from rdkit is parsed in order to identify the offending N atom. This seems to be fragile, since rdkit might change the formatting of error messages in the future. The above solution explicitly counts the number of bonded neighbors, which should be more stable.
Sorry for the delay on this -- CMILES is effectively deprecated, though we've done a really bad job of communicating it. Its functionality has been migrated to the actively-supported OpenFF toolkit and QCSubmit packages.
QCA molecules submitted by OpenFF now all have CMILES records attached to them, which explicitly contain the connection graph (including things like bond order and formal charge). So we shouldn't need to do any guessing about nitrogens.
We need to put out a migration guide for users for how to do all their favorite CMILES workflows using these two other tools. In the meantime, here's a little code snippet I put together to pull down initial and final molecules using the OpenFF toolkit. I think that there's a more streamlined pathway either already available in QCSubmit, or that will be available in the next release.
from openff.toolkit.topology import Molecule
import pint
from simtk.openmm import unit
import numpy as np
import qcportal as ptl
from openff.toolkit.topology import Molecule
punit = pint.UnitRegistry()
# Some stuff I scapped together form other code bits. I don't understand how this works.
client= ptl.FractalClient()
ds = client.get_collection('OptimizationDataset', "FDA Optimization Dataset 1")
entry = ds.get_entry('C[NH2+]C[C@@H](c1ccc(c(c1)O)O)O-0')
qcrecord = ds.get_record("C[NH2+]C[C@@H](c1ccc(c(c1)O)O)O-0",
specification="default")
initial_qcmol = client.query_molecules(entry.initial_molecule)[0]
final_qcmol = qcrecord.get_final_molecule()
# set conformer qcmol geometry
initial_geometry = unit.Quantity(
np.array(initial_qcmol.geometry, float), unit.bohr
)
final_geometry = unit.Quantity(
np.array(final_qcmol.geometry, float), unit.bohr
)
# Make a single mol with both conformers so we can visualize it in nglview
combined_offmol = Molecule.from_qcschema(entry)
combined_offmol.add_conformer(initial_geometry.in_units_of(unit.angstrom))
combined_offmol.add_conformer(final_geometry.in_units_of(unit.angstrom))
# Make OE and RDKit mols from this combined offmol
oemol = combined_offmol.to_openeye()
rdmol = combined_offmol.to_rdkit()
combined_offmol.visualize(backend='nglview')
Thanks for looking into this. I am now changing my workflow towards Molecule.from_qcschema
.
Sorry for the delay on this -- CMILES is effectively deprecated, though we've done a really bad job of communicating it
@j-wags : Would it make sense to add a note at the top of the https://github.com/openforcefield/cmiles repo README to communicate this?
Good call. I've just added a notice and some references to new API points to the top of the README.
Hi folks,
please have a look at the following code snippet:
The
oemol
is returned successfully, howeverrdmol
cannot be created due to valence issues with N atoms:RDKit ERROR: [13:24:10] Explicit valence for atom # 9 N, 4, is greater than permitted
. I think this is fixable with the following code block inserted after line 44 in cmiles/_cmiles_rd.py (not elegant though, won't cover general valence issues):Regards, Tobias