openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License

Add `ChemicalEnvironmentParsingError` #1695

Closed mattwthompson closed 11 months ago

mattwthompson commented 1 year ago

Currently, inside either RDKitToolkitWrapper._find_smarts_matches or OpenEyeToolkitWrapper._find_smarts_matches, a ValueError is raised when the SMARTS pattern of the query cannot be parsed. This PR adds a more specific exception (#771) in a backwards-compatible way, so that downstream tools can distinguish this particular error from the (many) other ways a ValueError could be raised.
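One way to get this kind of backwards compatibility is to have the new exception subclass ValueError, so existing `except ValueError` blocks keep working while new code can catch the narrower type. The sketch below illustrates that pattern only. The inheritance shown is an assumption about how the PR does it, and `check_environment`, `classify` and `legacy_caller` are hypothetical helpers, not toolkit functions.

```python
class ChemicalEnvironmentParsingError(ValueError):
    """Raised when a chemical environment (e.g. a SMARTS pattern) cannot be parsed.

    Subclassing ValueError is an assumption made for this sketch.
    """


def check_environment(smarts: str) -> str:
    """Hypothetical stand-in for a toolkit wrapper's parsing step.

    It only checks that the string is non-empty and that square brackets
    are balanced. A real wrapper would delegate to RDKit or OEChem.
    """
    depth = 0
    for char in smarts:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if depth < 0:
            break
    if not smarts or depth != 0:
        raise ChemicalEnvironmentParsingError(f"Failed to parse {smarts!r}")
    return smarts


def classify(smarts: str) -> str:
    """New-style caller: tells parsing failures apart from other ValueErrors."""
    try:
        check_environment(smarts)
    except ChemicalEnvironmentParsingError:
        return "parsing error"
    except ValueError:
        return "other ValueError"
    return "ok"


def legacy_caller(smarts: str) -> bool:
    """Old-style caller: still works because the new type is a ValueError."""
    try:
        check_environment(smarts)
    except ValueError:
        return False
    return True
```

With this layout, downstream code written before the change needs no edits, and code written after it can opt in to the more specific `except` clause.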

codecov[bot] commented 1 year ago

Codecov Report

Merging #1695 (094362f) into main (ee03804) will increase coverage by 0.02%. The diff coverage is 100.00%.

mattwthompson commented 1 year ago

Not sure why the docs build failed; I re-kicked it.

I think this is good to go. If anything, a possible extension would be thinking through other places where SMARTS parsing can fail.

mattwthompson commented 1 year ago

These are SMARTS queries to RDKit and OEChem, not SMIRKS queries

j-wags commented 1 year ago

How about ChemicalEnvironmentParsingError? Having thought about this a bit, our current usage of "SMARTS" and "SMIRKS" in our API is inconsistent/incorrect/confusing, and this will keep coming up. The expressions we use are neither SMILES, nor SMARTS, nor SMIRKS; they're a subset of features of each. So it would probably be good to call those expressions "chemical environments" (or "SMIRNOFF chemical environments"), to communicate that we're referring to our special use case.

So I'm thinking we'll eventually want all of our SMARTS/SMIRKS parsing errors to become ChemicalEnvironmentParsingErrors, but in light of the current consideration - "we either add SMARTSParsingError which is technically correct but thematically wrong, or we stick with SMIRKSParsingError which is thematically correct but technically wrong" - I'm thinking we can short-circuit this by just introducing the exception type we will ultimately want (ChemicalEnvironmentParsingError) and having one less thing to deprecate in the long run.

It was probably a mistake to ever expose a method called find_smarts_matches, because what we really intend to do is chemical_environment_matches. There's no need for us to provide yet another SMARTS matching method - there are already many cheminformatics toolkits that do SMARTS matching faster and more flexibly than we ever will.

In future API-breaking releases we can remove find_smarts_matches in favor of a new chemical_environment_matches, and migrate SMIRKSParsingError to ChemicalEnvironmentParsingError.

If that makes sense to you, I'll open an issue to track this migration plan.
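The method rename described above could be staged with a deprecation shim, so the old name keeps working and warns until an API-breaking release removes it. This is a minimal sketch under stated assumptions: `ToyRegistry` is a hypothetical class, not the toolkit's registry, and the substring "matching" on plain strings is a placeholder for real chemical environment matching.

```python
import warnings


class ToyRegistry:
    """Hypothetical stand-in used only to illustrate the deprecation pattern."""

    def chemical_environment_matches(self, molecule: str, environment: str) -> list:
        # Placeholder logic: return each index where the environment string
        # occurs in the molecule string. Real matching is toolkit-specific.
        return [
            index
            for index in range(len(molecule))
            if molecule.startswith(environment, index)
        ]

    def find_smarts_matches(self, molecule: str, smarts: str) -> list:
        # Old name kept as a thin wrapper that warns and forwards the call.
        warnings.warn(
            "find_smarts_matches is deprecated; "
            "use chemical_environment_matches instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.chemical_environment_matches(molecule, smarts)
```

Callers of the old name get the same result plus a DeprecationWarning, which gives downstream projects a release cycle or more to switch before the old method is removed.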

mattwthompson commented 1 year ago

> In future API-breaking releases we can remove find_smarts_matches in favor of a new chemical_environment_matches

If the toolkit is not meant to have SMARTS-matching facilities, maybe it should simply be removed? It's not used internally to the toolkit at all:

$ grep -ri --exclude=openff/toolkit/_tests/test_toolkits.py '\.find_smarts' openff/toolkit
openff/toolkit/topology/molecule.py:            matches = toolkit_registry.find_smarts_matches(  # type: ignore[attr-defined]

This is a bigger change than having a smarts argument raise a more descriptive error when it cannot be parsed, but maybe it's in line with your objectives.

j-wags commented 1 year ago

> If the toolkit is not meant to have SMARTS-matching facilities, maybe it should simply be removed? It's not used internally to the toolkit at all:

Since we're using a weird variant/subset of SMIRKS/SMARTS, I think it'll be handy to expose some utility short of ParameterHandler.label_molecule that lets FF developers test out chemical environments. So I'm happy to have a method or two for this purpose stay around.