openforcefield / openff-bespokefit

Automated tools for the generation of bespoke SMIRNOFF format parameters for individual molecules.
https://docs.openforcefield.org/bespokefit
MIT License
61 stars 9 forks source link

Could validate_smirks fall back to an RDKit/OE backend instead of or addition to chemper? #373

Open lilyminium opened 6 days ago

lilyminium commented 6 days ago

Description

I think I'm running into https://github.com/MobleyLab/chemper/issues/99 or a variant when I try to create the following:

from openff.bespokefit.schema.smirnoff import AngleSMIRKS
AngleSMIRKS(
    smirks="[H][C:2]~1~[C:1](~N(~C(~[N:3]1)[H])[H])C([H])([H])[C]([H])([C]=O)[N][H]",
    attributes={"k", "angle"}
)

raises

Warning: Unable to parse SMARTS
Warning: [H]-,:[C:2]~[C:1](-,:[C](-,:[C](-,:[C]=[O])(-,:[H])-,:[N]-,:[H])(-,:[H])-,:[H])~[N](~[C](-,:[H])~[N:3]1)-,:[H]
Warning:                                                                                                               ^

---------------------------------------------------------------------------
SMIRKSParsingError                        Traceback (most recent call last)
Cell In[2], line 1
----> 1 AngleSMIRKS(
      2     smirks="[H][C:2]~1~[C:1](~N(~C(~[N:3]1)[H])[H])C([H])([H])[C]([H])([C]=O)[N][H]",
      3     attributes={"k", "angle"}
      4 )

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/pydantic/main.py:339, in pydantic.main.BaseModel.__init__()

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/pydantic/main.py:1074, in pydantic.main.validate_model()

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/pydantic/fields.py:895, in pydantic.fields.ModelField.validate()

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/pydantic/fields.py:1154, in pydantic.fields.ModelField._apply_validators()

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/pydantic/class_validators.py:304, in pydantic.class_validators._generic_validator_cls.lambda4()

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/openff/bespokefit/schema/smirnoff.py:64, in BaseSMIRKSParameter._check_smirks(cls, value)
     62 @validator("smirks")
     63 def _check_smirks(cls, value: str) -> str:
---> 64     return validate_smirks(value, cls._expected_n_tags())

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/openff/bespokefit/schema/smirnoff.py:23, in validate_smirks(smirks, expected_tags)
     18 def validate_smirks(smirks: str, expected_tags: int) -> str:
     19     """
     20     Make sure the supplied smirks has the correct number of tagged atoms.
     21     """
---> 23     smirk = ChemicalEnvironment(smirks=smirks)
     24     tagged_atoms = len(smirk.get_indexed_atoms())
     26     assert tagged_atoms == expected_tags, (
     27         f"The smirks pattern ({smirks}) has {tagged_atoms} tagged atoms, but should "
     28         f"have {expected_tags}."
     29     )

File ~/micromamba/envs/openff-param-fit/lib/python3.10/site-packages/chemper/graphs/environment.py:561, in ChemicalEnvironment.__init__(self, smirks, label, replacements)
    559 # Check that the created Environment is valid
    560 if not self.is_valid():
--> 561     raise SMIRKSParsingError("Input SMIRKS (%s), converted to %s \
    562             is now invalid" % (smirks, self.as_smirks()))
    564 return

SMIRKSParsingError: (SMIRKSParsingError(...), 'Input SMIRKS ([H][C:2]~1~[C:1](~N(~C(~[N:3]1)[H])[H])C([H])([H])[C]([H])([C]=O)[N][H]), converted to [C:1](~[C:2]-,:[H])(-,:[C](-,:[C](-,:[C]=[O])(-,:[H])-,:[N]-,:[H])(-,:[H])-,:[H])~[N](~[C](-,:[H])~[N:3]1)-,:[H]                     is now invalid')

Since the validation just checks that the correct number of atoms is tagged, would it be possible to substitute that with a fuller cheminformatics toolkit backend or even a regex?

jthorton commented 5 days ago

In the past we have been hesitant to do the regex ourselves but changing the validator to use rdkit or openeye is a great idea and should fix this issue going forward, would you like to make a PR for the change?