Plugin system for grammar styles

I've been working on something that overlaps with this issue for the past few days. Because of some future goals I have for forcefields (FF) in foyer, I wanted to be able to use a broader set of SMARTS features than is currently implemented. I played with generalizing the GRAMMAR, but didn't feel up to making those changes along with all the changes elsewhere in the code they would require. Instead, focusing on atom types for chemical elements (not non-element '_' atoms), I use a boolean switch in the call to FF.apply to select some new functionality, and pass slightly modified SMARTS strings from the forcefield directly to rdkit so that its SMARTS substructure matching does the work. The experimental code I have so far seems to work and can type all of the test OPLSAA molecules. The benefit is immediate access to almost the entire SMARTS grammar. For example, this:

<Type name="opls_145" class="CA" element="C" mass="12.01100" def="[C;X3;r6]1[C;X3;r6][C;X3;r6][C;X3;r6][C;X3;r6][C;X3;r6]1" overrides="opls_141,opls_142" doi="10.1021/ja9621760"/>

can become:

<Type name="opls_145" class="CA" element="C" mass="12.01100" def="[c]1ccccc1" overrides="opls_141,opls_142" doi="10.1021/ja9621760"/>

This works fine to type benzene. Definitions that are based on other atom types also work, for example the aromatic H on carbon:

<Type name="opls_146" class="HA" element="H" mass="1.00800" def="[H][c;%opls_145]" overrides="opls_144" desc="benzene H" doi="10.1021/ja9621760"/>

This works because recursive SMARTS is used: the def above is converted internally to [#1][c;$([c]1ccccc1)].
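The %opls_145 to $(...) substitution can be sketched as a plain string rewrite. This is a minimal illustration, not the code described above: the helper name expand_type_refs and the TYPE_DEFS lookup are hypothetical, and it assumes a type reference is always % followed by a name that starts with a letter or underscore, so SMARTS ring closures such as %10 are left untouched.

```python
import re

# Hypothetical lookup of atom-type name -> SMARTS def, as read from the forcefield XML.
TYPE_DEFS = {
    "opls_145": "[c]1ccccc1",
}

# A foyer-style type reference such as %opls_145. Requiring a leading letter or
# underscore keeps two-digit ring closures (%10, %11, ...) from being matched.
TYPE_REF = re.compile(r"%([A-Za-z_]\w*)")


def expand_type_refs(smarts, type_defs):
    """Replace each %name with the recursive SMARTS $(<def of name>).

    Nested references are expanded as well; a circular reference raises ValueError.
    """

    def expand(s, seen):
        def substitute(match):
            name = match.group(1)
            if name in seen:
                raise ValueError(f"circular type reference: {name}")
            return "$(" + expand(type_defs[name], seen | {name}) + ")"

        return TYPE_REF.sub(substitute, s)

    return expand(smarts, frozenset())


print(expand_type_refs("[H][c;%opls_145]", TYPE_DEFS))  # [H][c;$([c]1ccccc1)]
```

The explicit-H rewrite ([H] to [#1]) is a separate step and is not done here.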

The current defs in oplsaa.xml continue to work when passed through the new system. As the aromatic H definition above shows ([H] becomes [#1]), I had to make some ad hoc modifications to get the non-standard (or at least non-rdkit-standard) SMARTS use of explicit H's to play nicely with the rdkit implementation.
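One way to express the [H] to [#1] modification is a regex that rewrites only a bracket atom that starts with the element symbol H. This is a hypothetical sketch (explicit_h_to_atomic_number is not a foyer or rdkit function), and it assumes an H later in a bracket, as in [C;H3], is the SMARTS hydrogen-count primitive and must not be touched; bracket atoms like [H+] or [Hg] are also left alone.

```python
import re

# A bracket atom beginning with the element symbol H and followed directly by
# a logical operator or the closing bracket: [H], [H;X1], [H,...], [H&...].
# [Hg], [He], [H+] and [C;H3] do not match and are left unchanged.
LEADING_H = re.compile(r"\[H(?=[;,&\]])")


def explicit_h_to_atomic_number(smarts):
    """Hypothetical helper: rewrite a leading element H as the primitive #1."""
    return LEADING_H.sub("[#1", smarts)


print(explicit_h_to_atomic_number("[H][c;$([c]1ccccc1)]"))  # [#1][c;$([c]1ccccc1)]
```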

The current code is experimental, non-optimized, kludgy, and lacks proper Exceptions and much validation, but it seems to work just fine. Apart from handling the boolean that turns this feature on and off, the only changes are in atomtyper.py, and those are mostly additions. Of course, FF validation must be turned off to use the new definitions. The SMARTSGraph is replaced with a simple object that just holds the smarts_string, typemap, etc., and has a simple find_matches method that builds an rdkit molecule and calls rdkit substructure matching. I have an idea about how this might be extended to non-element atoms, but it is neither implemented nor tested.
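A minimal sketch of such a replacement object is shown below, assuming rdkit is installed and that the topology has already been converted to an rdkit Mol with explicit hydrogens. The class name SmartsMatcher is hypothetical and this is not the code from atomtyper.py. The one design choice worth noting is uniquify=False: with rdkit's default, a symmetric pattern such as a benzene ring collapses to a single match, so only one atom would be typed.

```python
from rdkit import Chem


class SmartsMatcher:
    """Hypothetical stand-in for SMARTSGraph that delegates matching to rdkit."""

    def __init__(self, smarts_string, typemap=None):
        self.smarts_string = smarts_string
        self.typemap = typemap or {}  # stored only; not used by this sketch
        self.query = Chem.MolFromSmarts(smarts_string)
        if self.query is None:
            raise ValueError(f"rdkit could not parse SMARTS: {smarts_string!r}")

    def find_matches(self, mol):
        """Return indices of atoms that match the first atom of the pattern."""
        # Keep every mapping so that each symmetry-equivalent atom is reported.
        matches = mol.GetSubstructMatches(self.query, uniquify=False)
        # By convention the first atom in the pattern is the atom being typed.
        return {match[0] for match in matches}


# Benzene with explicit hydrogens: carbons are atoms 0-5, hydrogens 6-11.
benzene = Chem.AddHs(Chem.MolFromSmiles("c1ccccc1"))
print(sorted(SmartsMatcher("[c]1ccccc1").find_matches(benzene)))
print(sorted(SmartsMatcher("[#1][c;$([c]1ccccc1)]").find_matches(benzene)))
```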

An approach like this could be a valuable expansion of foyer's capabilities without the work involved in an expanded GRAMMAR. Let me know if this is of interest to the developers.

mosdef-hub / foyer, issue #377