scymtym / language.smarts

Common Lisp parser for the SMILES and SMARTS languages
GNU General Public License v3.0
3 stars 1 forks source link

Accept logical connectives in bond tests #1

Open drmeister opened 5 years ago

drmeister commented 5 years ago

Below are some valid SMARTS strings from the open force field effort. You will see the simple bond specifications like "~" "-" "=" and then more complex ones like: "~;@", "~;!@" and "-;!@". So bonds need their own logical tests!

The Daylight manual doesn't say much about this. http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html#RTFToC36 . "Section 4.2" There are some examples at the bottom of the link above like this...

| *@;!:* | two atoms connected by a non-aromatic ringbond |

My guess is it's just logicals for atom tests applied to bonds.

Below are examples from the Smirnoff force-field. (https://github.com/openforcefield/openforcefield/blob/master/openforcefield/data/forcefield/smirnoff99Frosst.offxml)

"[*:1]~[#6X4:2]-[*:3]"
"[#1:1]-[#6X4:2]-[#1:3]"
"[*;r3:1]1~;@[*;r3:2]~;@[*;r3:3]1"
"[*;r3:1]~;@[*;r3:2]~;!@[*:3]"
"[*:1]~;!@[*;r3:2]~;!@[*:3]"
"[#1:1]-[*;r3:2]~;!@[*:3]"
"[#6r4:1]-;@[#6r4:2]-;@[#6r4:3]"
"[!#1:1]-[#6r4:2]-;!@[!#1:3]"
"[!#1:1]-[#6r4:2]-;!@[#1:3]"
"[*:1]~[#6X3:2]~[*:3]"
"[#1:1]-[#6X3:2]~[*:3]"
"[#1:1]-[#6X3:2]-[#1:3]"
"[*;r6:1]~;@[*;r5:2]~;@[*;r5;x2:3]"
"[*:1]~;!@[*;X3;r5:2]~;@[*;r5:3]"
drmeister commented 5 years ago

Grrr - they have this '@' ring bond test - I can't find anything that says how big the ring can be (3-6 members make sense). This "specification" is so vague.

scymtym commented 5 years ago

So the AST for something like

*@;!:*@* first two atoms connected by a non-aromatic ringbond
         third atom connected arbitrary ringbond

is supposed to look like

  chain──────────────────────────────┐
   │ └──────────┐                    │
   │atom        │first               │second
   │            │                    │
anyatom   chain-element         chain-element
           │         │           │         │
           │bond     │atom       │bond     │atom
           │         │           │         │
        weak-and  anyatom    ring-bond   anyatom
           │
    ┌──────┴──────┐
    │             │
ring-bond        not
                  │
                  │
            aromatic-bond

, right?

drmeister commented 5 years ago

Yes - we transform the chain and chain-element parts into a different graph structure that I could sketch but we are already working with the current smarts parser just fine. The new stuff is everything that flows below "bond".

Page 21 of http://www.daylight.com/dayhtml/doc/theory/index.pdf (Section 4.3 Logical Operators) says that "Atom and bond primitive specifications may be combined to form expressions by using logical operators."

Now if you would rather I do this - or if you could just check my thinking because I'd like to understand the parser...

More coming...

drmeister commented 5 years ago

I got this much working. What I still struggle with is operators, where is the operand specified when you specify operators?

Is it the last argument to parser.common-rules.operators:define-operator-rules

@@ -143,10 +143,49 @@

 ;;; SMARTS 4.2 Bonds Primitives

+;;; original code without logicals
+#+(or)
 (defrule bond-pattern
     (or bond-pattern/non-literal language.smiles.parser:bond))

-;;; Only additions to SMILES 3.2.2.
+;;; New code with logicals
+(defrule bond-pattern
+    (or modified-bond-pattern one-bond-pattern))
+
+(defrule modified-bond-pattern
+  (and weak-and-bond-expression)
+  (:lambda (expression &bounds start end)
+    (architecture.builder-protocol:node* (:logical-bond-expression :bounds (cons start end))
+                                         (1 :bond-expression expression))))
+
+;;; SMARTS 4.3 Logical Operators
+
+(macrolet ((define-operator-rule (name expression
+                                  &optional (value (make-keyword name)))
+             (let ((rule-name (symbolicate '#:operator-bond- name)))
+               `(defrule ,rule-name
+                    ,expression
+                  (:constant ,value)))))
+  (define-operator-rule weak-and     #\;   :weak-and)
+  (define-operator-rule or           #\,)
+  (define-operator-rule strong-and   #\&   :strong-and)
+  (define-operator-rule not          #\!)
+  (define-operator-rule implicit-and (and) :implicit-and))
+
+(parser.common-rules.operators:define-operator-rules
+    (:skippable?-expression nil)
+  (2 weak-and-bond-expression     operator-bond-weak-and)
+  (2 or-bond-expression           operator-bond-or)
+  (2 strong-and-bond-expression   operator-bond-strong-and)
+  (1 not-bond-expression          operator-bond-not)
+  (2 implicit-and-bond-expression operator-bond-implicit-and)
+  one-bond-pattern)
+
+
+(defrule one-bond-pattern
+    (or bond-pattern/non-literal language.smiles.parser:bond))
+
+;;; Only additions to SMILES 3.2.2 and higher.
 (macrolet
     ((define-rules (&body clauses)
        (let ((rules '()))
@@ -164,7 +203,9 @@
   (define-rules
     (wildcard            #\~)
     (up-or-unspecified   "/?")
-    (down-or-unspecified "\\?")))
+    (down-or-unspecified "\\?")
+    (same-ring           #\@) ; added in smarts 4.6
+    ))
drmeister commented 5 years ago

Here are some results...

CANDO-USER> (language.smarts.parser:parse "C-C" 'list)
(:CHAIN
 (:ELEMENT
  (((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (0 . 1)))
   ((:BOND (:ATOM (((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (2 . 3)))))
     :KIND
     (:LOGICAL-BOND-EXPRESSION (:BOND-EXPRESSION (((:SINGLE)))) :BOUNDS
      (1 . 2))
     :BOUNDS (1 . 3)))))
 :BOUNDS (0 . 3))
NIL
T
CANDO-USER> (language.smarts.parser:parse "C-;!@C" 'list)
(:CHAIN
 (:ELEMENT
  (((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (0 . 1)))
   ((:BOND (:ATOM (((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (5 . 6)))))
     :KIND
     (:LOGICAL-BOND-EXPRESSION
      (:BOND-EXPRESSION
       ((((:BINARY-OPERATOR
           (:OPERAND
            ((:SINGLE)
             ((:UNARY-OPERATOR (:OPERAND ((:SAME-RING))) :OPERATOR :NOT :BOUNDS
               (3 . 5)))))
           :OPERATOR :WEAK-AND :BOUNDS (1 . 5))))))
      :BOUNDS (1 . 5))
     :BOUNDS (1 . 6)))))
 :BOUNDS (0 . 6))
NIL
T
CANDO-USER> (language.smarts.parser:parse "C-,=,:C" 'list)
(:CHAIN
 (:ELEMENT
  (((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (0 . 1)))
   ((:BOND (:ATOM (((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (6 . 7)))))
     :KIND
     (:LOGICAL-BOND-EXPRESSION
      (:BOND-EXPRESSION
       ((((:BINARY-OPERATOR
           (:OPERAND
            (((:BINARY-OPERATOR (:OPERAND ((:SINGLE) (:DOUBLE))) :OPERATOR :OR
               :BOUNDS (1 . 4)))
             (:AROMATIC)))
           :OPERATOR :OR :BOUNDS (1 . 6))))))
      :BOUNDS (1 . 6))
     :BOUNDS (1 . 7)))))
 :BOUNDS (0 . 7))
NIL
T
CANDO-USER> 
drmeister commented 5 years ago

And what is the role of this? Is it to organize the result inside of a (:logical-bond-expression (:bond-expression <expression>)) form?

(defrule modified-bond-pattern
  (and weak-and-bond-expression)
  (:lambda (expression &bounds start end)
    (architecture.builder-protocol:node* (:logical-bond-expression :bounds (cons start end))
                                         (1 :bond-expression expression))))
drmeister commented 5 years ago

Hi - I'd like to submit a pull request to merge my changes. Which branch should I use?

I have a directory tree for smarts that looks like this:

Directory tree
==============
[-] src
 |--[+] language-server
 |--[+] smarts
 `--[+] smiles

I don't see language-server in your repo anywhere. Did we get out of whack?

scymtym commented 5 years ago

Hi - I'd like to submit a pull request to merge my changes. Which branch should I use?

Please use the master branch. I consolidated the prototype I did in the future branch into the master branch some time ago.

I have a directory tree for smarts that looks like this:

Directory tree
==============
[-] src
 |--[+] language-server
 |--[+] smarts
 `--[+] smiles

I don't see language-server in your repo anywhere. Did we get out of whack?

I removed the language-server directory during consolidation. This probably means that your code is based on the future which shouldn't be too much of a problem.

You can submit the pull request against either branch. I will sort it out.

Did you look at the atom-map-class branch by any chance? Or will your pull request include an independent fix for that issue?