Open drmeister opened 5 years ago
Grrr - they have this '@' ring bond test - I can't find anything that says how big the ring can be (3-6 members make sense). This "specification" is so vague.
So the AST for something like
*@;!:*@* first two atoms connected by a non-aromatic ringbond
third atom connected arbitrary ringbond
is supposed to look like
chain──────────────────────────────┐
│ └──────────┐ │
│atom │first │second
│ │ │
anyatom chain-element chain-element
│ │ │ │
│bond │atom │bond │atom
│ │ │ │
weak-and anyatom ring-bond anyatom
│
┌──────┴──────┐
│ │
ring-bond not
│
│
aromatic-bond
, right?
Yes - we transform the chain and chain-element parts into a different graph structure that I could sketch but we are already working with the current smarts parser just fine. The new stuff is everything that flows below "bond".
Page 21 of http://www.daylight.com/dayhtml/doc/theory/index.pdf (Section 4.3 Logical Operators) says that "Atom and bond primitive specifications may be combined to form expressions by using logical operators."
Now if you would rather I do this - or if you could just check my thinking because I'd like to understand the parser...
More coming...
I got this much working. What I still struggle with is operators, where is the operand specified when you specify operators?
Is it the last argument to parser.common-rules.operators:define-operator-rules
@@ -143,10 +143,49 @@
;;; SMARTS 4.2 Bonds Primitives
+;;; original code without logicals
+#+(or)
(defrule bond-pattern
(or bond-pattern/non-literal language.smiles.parser:bond))
-;;; Only additions to SMILES 3.2.2.
+;;; New code with logicals
+(defrule bond-pattern
+ (or modified-bond-pattern one-bond-pattern))
+
+(defrule modified-bond-pattern
+ (and weak-and-bond-expression)
+ (:lambda (expression &bounds start end)
+ (architecture.builder-protocol:node* (:logical-bond-expression :bounds (cons start end))
+ (1 :bond-expression expression))))
+
+;;; SMARTS 4.3 Logical Operators
+
+(macrolet ((define-operator-rule (name expression
+ &optional (value (make-keyword name)))
+ (let ((rule-name (symbolicate '#:operator-bond- name)))
+ `(defrule ,rule-name
+ ,expression
+ (:constant ,value)))))
+ (define-operator-rule weak-and #\; :weak-and)
+ (define-operator-rule or #\,)
+ (define-operator-rule strong-and #\& :strong-and)
+ (define-operator-rule not #\!)
+ (define-operator-rule implicit-and (and) :implicit-and))
+
+(parser.common-rules.operators:define-operator-rules
+ (:skippable?-expression nil)
+ (2 weak-and-bond-expression operator-bond-weak-and)
+ (2 or-bond-expression operator-bond-or)
+ (2 strong-and-bond-expression operator-bond-strong-and)
+ (1 not-bond-expression operator-bond-not)
+ (2 implicit-and-bond-expression operator-bond-implicit-and)
+ one-bond-pattern)
+
+
+(defrule one-bond-pattern
+ (or bond-pattern/non-literal language.smiles.parser:bond))
+
+;;; Only additions to SMILES 3.2.2 and higher.
(macrolet
((define-rules (&body clauses)
(let ((rules '()))
@@ -164,7 +203,9 @@
(define-rules
(wildcard #\~)
(up-or-unspecified "/?")
- (down-or-unspecified "\\?")))
+ (down-or-unspecified "\\?")
+ (same-ring #\@) ; added in smarts 4.6
+ ))
Here are some results...
CANDO-USER> (language.smarts.parser:parse "C-C" 'list)
(:CHAIN
(:ELEMENT
(((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (0 . 1)))
((:BOND (:ATOM (((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (2 . 3)))))
:KIND
(:LOGICAL-BOND-EXPRESSION (:BOND-EXPRESSION (((:SINGLE)))) :BOUNDS
(1 . 2))
:BOUNDS (1 . 3)))))
:BOUNDS (0 . 3))
NIL
T
CANDO-USER> (language.smarts.parser:parse "C-;!@C" 'list)
(:CHAIN
(:ELEMENT
(((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (0 . 1)))
((:BOND (:ATOM (((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (5 . 6)))))
:KIND
(:LOGICAL-BOND-EXPRESSION
(:BOND-EXPRESSION
((((:BINARY-OPERATOR
(:OPERAND
((:SINGLE)
((:UNARY-OPERATOR (:OPERAND ((:SAME-RING))) :OPERATOR :NOT :BOUNDS
(3 . 5)))))
:OPERATOR :WEAK-AND :BOUNDS (1 . 5))))))
:BOUNDS (1 . 5))
:BOUNDS (1 . 6)))))
:BOUNDS (0 . 6))
NIL
T
CANDO-USER> (language.smarts.parser:parse "C-,=,:C" 'list)
(:CHAIN
(:ELEMENT
(((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (0 . 1)))
((:BOND (:ATOM (((:ATOM NIL :KIND :ORGANIC :SYMBOL "C" :BOUNDS (6 . 7)))))
:KIND
(:LOGICAL-BOND-EXPRESSION
(:BOND-EXPRESSION
((((:BINARY-OPERATOR
(:OPERAND
(((:BINARY-OPERATOR (:OPERAND ((:SINGLE) (:DOUBLE))) :OPERATOR :OR
:BOUNDS (1 . 4)))
(:AROMATIC)))
:OPERATOR :OR :BOUNDS (1 . 6))))))
:BOUNDS (1 . 6))
:BOUNDS (1 . 7)))))
:BOUNDS (0 . 7))
NIL
T
CANDO-USER>
And what is the role of this?
Is it to organize the result inside of a (:logical-bond-expression (:bond-expression <expression>))
form?
(defrule modified-bond-pattern
(and weak-and-bond-expression)
(:lambda (expression &bounds start end)
(architecture.builder-protocol:node* (:logical-bond-expression :bounds (cons start end))
(1 :bond-expression expression))))
Hi - I'd like to submit a pull request to merge my changes. Which branch should I use?
I have a directory tree for smarts that looks like this:
Directory tree
==============
[-] src
|--[+] language-server
|--[+] smarts
`--[+] smiles
I don't see language-server in your repo anywhere. Did we get out of whack?
Hi - I'd like to submit a pull request to merge my changes. Which branch should I use?
Please use the master branch. I consolidated the prototype I did in the future branch into the master branch some time ago.
I have a directory tree for smarts that looks like this:
Directory tree ============== [-] src |--[+] language-server |--[+] smarts `--[+] smiles
I don't see language-server in your repo anywhere. Did we get out of whack?
I removed the language-server
directory during consolidation. This probably means that your code is based on the future which shouldn't be too much of a problem.
You can submit the pull request against either branch. I will sort it out.
Did you look at the atom-map-class branch by any chance? Or will your pull request include an independent fix for that issue?
Below are some valid SMARTS strings from the open force field effort. You will see the simple bond specifications like
"~"
"-"
"="
and then more complex ones like:"~;@"
,"~;!@"
and"-;!@"
. So bonds need their own logical tests!The Daylight manual doesn't say much about this. http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html#RTFToC36 . "Section 4.2" There are some examples at the bottom of the link above like this...
My guess is it's just logicals for atom tests applied to bonds.
Below are examples from the Smirnoff force-field. (https://github.com/openforcefield/openforcefield/blob/master/openforcefield/data/forcefield/smirnoff99Frosst.offxml)