scymtym / architecture.builder-protocol

Protocol for flexible construction and traversal of results (e.g. ASTs in case of parsers)
GNU Lesser General Public License v3.0
8 stars 1 forks source link
common-lisp json parsing protocol tree xpath

+TITLE: architecture.builder-protocol README

+AUTHOR: Jan Moringen

+EMAIL: jmoringe@techfak.uni-bielefeld.de

+DESCRIPTION: A protocol for flexible result construction.

+KEYWORDS: common lisp, architecture, protocol, framework, builder, pattern, parsing

+LANGUAGE: en

** STARTED Build Protocol Since this is a probably a common case, we will use the construction of a simplistic AST from the output of an equally simplistic parser as an example.

The example code in the following sections can be loaded into the =cl-user= package and assumes that the =alexandria= system is loaded.

*** Implementing a Consumer of Results The nodes of the AST we want to construct are either literals or operator applications with two operands and are both expressions:

+begin_src lisp :results none :exports code :session "tutorial"

  (defclass expression () ())

  (defclass literal (expression)
    ((%value :initarg :value :reader literal-value)))

  (defclass operator (expression)
    ((%operands :accessor operator-operands :initform '())))
#+end_src
Note that the =value= slot of the =literal= is initialized using
the =:value= initarg while the =operands= slot of the =operator=
class is initialized to the empty lists but allows for later
mutation via =(setf operator-operands)=. The rationale is that
=literal= instances can be constructed in one =make-instance= call
while =operator= instance may be constructed before their operand
nodes, thus requiring mutation to attach these operand nodes once
they have been constructed.

A simple implementation of the builder protocol for these nodes
looks like this:
#+begin_src lisp :results none :exports code :session "tutorial"
  (defclass ast-builder () ())

  (defmethod architecture.builder-protocol:make-node
      ((builder ast-builder)
       (kind    (eql :literal))
       &key value)
    (make-instance 'literal :value value))

  (defmethod architecture.builder-protocol:make-node
      ((builder ast-builder)
       (kind    (eql :operator))
       &key)
    (make-instance 'operator))

  (defmethod architecture.builder-protocol:relate
      ((builder  ast-builder)
       (relation (eql :operand))
       (left     operator)
       (right    expression)
       &key)
    (alexandria:appendf (operator-operands left) (list right))
    left)
#+end_src
We can already use this builder without a parser:
#+begin_src lisp :exports both :session "tutorial"
  (let* ((builder  (make-instance 'ast-builder))
         (operands (list (architecture.builder-protocol:make+finish-node
                          builder :literal :value 5)
                         (architecture.builder-protocol:make+finish-node
                          builder :literal :value 6)))
         (operator (architecture.builder-protocol:make-node builder :operator)))
    (architecture.builder-protocol:finish-node
     builder :operator
     (reduce (lambda (l r)
               (architecture.builder-protocol:relate
                builder :operand l r))
             operands :initial-value operator)))
#+end_src

#+RESULTS:
: #<OPERATOR {100E5961}>

The following is a more compact (but equivalent behind the scenes)
spelling of the above code:
#+begin_src lisp :exports both :session "tutorial"
  (architecture.builder-protocol:with-builder ((make-instance 'ast-builder))
    (architecture.builder-protocol:node* (:operator)
      (* :operand (list (architecture.builder-protocol:node* (:literal :value 5))
                        (architecture.builder-protocol:node* (:literal :value 6))))))
#+end_src

#+RESULTS:
: #<OPERATOR {1019F0E013}>

*** Implementing a Producer of Results We will use a parser for a very simple expressions in polish notation:

+begin_example

EXPRESSION ::= OPERATOR | LITERAL
LITERAL    ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
OPERATOR   ::= '+' EXPRESSION EXPRESSION
#+end_example
The parser is straightforward: it has a local function for each
element of the grammar and uses the builder protocol like in the
previous example. Since we now parse an actual source text, source
locations of constructed result nodes can be recorded using the
=:bounds= initarg. Note that the =ast-builder= we defined in the
previous section receives the =:bounds= initarg in =make-node=
calls, but does not store it anywhere. If storing source locations
in AST nodes was desired, a =%source= slot could be added to the
=expression= class and the value of the =:bounds= keyword argument
could be passed to =make-instance= as the =:source= initarg.
#+begin_src lisp :results silent :exports code :session "tutorial"
  (defun parse (stream builder)
    (labels ((expression ()
               (let ((c (peek-char nil stream)))
                 (cond ((char= c #\+)
                        (operator))
                       ((digit-char-p c)
                        (literal)))))
             (literal ()
               (let ((start (stream-file-position stream))
                     (c     (read-char stream)))
                 (architecture.builder-protocol:make-node
                  builder :literal
                  :value  (parse-integer (string c))
                  :bounds (cons start (1+ start)))))
             (operator ()
               (let ((start    (stream-file-position stream))
                     (c        (read-char stream))
                     (operands (list (expression) (expression)))
                     (end      (stream-file-position stream)))
                 (declare (ignore c))
                 (architecture.builder-protocol:finish-node
                  builder :operator
                  (reduce (lambda (l r)
                            (architecture.builder-protocol:relate
                             builder :operator-operand l r))
                          operands
                          :initial-value (architecture.builder-protocol:make-node
                                          builder :operator
                                          :bounds (cons start end)))))))
      (expression)))
#+end_src
As before, the various builder method calls can be written
compactly using the =node= macro:
#+begin_src lisp :results silent :exports code :session "tutorial"
  (defun parse2 (stream builder)
    (labels ((expression ()
               (let ((c (peek-char nil stream)))
                 (cond ((char= c #\+)
                        (operator))
                       ((digit-char-p c)
                        (literal)))))
             (literal ()
               (let ((start (stream-file-position stream))
                     (c     (read-char stream)))
                 (architecture.builder-protocol:node
                     (builder :literal :value  (parse-integer (string c))
                                       :bounds (cons start (1+ start))))))
             (operator ()
               (let ((start    (stream-file-position stream))
                     (c        (read-char stream))
                     (operands (list (expression) (expression)))
                     (end      (stream-file-position stream)))
                 (declare (ignore c))
                 (architecture.builder-protocol:node
                     (builder :operator :bounds (cons start end))
                   (* :operand operands)))))
      (expression)))
#+end_src
The =with-builder= macro allows writing the =node= macro calls
without supplying the =builder= argument:
#+begin_src lisp
  (architecture.builder-protocol:with-builder (BUILDER)
    (architecture.builder-protocol:node* (:KIND :INITARG …)
      (* :RELATION …)))
#+end_src

*** The =list= Builder When developing or testing result producers like parsers, it can be convenient to produce a list-based result since it pretty-prints nicely without any extra effort and can be =equal=-compared in unit tests without depending on a more heavyweight representation such as instances of AST node classes.

For these cases, the =architecture.builder-protocol= system
provides a builtin =list= builder:
#+begin_src lisp :results value code :exports both :session "tutorial"
  (parse (make-string-input-stream "++123") 'list)
#+end_src

#+RESULTS:
#+BEGIN_SRC lisp

(:OPERATOR
 (:OPERATOR-OPERAND
  (((:OPERATOR
     (:OPERATOR-OPERAND
      (((:LITERAL NIL :VALUE 1 :BOUNDS (2 . 3)))
       ((:LITERAL NIL :VALUE 2 :BOUNDS (3 . 4)))))
     :BOUNDS (1 . 4)))
   ((:LITERAL NIL :VALUE 3 :BOUNDS (4 . 5)))))
 :BOUNDS (0 . 5))
#+END_SRC

*** Printing Result Trees

This may be slightly off-topic, but a nice hack for printing
/arbitrary/ results produced by the =list= builder can be done
using the [[http://github.com/scymtym/utilities.print-tree][=utilities.print-tree= system]]:

#+begin_src lisp :exports code :session "tutorial"
  (defun my-print-tree (tree &optional (stream *standard-output*))
    (utilities.print-tree:print-tree
     stream tree
     (utilities.print-tree:make-node-printer
      (lambda (stream depth node)
        (declare (ignore depth))
        (destructuring-bind (kind relations &rest slots) node
          (declare (ignore relations))
          (format stream "~A~@[ @~A~]"
                  kind (getf slots :bounds))
          (alexandria:remove-from-plist slots :bounds)))
      (lambda (stream depth node)
        (declare (ignore depth))
        (destructuring-bind (kind relations &rest slots) node
          (declare (ignore kind relations))
          (format stream "~{~A: ~A~^~@:_~}"
                  (alexandria:remove-from-plist slots :bounds))))
      (lambda (node)
        (loop :for (relation nodes) :on (second node) :by #'cddr
              :appending (mapcar #'car nodes))))))
#+end_src

Putting these pieces together, we can achieve the following:
#+begin_src lisp :results output :exports both :session "tutorial"
  (my-print-tree (parse (make-string-input-stream "++123") 'list))
#+end_src

#+RESULTS:
: OPERATOR @(0 . 5)
: ├─OPERATOR @(1 . 4)
: │ ├─LITERAL @(2 . 3)
: │ │   VALUE: 1
: │ └─LITERAL @(3 . 4)
: │     VALUE: 2
: └─LITERAL @(4 . 5)
:     VALUE: 3

The system =architecture.builder-protocol.print-tree= implements a
more complete version (not restricted to the =list= builder, among
other things) of this idea:

#+BEGIN_SRC lisp :results output :exports both :session "tutorial"
  (defun print-tree (tree)
    (fresh-line)
    (architecture.builder-protocol.print-tree:print-tree
     'list tree *standard-output*))

  (print-tree (parse (make-string-input-stream "++123") 'list))
#+END_SRC

#+RESULTS:
: OPERATOR @(0 . 5)
: ├─OPERATOR-OPERAND: OPERATOR @(1 . 4)
: │ ├─OPERATOR-OPERAND: LITERAL @(2 . 3)
: │ │   VALUE: 1
: │ └─OPERATOR-OPERAND: LITERAL @(3 . 4)
: │     VALUE: 2
: └─OPERATOR-OPERAND: LITERAL @(4 . 5)
:     VALUE: 3

** TODO "Un-build" Protocol

*** STARTED The =walk-nodes= Function The generic function =walk-nodes= can be used to traverse trees of nodes built using the build protocol. It uses the "un-build" protocol and can thus handle arbitrary tree representations.

** STARTED Build Protocol

+begin_src lisp :exports results :session "doc"

 (doc 'architecture.builder-protocol:prepare 'function)

+end_src

+RESULTS:

+begin_example

prepare BUILDER

Prepare BUILDER for result construction, return a builder.

The default method just returns BUILDER.

+end_example

+begin_src lisp :exports results :session "doc"

 (doc 'architecture.builder-protocol:finish 'function)

+end_src

+RESULTS:

+begin_example

finish BUILDER VALUES

Finalize and return VALUES produced by BUILDER as multiple values.

VALUES is a list of objects that should be returned as multiple values and constitute the overall result of an object tree construction with BUILDER. The first element of VALUES which becomes the first return value is the constructed tree itself (which often coincides with the root node). Additional values are optional and their presence and meaning depend on BUILDER.

The default method just returns VALUES.

+end_example

+begin_src lisp :exports results :session "doc"

 (doc 'architecture.builder-protocol:wrap 'function)

+end_src

+RESULTS:

+begin_example

wrap BUILDER THUNK

Call THUNK with an appropriate dynamic environment for BUILDER.

A method on this generic function could, for example, bind special variables around the construction of a result object tree.

The existing default methods do not specialize the BUILDER parameter and specialize the THUNK parameter to cl:function' and cl:symbol'. These default methods just call THUNK.

+end_example

+begin_src lisp :exports results :session "doc"

 (doc 'architecture.builder-protocol:make-node 'function)

+end_src

+RESULTS:

+begin_example

make-node BUILDER KIND &REST INITARGS &KEY &ALLOW-OTHER-KEYS

Use BUILDER to make a result tree node of kind KIND and return it.

As a convention, when supplied, the value of the :bounds keyword argument is of the form (START . END) and can be used to indicate the input range for which the tree is constructed.

+end_example

+begin_src lisp :exports results :session "doc"

 (doc 'architecture.builder-protocol:finish-node 'function)

+end_src

+RESULTS:

+begin_example

finish-node BUILDER KIND NODE

Use BUILDER to perform finalization for NODE.

Return the modified NODE or an appropriate newly created object.

+end_example

+begin_src lisp :exports results :session "doc"

 (doc 'architecture.builder-protocol:relate 'function)

+end_src

+RESULTS:

+begin_example

relate BUILDER RELATION LEFT RIGHT &REST ARGS &KEY &ALLOW-OTHER-KEYS

Establish RELATION between nodes LEFT and RIGHT and return the resulting modified LEFT node (or an appropriate newly created object).

ARGS can be used to supply additional information about the relation that is available from neither LEFT nor RIGHT.

In a typical case, RELATION could be :child, LEFT being the parent node and RIGHT being the child node.

+end_example

*** STARTED Convenience Functions

#+BEGIN_SRC lisp :exports results :session "doc"
  (doc 'architecture.builder-protocol:add-relations 'function)
#+END_SRC

#+RESULTS:
#+begin_example
add-relations BUILDER NODE RELATIONS

Use BUILDER to add relations according to RELATIONS to NODE.

RELATIONS is a list of relation specifications of the form

(CARDINALITY RELATION-NAME RIGHT &rest ARGS)

which are translated into `relate' calls in which NODE is the
"left" argument to `relate'. CARDINALITY has to be of type
`relation-cardinality' and is interpreted as follows:

?            RIGHT is a single node or `nil'. If RIGHT is `nil',
             `relate' is not called.

1            RIGHT is a single node.

*            RIGHT is a (possibly empty) sequence of nodes. ARGS
             must be of the form

               :KEY₁ (VALUE₁₁ VALUE₁₂ …) :KEY₂ (VALUE₂₁ VALUE₂₂ …) …

             . The `relate' call for the k-th element of RIGHT will
             receive the keyword arguments
             :KEY₁ VALUEₖ₁ :KEY₂ VALUEₖ₂ …. If the value list for a
             given key would be a repetition of a particular value
             VALUE, the circular list #1=(VALUE . #1#) may be used
             as a replacement for that value list.

(:map . KEY) RIGHT is a (possible empty) sequence of nodes that
             should be "zipped" with a sequence of keys (see
             below) to form a set of key-values pair and thus a
             map. The sequence of keys is the value of the property
             whose indicator is KEY in the ARGS plist. The two
             sequences must be of the same length. Elements at
             corresponding positions will be paired in a
             "zipping" operation as described above.

RELATION-NAME does not have to be unique across the elements of
RELATIONS. This allows multiple "right" nodes to be related to
NODE via a given RELATION-NAME with CARDINALITY * in multiple
RELATIONS entries, potentially with different ARGS.

The modified NODE or a new node is returned.
#+end_example

#+BEGIN_SRC lisp :exports results :session "doc"
  (doc 'architecture.builder-protocol:make+finish-node 'function)
#+END_SRC

#+RESULTS:
#+begin_example
make+finish-node BUILDER KIND &REST INITARGS &KEY &ALLOW-OTHER-KEYS

Convenience function for constructing and immediately finishing a
node.
#+end_example

#+BEGIN_SRC lisp :exports results :session "doc"
  (doc 'architecture.builder-protocol:make+finish-node+relations 'function)
#+END_SRC

#+RESULTS:
#+begin_example
make+finish-node+relations BUILDER KIND INITARGS RELATIONS

Use BUILDER to create a KIND, INITARGS node, relate it via RELATIONS.

RELATIONS is processed as described for `add-relations'.

`finish-node' is called on the created node. The created node is
returned.
#+end_example

** STARTED "Un-build" Protocol

+begin_src lisp :exports results :session "doc"

 (doc 'architecture.builder-protocol:node-kind 'function)

+end_src

+RESULTS:

+begin_example

node-kind BUILDER NODE

Return the kind of NODE w.r.t. BUILDER.

The return value is EQ to the KIND argument used to create NODE with BUILDER.

+end_example

+begin_src lisp :exports results :session "doc"

 (doc 'architecture.builder-protocol:node-initargs 'function)

+end_src

+RESULTS:

+begin_example

node-initargs BUILDER NODE

Return a plist of initargs for NODE w.r.t. BUILDER.

The returned list is EQUAL to the list of keyword arguments pass to the MAKE-NODE call that, using BUILDER, constructed NODE.

+end_example

+begin_src lisp :exports results :session "doc"

 (doc 'architecture.builder-protocol:node-relations 'function)

+end_src

+RESULTS:

+begin_example

node-relations BUILDER NODE

Return a list of relations of NODE w.r.t. BUILDER.

Each relation is of one of the forms

RELATION-NAME (RELATION-NAME . CARDINALITY)

where RELATION-NAME names the relation and CARDINALITY is of type relation-cardinality'. When the first form is used, i.e. CARDINALITY is not present, it is assumed to be *'. CARDINALITY values are interpreted as follows:

? The relation designated by RELATION-NAME with NODE as the "left" node has zero or one "right" nodes.

1 The relation designated by RELATION-NAME with NODE as the "left" node has exactly one "right" node.

+OPTIONS: H:4 num:nil toc:t \n:nil @:t ::t |:t ^:t -:t f:t *:t <:t

+OPTIONS: TeX:t LaTeX:t skip:nil d:nil todo:t pri:nil tags:not-in-toc

+SEQ_TODO: TODO STARTED | DONE