A musing on abstract interpretation

masak commented 4 years ago

When I see this function definition:

(def char (x)
  (= (type x) 'char))

...my first thought is that it's wasteful and extravagant to use = and not id, and that this would be completely equivalent:

(def char (x)
  (id (type x) 'char))

Could a sufficiently smart compiler know that the former could be reduced to the latter? I say yes, and I will now prove that with a sequence of well-explained transformations (expressed as colorful diffs).

The first thing that we will evaluate is the subexpression (type x), so we can extract it as a variable, and type-annotate it.

 (def char (x)
-  (= (type x) 'char))
+  (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
+    (= vt 'char)))

By the way, every time we (the abstract evaluator) insert a let, think of it as using fresh/unique variables. We're using regular named/unhygienic variables here for ease of reading.

We inline all of =:

 (def char (x)
   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
-    (= vt 'char)))
+    (let args (list vt 'char)
+      (if (no (cdr args))  t
+          (some atom args) (all [id _ (car args)] (cdr args))
+                           (and (apply = (map car args))
+                                (apply = (map cdr args)))))))

Turning our attention to the first condition in the if, we know statically that there's no way it can be true. So we eliminate it.

 (def char (x)
   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
     (let args (list vt 'char)
-      (if (no (cdr args))  t
-          (some atom args) (all [id _ (car args)] (cdr args))
+      (if (some atom args) (all [id _ (car args)] (cdr args))
                            (and (apply = (map car args))
                                 (apply = (map cdr args)))))))

Similarly, the next condition is true, as a number of steps of inlining and abstractly evaluating the some function call would show. We can therefore "bind" to that consequent and discard the rest. As a side effect, we've now discarded the = recursion.

 (def char (x)
   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
     (let args (list vt 'char)
-      (if (some atom args) (all [id _ (car args)] (cdr args))
-                           (and (apply = (map car args))
-                                (apply = (map cdr args)))))))
+      (all [id _ (car args)] (cdr args)))))

Since args is a named list literal, we can abstractly evaluate (car args) and (cdr args) into their known values:

 (def char (x)
   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
     (let args (list vt 'char)
-      (all [id _ (car args)] (cdr args)))))
+      (all [id _ vt] (list 'char)))))

...and eliminate the unused let:

 (def char (x)
   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
-    (let args (list vt 'char)
-      (all [id _ vt] (list 'char)))))
+    (all [id _ vt] (list 'char))))

Now we inline all:

 (def char (x)
   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
-    (all [id _ vt] (list 'char))))
+    (let f [id _ vt]
+      (let xs (list 'char)
+        (if (no xs)      t
+            (f (car xs)) (all f (cdr xs))
+                         nil)))))

Again, because of static knowledge of xs, we can eliminate the first condition of the if entirely:

   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
     (let f [id _ vt]
       (let xs (list 'char)
-        (if (no xs)      t
-            (f (car xs)) (all f (cdr xs))
+        (if (f (car xs)) (all f (cdr xs))
                          nil)))))

We abstractly evaluate the destructurings on xs:

   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
     (let f [id _ vt]
       (let xs (list 'char)
-        (if (f (car xs)) (all f (cdr xs))
-                         nil)))))
+        (if (f 'char) (all f nil)
+                      nil)))))

And we can inline/expand the call to f:

   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
     (let f [id _ vt]
       (let xs (list 'char)
-        (if (f 'char) (all f nil)
-                      nil)))))
+        (if (id 'char vt) (all f nil)
+                          nil)))))

The recursive call to all has nil as its argument to xs. Either we do enough small steps to realize that the result is t, or we have a separate rule/pattern for that:

   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
     (let f [id _ vt]
       (let xs (list 'char)
-        (if (id 'char vt) (all f nil)
+        (if (id 'char vt) t
                           nil)))))

Now the refcounts of the parameters bound to all have gone to zero:

 (def char (x)
   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
-    (let f [id _ vt]
-      (let xs (list 'char)
-        (if (id 'char vt) t
-                          nil)))))
+    (if (id 'char vt) t
+                      nil)))

id already gives back t or nil, so the if is redundant:

 (def char (x)
   (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
-    (if (id 'char vt) t
-                      nil)))
+    (id 'char vt)))

Finally, we inline (type x) again:

 (def char (x)
-  (let vt (type x)  ; vt : 'char|'pair|'stream|'symbol
-    (id 'char vt)))
+  (id 'char (type x)))

I thought before writing this out that an abstract evaluator would be quite straightforward to write. Now, I'm even more convinced. Furthermore (I realize), this is quite well-suited for a test-driven approach.

Can start on something smaller, of course. But the above steps are fairly realistic to do by machine.

masak commented 4 years ago

Here's a general request to myself to do the same kind of analysis for best. Would be interesting.

Also, the code from #136 here — specifically, can the ccc be optimized away after enough inlining? (I think it can.)

masak commented 4 years ago

I found another fun one yesterday: even does not work on chars, because you get a cdr-on-atom error thrown from c*. But the fact that you will is statically knowable, because all that happens all the way from calling (even \x) to getting the error in c* could be inlined and abstractly evaluated.

Show this. :smile: Again, I'm making a request to my future self. Even better, have a goodenuf static "type checker" (or argument checker) that could flag up this kind of error.

masak commented 4 years ago

Two more focused questions:

Which global definition is the first one that would benefit from interleaved inlining/specialization?
Could randlen be specialized to the extent that it contains only the direct computation of the number, something like a repeated multiplication and addition with a (coin) call in the middle? I.e. is it reasonable to expect that an abstract evaluator can specialize the code all the way down to that?

masak commented 4 years ago

Which global definition is the first one that would benefit from interleaved inlining/specialization?

I think the answer might be snoc and list, both of which would benefit from inlining and specializing an append2 function, an already specialized two-argument version of append.

The only other earlier candidate is cons, but that looks to me like a pure inlining, with no interleaved specialization.

masak commented 4 years ago

Here comes the successive simplification of the board-movements code.

We start with the following code (nicely annotated with comments):

;; This template defines a robot with coordinates (x, y) and direction
;; vector (dx, dy). The coordinate system has the positive y direction
;; pointing downwards, and (0, 0) is in the upper left corner.
;; The robot is not allowed to leave the rectangle spanned by (0, 0)
;; (inclusive) and (w, h) (exclusive). Note that the (w, h) coordinates
;; are not specified here; they are parameters to `run` and `runcmd`.
(tem robot x 0 y 0 dx 0 dy -1)

;; A common pattern in Bel is to define a global, initially an empty
;; list to act as a namespace for some given purpose...
(set actions nil)

;; ...and then to define a macro that will add things to this global.
;; Here, we want to define five actions that control the robot.
(mac defaction (cmd name parms . body)
  `(push (cmd . (fn ,parms ,@body)) actions))

;; The first action is a bit of an exception. It aborts the run early
;; so that no more commands are executed. This is achieved through a
;; `throw` action, a non-local command which will find us back in
;; `run`, the top-level function. The list passed to `throw`
;; conveniently serves as a return value.
(defaction 0 quit (r)
  (throw (list r!x r!y)))

;; This helper function makes it easy to talk about moving the robot
;; `r` forwards or backwards, as specified by the operator `op`.
;; Pass `+` and you move the robot forwards; pass `-` and you move it
;; backwards. This is then used exactly in this way in the two
;; subsequent actions.
(def move (op r)
  (zap op r!x r!dx)
  (zap op r!y r!dy))

(defaction 1 forward (r)
  (move + r))

(defaction 2 backward (r)
  (move - r))

;; Changing the robot's direction comes down to cycling through the
;; four directions (0, -1), (1, 0), (0, 1), and (-1, 0); forwards in
;; this list if we're going clockwise, otherwise backwards. There are
;; many ways to do this, but the below is perhaps the simplest -- the
;; `swap` makes sure to juggle the 1 between the x and y coordinates,
;; and the `zap inv` switches the sign at the appropriate moments.
(defaction 3 turncw (r)
  (swap r!dx r!dy)
  (zap inv r!dx))

(defaction 4 turnccw (r)
  (swap r!dx r!dy)
  (zap inv r!dy))

;; If the robot ever leaves its designated rectangle, we throw an
;; exception and the run aborts early so that no more commands are
;; executed. This is then checked after every command, although
;; technically it's only necessary to check after forward and
;; backwards moves.
(def checkbounds (v max)
  (or (<= 0 v (- max 1))
      (throw 'impossible-move)))

;; To run an individual command, we look up the `cmd` number in
;; our `actions` association list. If we find it there, we run its
;; associated action and check the bounds. If we don't, we throw
;; an exception and abort early so that no more commands are
;; executed.
(def runcmd (r w h cmd)
  (aif (get cmd actions)
       (let action (cdr it)
         (action r)
         (checkbounds r!x w)
         (checkbounds r!y h))
       (throw 'illegal-command)))

;; Given some dimensions (w, h), an initial robot position (x, y),
;; and zero or more commands (numbers 0..4), the `run` function
;; executes the commands one by one and returns the robot's final
;; coordinates. `catch` is the context that all three `throw`s
;; above bubble up to. `map` is the most idiomatic way to write
;; `foreach` in Bel.
(def run ((w h) (x y) cmds)
  (let r (make robot x x y y)
    (catch
      (map [runcmd r w h _] cmds)
      (list r!x r!y))))

I'm going to wish away the comments, just to make it easier now to do code transformations.

The first thing I see that we can do is to inline runcmd completely. So let's do that.

@@ -30,16 +30,15 @@
   (or (<= 0 v (- max 1))
       (throw 'impossible-move)))

-(def runcmd (r w h cmd)
-  (aif (get cmd actions)
-       (let action (cdr it)
-         (action r)
-         (checkbounds r!x w)
-         (checkbounds r!y h))
-       (throw 'illegal-command)))
-
 (def run ((w h) (x y) cmds)
   (let r (make robot x x y y)
     (catch
-      (map [runcmd r w h _] cmds)
+      (map (fn (cmd)
+             (aif (get cmd actions)
+                  (let action (cdr it)
+                    (action r)
+                    (checkbounds r!x w)
+                    (checkbounds r!y h))
+                  (throw 'illegal-command)))
+           cmds)
       (list r!x r!y))))

We will never get rid of that map, by the way. cmds is a parameter to the (topmost) run function, and unknown to us.

Next up we do a big inlining of the entire actions data structure. (Actually, an intermediate step in this one would probably be to create the actions list in code first.) This is a pretty big step, but let's say we can do it like this.

@@ -1,31 +1,9 @@
 (tem robot x 0 y 0 dx 0 dy -1)

-(set actions nil)
-
-(mac defaction (cmd name parms . body)
-  `(push (cmd . (fn ,parms ,@body)) actions))
-
-(defaction 0 quit (r)
-  (throw (list r!x r!y)))
-
 (def move (op r)
   (zap op r!x r!dx)
   (zap op r!y r!dy))

-(defaction 1 forward (r)
-  (move + r))
-
-(defaction 2 backward (r)
-  (move - r))
-
-(defaction 3 turncw (r)
-  (swap r!dx r!dy)
-  (zap inv r!dx))
-
-(defaction 4 turnccw (r)
-  (swap r!dx r!dy)
-  (zap inv r!dy))
-
 (def checkbounds (v max)
   (or (<= 0 v (- max 1))
       (throw 'impossible-move)))
@@ -34,11 +12,29 @@
   (let r (make robot x x y y)
     (catch
       (map (fn (cmd)
-             (aif (get cmd actions)
-                  (let action (cdr it)
-                    (action r)
-                    (checkbounds r!x w)
-                    (checkbounds r!y h))
-                  (throw 'illegal-command)))
+             (case cmd
+               0 (do
+                   (throw (list r!x r!y))
+                   (checkbounds r!x w)
+                   (checkbounds r!y h))
+               1 (do
+                   (move + r)
+                   (checkbounds r!x w)
+                   (checkbounds r!y h))
+               2 (do
+                   (move - r)
+                   (checkbounds r!x w)
+                   (checkbounds r!y h))
+               3 (do
+                   (swap r!dx r!dy)
+                   (zap inv r!dx)
+                   (checkbounds r!x w)
+                   (checkbounds r!y h))
+               4 (do
+                   (swap r!dx r!dy)
+                   (zap inv r!dy)
+                   (checkbounds r!x w)
+                   (checkbounds r!y h))
+                 (throw 'illegal-command)))
            cmds)
       (list r!x r!y))))

Inline move.

@@ -1,9 +1,5 @@
 (tem robot x 0 y 0 dx 0 dy -1)

-(def move (op r)
-  (zap op r!x r!dx)
-  (zap op r!y r!dy))
-
 (def checkbounds (v max)
   (or (<= 0 v (- max 1))
       (throw 'impossible-move)))
@@ -18,11 +14,13 @@
                    (checkbounds r!x w)
                    (checkbounds r!y h))
                1 (do
-                   (move + r)
+                   (zap + r!x r!dx)
+                   (zap + r!y r!dy)
                    (checkbounds r!x w)
                    (checkbounds r!y h))
                2 (do
-                   (move - r)
+                   (zap - r!x r!dx)
+                   (zap - r!y r!dy)
                    (checkbounds r!x w)
                    (checkbounds r!y h))
                3 (do

Inline checkbounds.

@@ -1,9 +1,5 @@
 (tem robot x 0 y 0 dx 0 dy -1)

-(def checkbounds (v max)
-  (or (<= 0 v (- max 1))
-      (throw 'impossible-move)))
-
 (def run ((w h) (x y) cmds)
   (let r (make robot x x y y)
     (catch
@@ -11,28 +7,38 @@
              (case cmd
                0 (do
                    (throw (list r!x r!y))
-                   (checkbounds r!x w)
-                   (checkbounds r!y h))
+                   (or (<= 0 r!x (- w 1))
+                       (throw 'impossible-move))
+                   (or (<= 0 r!y (- h 1))
+                       (throw 'impossible-move)))
                1 (do
                    (zap + r!x r!dx)
                    (zap + r!y r!dy)
-                   (checkbounds r!x w)
-                   (checkbounds r!y h))
+                   (or (<= 0 r!x (- w 1))
+                       (throw 'impossible-move))
+                   (or (<= 0 r!y (- h 1))
+                       (throw 'impossible-move)))
                2 (do
                    (zap - r!x r!dx)
                    (zap - r!y r!dy)
-                   (checkbounds r!x w)
-                   (checkbounds r!y h))
+                   (or (<= 0 r!x (- w 1))
+                       (throw 'impossible-move))
+                   (or (<= 0 r!y (- h 1))
+                       (throw 'impossible-move)))
                3 (do
                    (swap r!dx r!dy)
                    (zap inv r!dx)
-                   (checkbounds r!x w)
-                   (checkbounds r!y h))
+                   (or (<= 0 r!x (- w 1))
+                       (throw 'impossible-move))
+                   (or (<= 0 r!y (- h 1))
+                       (throw 'impossible-move)))
                4 (do
                    (swap r!dx r!dy)
                    (zap inv r!dy)
-                   (checkbounds r!x w)
-                   (checkbounds r!y h))
+                   (or (<= 0 r!x (- w 1))
+                       (throw 'impossible-move))
+                   (or (<= 0 r!y (- h 1))
+                       (throw 'impossible-move)))
                  (throw 'illegal-command)))
            cmds)
       (list r!x r!y))))

All the throws are now lexically nested under the catch (and we have some kind of vague guarantee that there aren't others hiding somewhere else). We are now in a position to eliminate them all. These would actually be replaced by bytecode goto instructions, but let's try to express them through pseudo-Bel goto and label forms:

@@ -2,43 +2,56 @@

 (def run ((w h) (x y) cmds)
   (let r (make robot x x y y)
-    (catch
+    (let return-value nil
       (map (fn (cmd)
              (case cmd
                0 (do
-                   (throw (list r!x r!y))
+                   (set return-value (list r!x r!y))
+                   (goto 'abort)
                    (or (<= 0 r!x (- w 1))
-                       (throw 'impossible-move))
+                       (do (set return-value 'impossible-move)
+                           (goto 'abort)))
                    (or (<= 0 r!y (- h 1))
-                       (throw 'impossible-move)))
+                       (do (set return-value 'impossible-move)
+                           (goto 'abort))))
                1 (do
                    (zap + r!x r!dx)
                    (zap + r!y r!dy)
                    (or (<= 0 r!x (- w 1))
-                       (throw 'impossible-move))
+                       (do (set return-value 'impossible-move)
+                           (goto 'abort)))
                    (or (<= 0 r!y (- h 1))
-                       (throw 'impossible-move)))
+                       (do (set return-value 'impossible-move)
+                           (goto 'abort))))
                2 (do
                    (zap - r!x r!dx)
                    (zap - r!y r!dy)
                    (or (<= 0 r!x (- w 1))
-                       (throw 'impossible-move))
+                       (do (set return-value 'impossible-move)
+                           (goto 'abort)))
                    (or (<= 0 r!y (- h 1))
-                       (throw 'impossible-move)))
+                       (do (set return-value 'impossible-move)
+                           (goto 'abort))))
                3 (do
                    (swap r!dx r!dy)
                    (zap inv r!dx)
                    (or (<= 0 r!x (- w 1))
-                       (throw 'impossible-move))
+                       (do (set return-value 'impossible-move)
+                           (goto 'abort)))
                    (or (<= 0 r!y (- h 1))
-                       (throw 'impossible-move)))
+                       (do (set return-value 'impossible-move)
+                           (goto 'abort))))
                4 (do
                    (swap r!dx r!dy)
                    (zap inv r!dy)
                    (or (<= 0 r!x (- w 1))
-                       (throw 'impossible-move))
+                       (do (set return-value 'impossible-move)
+                           (goto 'abort)))
                    (or (<= 0 r!y (- h 1))
-                       (throw 'impossible-move)))
+                       (do (set return-value 'impossible-move)
+                           (goto 'abort))))
                  (throw 'illegal-command)))
            cmds)
-      (list r!x r!y))))
+      (set return-value (list r!x r!y))
+      (label 'abort)
+      return-value)))

Through some additional data and control flow analysis, we can eliminate some redundant checks.

@@ -7,49 +7,25 @@
              (case cmd
                0 (do
                    (set return-value (list r!x r!y))
-                   (goto 'abort)
-                   (or (<= 0 r!x (- w 1))
-                       (do (set return-value 'impossible-move)
-                           (goto 'abort)))
-                   (or (<= 0 r!y (- h 1))
-                       (do (set return-value 'impossible-move)
-                           (goto 'abort))))
+                   (goto 'abort))
                1 (do
                    (zap + r!x r!dx)
                    (zap + r!y r!dy)
-                   (or (<= 0 r!x (- w 1))
-                       (do (set return-value 'impossible-move)
-                           (goto 'abort)))
-                   (or (<= 0 r!y (- h 1))
+                   (or (and (<= 0 r!x (- w 1)) (<= 0 r!y (- h 1)))
                        (do (set return-value 'impossible-move)
                            (goto 'abort))))
                2 (do
                    (zap - r!x r!dx)
                    (zap - r!y r!dy)
-                   (or (<= 0 r!x (- w 1))
-                       (do (set return-value 'impossible-move)
-                           (goto 'abort)))
-                   (or (<= 0 r!y (- h 1))
+                   (or (and (<= 0 r!x (- w 1)) (<= 0 r!y (- h 1)))
                        (do (set return-value 'impossible-move)
                            (goto 'abort))))
                3 (do
                    (swap r!dx r!dy)
-                   (zap inv r!dx)
-                   (or (<= 0 r!x (- w 1))
-                       (do (set return-value 'impossible-move)
-                           (goto 'abort)))
-                   (or (<= 0 r!y (- h 1))
-                       (do (set return-value 'impossible-move)
-                           (goto 'abort))))
+                   (zap inv r!dx))
                4 (do
                    (swap r!dx r!dy)
-                   (zap inv r!dy)
-                   (or (<= 0 r!x (- w 1))
-                       (do (set return-value 'impossible-move)
-                           (goto 'abort)))
-                   (or (<= 0 r!y (- h 1))
-                       (do (set return-value 'impossible-move)
-                           (goto 'abort))))
+                   (zap inv r!dy))
                  (throw 'illegal-command)))
            cmds)
       (set return-value (list r!x r!y))

We are now in a position to eliminate the robot itself, through scalar elimination. Goodbye, robot!

@@ -1,33 +1,31 @@
-(tem robot x 0 y 0 dx 0 dy -1)
-
 (def run ((w h) (x y) cmds)
-  (let r (make robot x x y y)
+  (let (dx dy) (list 0 -1)
     (let return-value nil
       (map (fn (cmd)
              (case cmd
                0 (do
-                   (set return-value (list r!x r!y))
+                   (set return-value (list x y))
                    (goto 'abort))
                1 (do
-                   (zap + r!x r!dx)
-                   (zap + r!y r!dy)
-                   (or (and (<= 0 r!x (- w 1)) (<= 0 r!y (- h 1)))
+                   (zap + x dx)
+                   (zap + y dy)
+                   (or (and (<= 0 x (- w 1)) (<= 0 y (- h 1)))
                        (do (set return-value 'impossible-move)
                            (goto 'abort))))
                2 (do
-                   (zap - r!x r!dx)
-                   (zap - r!y r!dy)
-                   (or (and (<= 0 r!x (- w 1)) (<= 0 r!y (- h 1)))
+                   (zap - x dx)
+                   (zap - y dy)
+                   (or (and (<= 0 x (- w 1)) (<= 0 y (- h 1)))
                        (do (set return-value 'impossible-move)
                            (goto 'abort))))
                3 (do
-                   (swap r!dx r!dy)
-                   (zap inv r!dx))
+                   (swap dx dy)
+                   (zap inv dx))
                4 (do
-                   (swap r!dx r!dy)
-                   (zap inv r!dy))
+                   (swap dx dy)
+                   (zap inv dy))
                  (throw 'illegal-command)))
            cmds)
-      (set return-value (list r!x r!y))
+      (set return-value (list x y))
       (label 'abort)
       return-value)))

We're left with this quite cute minimized version:

(def run ((w h) (x y) cmds)
  (let (dx dy) (list 0 -1)
    (let return-value nil
      (map (fn (cmd)
             (case cmd
               0 (do
                   (set return-value (list x y))
                   (goto 'abort))
               1 (do
                   (zap + x dx)
                   (zap + y dy)
                   (or (and (<= 0 x (- w 1)) (<= 0 y (- h 1)))
                       (do (set return-value 'impossible-move)
                           (goto 'abort))))
               2 (do
                   (zap - x dx)
                   (zap - y dy)
                   (or (and (<= 0 x (- w 1)) (<= 0 y (- h 1)))
                       (do (set return-value 'impossible-move)
                           (goto 'abort))))
               3 (do
                   (swap dx dy)
                   (zap inv dx))
               4 (do
                   (swap dx dy)
                   (zap inv dy))
                 (throw 'illegal-command)))
           cmds)
      (set return-value (list x y))
      (label 'abort)
      return-value)))

It's not so bad, actually. A lot less code duplication than I would have guessed. Fits on a screen. The biggest shortcoming is the label/goto fiction, which would make this difficult to run in practice.

Hm, could have also turned the case into an if along the way.

The map is irreducible, but at least a compiler can turn it into a loop and eliminate the anonymous function. This is basically necessary, since we need to be able to do intra-procedural jumps from the loop body to the outside.

masak commented 3 years ago

One more request: do dedup, see if those rev and part calls can be eliminated altogether.

masak / bel

A musing on abstract interpretation #65