stopachka / stopablog-archived

[Archive] The blog, powered by OneGraph : )
https://www.stopa.io
6 stars 1 forks source link

LLisp: Lisp in Lisp #292

Open stopachka opened 2 years ago

stopachka commented 2 years ago

Last week I had a thought: “What’s the simplest Lisp interpreter I could write, which supports macros?"

A weekend of Clojure hacking and some confusion later, a REPL was born. In this essay, we’ll go through the same journey and make our own Lisp interpreter, which supports macros…in Clojure! Let’s call it, LLisp: a lisp in a lisp.

By the end, you’ll write your own macros in your own darn programming language! Let’s get into it.

Sketch

Okay, let’s sketch out the basic idea behind what we need to do:

image

A programmer writes some code. So far, it’s just text, and there’s not much we can do with that. We read the programmer’s text, and convert it into data-structures. We can do something with data-structures: we can evaluate data-structures, and return whatever the programmer intended.

If we can convert "(+ 1 1)" into 2, we have ourselves the roots of a programming language.

Reader

Let’s handle “2. Read”. If a programmer writes text like "(+ 1 1)", we want to convert it to data-structures. Something like:

(read-string "(+ 1 1)")
; =>
(+ 1 1)

We could write read-string ourselves. For the simplest case it’s pretty easy [^1]. But, we’re in Clojure after all, and Clojure already understands how to read Lisp code. Let’s just cheat and use Clojure’s edn:

(ns simple-lisp.core
  (:refer-clojure :exclude [eval read read-string])
  (:require [clojure.edn :refer [read read-string]]))

And voila, read-string does what we want:

(read-string "(+ 1 1)")
; =>
(+ 1 1)

Evaluation

Okay, now to “3. Evaluate”. Remember, our goal is to take data-structures, evaluate **them and return whatever the programmer intended. We can begin with a simple function like this:

(defn eval [form] 
  ;; TODO 
  )

Evaluating Literals

Let’s get the easy things out of the way. Some things just evaluate to themselves: if a programmer wrote 12 for example, all we’d have to do is to return the number 12. This is the same for strings ("foo"), characters (\b), and booleans (true). They’re literals.

Here’s how we can evaluate literals. Let’s detect them first:

(def literal?
  (some-fn string? number? boolean? char? nil?))

If you haven’t seen some-fn before, this is one of clojure’s many handy and powerful utilities. Now we can handle literals in eval:

(defn eval [form]
  (cond
    (literal? form) form))

And boom, our programming language starts to do something:

(eval (read-string "12"))
; => 12

Introducing Quote

Let’s get to quote. Quote is a crucial form in Lisp. It lets you write code that does not evaluate. For example:

(quote (+ 1 1))
; => 
(+ 1 1)

Because (+ 1 1) was inside quote, we didn’t evaluate it to return 2. Instead we returned the expression itself. This seems weird if you come from languages that don’t support quotes, but it’s used all the time in Lisp. In fact, there’s a shortcut for it:

'(+ 1 1) 

When this is read, it converts to

(quote (+ 1 1))

Thankfully, edn’s read-string does this for us. 🙂

Evaluating Quote

So, how do we evaluate quote? Let’s first create a function that can detect them:

(defn seq-starts-with? [starts-with form]
  (and (seqable? form) (= (first form) starts-with)))

(def quote? (partial seq-starts-with? 'quote))

We see if our form is a list, and the first element is the symbol quote

(quote? (read-string "'(+ 1 1)"))
; => 
true

Works like a charm. Now let’s update eval:

(defn eval [form]
  (cond
    (literal? form) form
    (quote? form) (second form)))

If our form is a quote, we return the “second” part of it. And voila,

(eval (read-string "(quote (+ 1 1))"))
; => (+ 1 1)

Introducing Symbols

Next, we come to symbols. Symbols are a special Lisp data type. They’re like variables, in the sense that when we evaluate a symbol, we **look it up.

image

Look it up where? Our interpreter will need some kind of environment, that keeps track of all the variables we defined. If we had an environment, where the symbol fav-num pointed to 41 for example, here’s what evaluating fav-num would look like:

(eval env (read-string "fav-num"))
; => 
41

Evaluating Symbols

Let’s first create this environment. We can use java’s Hashmap to keep a mapping of symbols to their values.

Let’s import java’s Hashmap:

(ns simple-lisp.core
  (:refer-clojure :exclude [eval read read-string])
  (:require [clojure.edn :refer [read read-string]])
  (:import (java.util HashMap)))

And make a quick function to create an env map:

(defn env [] {:globe (HashMap. {'fav-num 41})})

Now, we can accept an env map in eval, and start to handle symbols:

(defn eval [env form]
  (cond
    (literal? form) form
    (quote? form) (second form)
    (symbol? form) (lookup-symbol env form)))

Clojure already has a handy symbol? function, which we can use. When that’s true, we’ll lookup-symbol. Here’s how lookup-symbol could look:

(defn lookup-symbol [{:keys [globe]} sym]
  (let [v (when (contains? globe sym) [(get globe sym)])]
    (assert v (format "expected value for sym = %s" sym))
    (first v)))

And with that, symbols work!

(eval (env) (read-string "fav-num"))
; => 
41

Evaluation Primitives

Next up, let’s take an important step to making (+ 1 1) work. We’ll want to have + mean something. What should the symbol + point too?

How about Clojure functions?

(defn env [] {:globe (HashMap. {'+ + 'fav-num 41})})

If we did that, now whenever we evaluate the symbol +, it would return a clojure function:

(eval (env) (read-string "+"))
; => #object[clojure.core$_PLUS_ 0x4efcb4b5 "clojure.core$_PLUS_@4efcb4b5"]

And if we “received” a clojure function in eval, we can just treat it as a literal:

(def literal?
  (some-fn string? number? boolean? char? nil? fn?))
(eval (env) +)
; => #object[clojure.core$_PLUS_ 0x4efcb4b5 "clojure.core$_PLUS_@4efcb4b5"]

This is one small step in our essay, but a giant step for our Lisp. Every element in (+ 1 1) can now be evaluated.

Primitive Applications

So how do we run these functions? Well, let’s update eval , to handle lists like (+ 1 1):

(defn eval [env form]
  (cond
    (literal? form) form
    (quote? form) (second form)
    (symbol? form) (lookup-symbol env form)
    :else (eval-application env form)))

Easy. If we can’t figure out what the form is, it must be list, and it must mean we want to run it! Here’s how eval-application could work:

(defn eval-application [env [f & args]]
  (let [f-evaled (eval env f)]
    (cond
      (fn? f-evaled) (apply f-evaled (eval-many env args)))))

In eval-application , we evaluate the first part of our list. In our example, this would be the symbol +, which would return a clojure function.

If f-evaled is a clojure function (which it is), we would run:

(apply f-evaled (eval-many env args))

eval-many is just a helper to evaluate a list of forms:

(defn eval-many [e forms] (map (partial eval e) forms))

args would be (1 1), so eval-many would return (1 1), and that means we’d get the equivalent of:

(apply + '(1 1))

That would return 2, and babam, we have (+ 1 1) working!

(eval (env) (read-string "(+ 1 1)")
; => 
2

Remember, we evaluate recursively, so we can do some intricate stuff already:

(eval (env) (read-string "(+ 1 (+ 1 fav-num))")
; => 
43

Introducing def

Oky doke, we have an environment, we got symbols that can look stuff up in that environment, and we can even evaluate (+ 1 (+ 1 fav-num)).

But, how are we going to define variables? For example, let’s say we wanted to have the symbol second-fav point to 42.

Well, we can introduce a special form. Something like this:

(def second-fav (+ 1 fav-num))

If we receive this def form, we’d just update our environment, to point the symbol second-fav to whatever the evaluation of (+ 1 fav-num) would be.

Evaluating def

That sounds like a plan. Let’s detect this form first:

(def def? (partial seq-starts-with? 'def))

And update eval to handle it:

(defn eval [env form]
  (cond
    (literal? form) form
    (quote? form) (second form)
    (symbol? form) (lookup-symbol env form)
    (def? form) (eval-def env form)
    :else (eval-application env form)))

And here’s all eval-def would need to do:

(defn eval-def [env [_ k v]]
  (assert (symbol? k) (format "expected k = %s to be a symbol" k))
  (.put (:globe env) k (eval env v)))

Here, we know the first argument to def is the symbol, k. The second is the value v we want to save. So, we update our environment’s globe Hashmap, and point the symbol k, to whatever (eval env v) is. In our case, k would be second-fav, v would be (+ 1 fav-num), and (eval env v) would become 42.

Prettyy cool, this works!

(let [e (env)] 
    (eval e (read-string "(def second-fav (+ 1 fav-num))"))
    (eval e (read-string "second-fav")))
; => 
42

Introducing if

Okay, let’s implement one more special form. We need something to handle conditional logic:

(if (= fav-num 41) 'yess 'noo))

When we get if, we’ll can evaluate the test form: (= fav-num 41). If that’s true, we’ll then evaluate the true case 'yess. Otherwise, we’ll evaluate the false case: 'noo. Sounds like a plan.

Evaluating if

To implement if, as usual let’s detect it first:

(def if? (partial seq-starts-with? 'if))

Then use it in eval:

(defn eval [env form]
  (cond
    (literal? form) form
    (quote? form) (second form)
    (symbol? form) (lookup-symbol env form)
    (def? form) (eval-def env form)
    (if? form) (eval-if env form)
    :else (eval-application env form)))

And here’s what eval-if could look like:

(defn eval-if [env [_ test-form when-true when-false]]
  (let [evaled-test (eval env test-form)]
    (eval env (if evaled-test when-true when-false))))

Just a few lines, following our description to a tee. We see if evaled-test is true. Then, we either evaluate when-true, or when-false.

While we’re at it, let’s add a bunch of nice primitive functions in our env:

(defn env [] {:globe (HashMap. {'+ + '= = 'list list 'map map 'concat concat
                                'first first 'second second 'not not 'fav-num 41})})

And bam. Our little interpreter can do quite a lot now!

    (eval (env) (read-string "(if (= fav-num 41) 'yess 'noo)"))
    ; => yess

Introducing Functions

Let’s take things up a notch. We have Clojure’s + and so on, but what if our programmer wanted to write their own functions? Here’s an example:

((clo (x) (+ x fav-num) 2))

Here, the programmer wrote a function that takes x and adds fav-num to it.

Say they wrote this. How can we run their function? Well, their definition has an arguments and a body. In this case, it’s (x) and (+ x fav-num) respectively. In this example, the function is called with 2.

Here’s the insight: If we could just evaluate (+ x fav-num), in an environment where x points to 2, voila, it’ll be like we ran the function! Here’s a diagram to explain the idea:

image

How can we point x to 2? We can’t just update our our globe Hashmap, because then x would be permanently set to 2, even when the body was done evaluating.

So we want a new idea. A scope. We can think of scope as an environment, where the changes we make only last while body is being evaluated. If we can introduce a concept like this, we’ll have functions!

Evaluating Functions

Okay, let’s get this to work. First, let’s make sure that if someone writes (clo (x) (+ x 1)), we don’t actually try to eval-application. Instead, we should treat this as a new closure literal.

We can detect it:

(def closure? (partial seq-starts-with? 'clo))

And update our literal?:

(def literal?
  (some-fn string? number? boolean? char? nil? fn? closure?))

Now, if a user just writes a function, it’ll return itself:

(eval (env) (read-string "(clo (x) (+ x fav-num))"))
; => (clo (x) (+ x fav-num))

Next, let’s update our environment, to have a new scope map:

(defn env [] {:globe (HashMap. {'+ + '= = 'list list 'map map 'concat concat
                                'first first 'second second 'fav-num 41})
              :scope {}})

And let’s update lookup-symbol, to also look up variables in scope:

(defn lookup-symbol [{:keys [globe scope]} sym]
  (let [v (some (fn [m] (when (contains? m sym) [(get m sym)]))
                [globe scope])]
    (assert v (format "expected value for sym = %s" sym))
    (first v)))

Closer and closer. Now let’s make it run. If we wrote this:

((clo (x) (+ x fav-num)) 1)

This list would go to eval-application. Let’s update it:

(defn eval-application [env [f & args]]
  (let [f-evaled (eval env f)]
    (cond
      (fn? f-evaled) (apply f-evaled (eval-many env args))
      (closure? f-evaled) (eval-closure env f-evaled (eval-many env args)))))

f-evaled would be (clo (x) (+ x fav-num)). This would mean (closure? f-evaled) would become true.

We’d evaluate every argument with (eval-many env args), which would be (1), and give this off to eval-closure. Here’s how eval-closure could look:

(defn eval-closure [env [_ syms body] args]
  (eval (assoc env :scope (assign-vars syms args)) body))

We’d do exactly as we did in the diagram. We’d get syms, which would be (x), and args, which would be (1). We’d then create a new scope:

(defn assign-vars [syms args]
  (assert (= (count syms) (count args))
          (format "syms and args must match syms: %s args: %s"
                  (vec syms) (vec args)))
  (into {} (map vector syms args)))

That’s {x 2}. We’d merge it into our environment, and then evaluate body.

All of a sudden, (+ x fav-num) will become 43!

(eval (env) (read-string "((clo (x) (+ x fav-num)) 2)"))
; => 43

Introducing Lexical Scope

Now, these functions are fine and all, but we can do more. Many languages support lexical scoping. For example, we can do this in javascript:

const makeAdder = (n) => (a) => n + a
const addTwo = makeAdder(2);
addTwo(4) 
// => 
6

Here, makeAdder returned a function. This function remembered the value of n where it was defined. How can we make this work in our own language?

Okay, first things first, let’s say that we let the programmer see the current scope. Here’s an example of what I mean:

((clo (x) scope) 2)
; => 
{x 2}

This special scope symbol, when evaluated returned {x 2}, the current scope!

Nice now, what if we updated our function literal? It could look something like this:

(clo {n 2} (a) (+ n a))

Here, we have a special spot for the scope! When we define the function, if we could just “plop” the scope of wherever it was defined in there, badabing badaboom, we’d support lexical scoping!

Here’s how our make-adder could work:

(def make-adder (clo nil (n) 
                  (list 'clo scope '(y) '(+ y n)))) 
(def add-two (make-adder 2))
(add-two 4)
; => 
6

When we run (make-adder 2) we’d evaluate (list 'clo scope '(y) '(+ y n)). That would return (clo {n 2} (y) (+ y n))! All of a sudden, we support lexical scoping.

You may think it’s weird that we wrote (list 'clo scope '(y) '(+ y n))) — quote verbose. But, hey, we’ll support macros soon, and that help us write this in a spiffy way. Just you wait 🙂.

Implementing Lexical Scope

Okay. that’s a lot of explaining, but the changes will be simple. First we need to make sure that if we get a scope symbol, we return the current scope:

(defn lookup-symbol [{:keys [globe scope]} sym]
  (let [v (some (fn [m] (when (contains? m sym) [(get m sym)]))
                [{'scope scope} globe scope])]
    (assert v (format "expected value for sym = %s" sym))
    (first v)))

And now all we need to do, is to update eval-closure:

(defn eval-closure [env [_ scope syms body] args]
  (eval (assoc env :scope (merge scope (assign-vars syms args))) body))

Here we take the scope from the closure itself! With that, lexical scoping works!

(((clo nil (x)
        (list 'clo scope '(y) '(+ y x))) 2) 4)
; => 
6

Introducing Macros

Now, we have all we need to implement macros! Let’s think about what macros really are.

Imagine writing the function unless:

(def unless (clo nil (test v)
              (if (not test) v)))

Will it work? If we ran (unless (= fav-num 41) 'yes), it would return the symbol yes.

But what if we ran this?

(unless (= fav-num 41) (throw-error))

Since unless is a function, we would evaluate each argument first right? In that case, we’d evaluate (= fav-num 41), which would be true. But, we’d also evaluate (throw-error). That would break our program. This defeats the whole purpose of unless , as (throw-error) was supposed to run only when test was false.

image

Now, imagine we wrote the macro unless:

(def unless (mac (clo nil (test v)
                    (list 'if (list 'not test) v))))

If we ran (unless (= fav-num 41) (throw-error)), here’s what would happen:

The value of test would not be true, it would actually be the list (= fav-num 41). Similarly, we wouldn’t evaluate (throw-error) . v would just be the actual list (throw-error).

The arguments to a macro are not evaluated. When the macro runs, it returns code. The result of

(list 'if (list 'not test) v))

would become

(if (not (= fav-num 41)) (throw-error))

And when we evaluated that, things would work as expected! (throw-error) would never evaluate.

This is the key difference between functions and macros. Functions take evaluated values as arguments and return evaluated values. Macros receive unevaluated arguments, return code, and that code is evaluated.

image

An important piece to note is that we eval twice. We give the macro the unevaluated arguments, which returns new code, and we once again evaluate that.

So, how can we support this?

Evaluating Macros

So much explanation, for such little code. Are you ready? Let’s add macros.

First, let’s use this new structure for a macro: a macro is a list that begins with mac, and has closure inside:

(mac (clo nil (...))

We can detect macro, and mark it as a literal:

(def macro? (partial seq-starts-with? 'mac))

(def literal?
  (some-fn string? number? boolean? char? nil? fn? closure? macro?))

Next, we’ll want to update eval-application:

(defn eval-application [env [f & args]]
  (let [f-evaled (eval env f)]
    (cond
      (fn? f-evaled) (apply f-evaled (eval-many env args))
      (closure? f-evaled) (eval-closure env f-evaled (eval-many env args))
      (macro? f-evaled) (eval-macro env f-evaled args))))

Before we always ran (eval-many env args). But this time, if it’s a macro, we pass in args directly! That’s the ”code” itself 🙂.

And now for eval-macro:

(defn eval-macro [env [_ clo] args]
  (eval env
        (eval env (concat [clo] (map (partial list 'quote) args)))))

Oh my god. 3 lines!! We do exactly as we said in our diagram:

image

We take the “closure” out of our macro, and run it with unevaluated args. We can do that just by wrapping each arg in a quote: (map (partial list 'quote) args).

Once we have the resulting code, we evaluate that again, and boom, we have macros.

(def unless (mac (clo nil (test v)
                    (list 'if (list 'not test) v))))
; works!
(unless (= fav-num 41) (throw-error))

Your own Macros

Okay, we have a Lisp that supports macros. Let’s go ahead and write some of our own!

defmacro

Let’s get meta off the bat. Notice what we do when we define a macro:

(def unless (mac (clo nil (test v)
                      (list 'if (list 'not test) v))))

This whole (mac (clo nil …)) business is a bit unnecessary. Let’s just write macro that does this for us!

(def defmacro
      (mac (clo nil (n p e)
                (list 'def n
                      (list 'mac (list 'clo nil p e))))))

This generates the code (def n (mac (clo nil …))). Now we could write:

(defmacro unless (test v) 
  (list 'if (list 'not test) v))

Cool!

fn

Okay, remember how we wrote our function for lexical scoping?

(def make-adder (clo nil (n)
                      (list 'clo scope '(y) '(+ y n))))

Let’s have a macro write this for us:

(defmacro fn (args body)
              (list 'list ''clo 'scope
                    (list 'quote args)
                    (list 'quote body)))

Here’s what happens if we wrote:

(def make-adder (clo nil (n)
                    (fn (y) (+ y n))))

(def add-two (make-adder 2))

When the macro expansion for fn runs, the args would would be (y) and body (+ y n). So

(list 'list ''clo 'scope
      (list 'quote args)
      (list 'quote body))

would expand to

(list 'clo scope '(y) '(+ y n))

and that’s exactly what we wrote by hand! Bam, now we can use fn instead of this whole 'clo scope business.

defn

Now if we wanted to define functions, we could write:

(def add-fav-num (fn (x) (+ x fav-num)))

But we can make it tighter. Let’s add a defn macro, so we could write:

(defn add-fav-num (x) (+ x fav-num))

Easy peasy:

(defmacro defn (n args body)
  (list 'def n (list 'fn args body)))

let

One more cool macro. Right now, there’s no real way to define temporary variables. Something like:

(let ((x 1) (y 2)) 
  (+ x y))
; => 
3

How could we support it? Well, what if we rewrote (let …) to:

((fn (x y) (+ x y)) 1 2)

We could just use the argument list of a function to create these variables! Perfecto. Here’s the macro:

(defmacro let (pairs body)
  (concat (list (list 'fn (map first pairs) body))
                (map second pairs)))

Fin

What a journey. 80 lines, 4000 words, and hopefully a fun time 🙂. You now have a Lisp with macros, and you’ve written some cool ones yourself. Here’s the full source if you’d like to follow along from there.

Credits

The idea to represent macros and closures like this, came from PG’s Bel.

Thanks to Mark Shlick, Daniel Woelfel for reviewing drafts of this essay.

[^1]: Here’s how Norvig did it for his Lis.py