lexi-lambda / hackett

WIP implementation of a Haskell-like Lisp in Racket
https://lexi-lambda.github.io/hackett/
ISC License
1.17k stars 50 forks source link

Proposal: macro expressions instead of macro bindings #51

Open gelisam opened 6 years ago

gelisam commented 6 years ago

Acknowledgement

This proposal is inspired by a presentation on Typer I recently saw at McGill University's 2017 Programming Languages Workshop. There doesn't seem to be any information on the internet about this academic programming language yet. Typer's goals seem very similar to Hackett's, as it also wants to interleave macro-expansion and type-checking. The main difference is that Typer is a dependently-typed programming language, which aims to combine Lisp+OCaml+Coq instead of Haskell+Racket.

Proposal

Hackett macros currently have to be defined at the top-level, using the special form define-syntax-parser:

(define-syntax-parser list
  [(list) #'Nil]
  [(list x xs ...) #'{x :: (list xs ...)}])

After that definition, the identifier list is known to be a macro, so later occurrences of (list ...) are macro-expanded instead of being treated as a function application:

; {1 :: 2 :: 3 :: Nil}
(list 1 2 3)

The proposal is: instead of a special form, let's have a special type. This type is an otherwise ordinary wrapper around a function which manipulates syntax objects:

(data Macro
  (DefMacro {Syntax -> Syntax}))

The type needs to be treated specially: when type-inference detects that an expression of type Macro is applied to an argument, it knows that this is not a function call which will be evaluated at runtime, but a macro which must be expanded at compile-time. For example:

; {1 :: 2 :: 3 :: Nil}
((DefMacro (syntax-parser
             [(list) #'Nil]
             [(list x xs ...) #'{x :: (list xs ...)}]))
 1 2 3)

Such expressions may refer to locally-bound variables:

; {1 :: 2 :: 3 :: Nil}
(let ([list (DefMacro (syntax-parser
                        [(list) #'Nil]
                        [(list x xs ...) #'{x :: (list xs ...)}]))])
  (list 1 2 3))

In that case, the expression to which the locally-bound variable refers must be evaluated at compile-time. If that expression refers to other variables, then those must be evaluated as well. Obviously, this is only possible for constant expressions, for which all the information is available at compile time; if the expression depends on a lambda-bound variable (such as one bound by the do macro), this results in a compile-time error. Typer calls it an "elaboration error".

; elaboration error
(defn one-two-three : {Macro -> (List Integer)}
  [[list] (list1 2 3)])

Type inference

Worryingly, since the application (f x) could either be a macro application or a function application, we can no longer deduce that there are some types a and b for which f has type {a -> b} and x has type a. We can't even deduce that x has a type, because it could be some keyword recognized by the macro! That sounds pretty bad.

Fortunately, since we know that the only valid macro expressions are constant expressions, intuitively we should always be able to determine from the surrounding context whether that expression is a macro or not. More formally, the algorithm for bidirectional type-checking first infers the type of f, and if it has the form {a -> b}, it checks that x has type a. So changing the algorithm to not check x if it turns out that f has type Macro doesn't affect type inference at all.

Motivating examples

The example given at the top was chosen for its simplicity, but it didn't really demonstrate the advantages of the proposal over define-syntax-parser. Here are some better examples.

Local macros

One advantage is that macros can now be defined in a small local scope. For small syntactic improvements which aren't reusable outside of a narrow scope, it makes more sense to define macros locally than at the top-level. For example, in the cross-product implementation below, there is a pattern which is repeated 3 times with small variations:

(data Vec
      (Vec Integer Integer Integer))

(defn cross-product : {Vec -> Vec -> Vec}
  [[(Vec x1 y1 z1) (Vec x2 y2 z2)]
   (Vec {{y1 * z2} - {z1 * y2}}
        {{z1 * x2} - {x1 * z2}}
        {{x1 * y2} - {y1 * x2}})])

That particular pattern isn't likely to be repeated anywhere outside of that function, so it would be nice to define a local macro eliminating the duplication:

(defn cross-product : {Vec -> Vec -> Vec}
  [[(Vec x1 y1 z1) (Vec x2 y2 z2)]
   (let [[submul (DefMacro
                  (syntax-parser
                    [(submul u:id v:id)
                     #`(- (* #,(suffix-id #'u "1") #,(suffix-id #'v "1"))
                          (* #,(suffix-id #'v "2") #,(suffix-id #'u "2")))]))]]
     (Vec (submul y z)
          (submul z x)
          (submul x y)))])

Of course, the fact that we can't currently define local macros in Hackett isn't very limiting, as we can simply define a top-level macro whose name indicates that it isn't used anywhere else:

(define-syntax-parser cross-product-helper
  [(submul u:id v:id)
   #`(- (* #,(suffix-id #'u "1") #,(suffix-id #'v "1"))
        (* #,(suffix-id #'v "2") #,(suffix-id #'u "2")))])

(defn cross-product : {Vec -> Vec -> Vec}
  [[(Vec x1 y1 z1) (Vec x2 y2 z2)]
   (Vec (cross-product-helper y z)
        (cross-product-helper z x)
        (cross-product-helper x y))])

This example also demonstrates one way to implement DefMacro: by rewriting the program into a form in which all the macros are at the top-level.

Macro combinator libraries

Libraries like turnstile and syntax/parse demonstrate that it is possible to define "meta-macros", that is, macros which define other macros. The idea is to define a DSL for describing macros, and to write a meta-macro which interprets this DSL. Well, in Haskell, our DSLs are combinator libraries, made out of precise types and reusable abstractions like Applicative and Monad, so if we're going to have a DSL for describing macros, it would be nice if that DSL could be a combinator library. Now that a macro is a normal value, of type Macro, which can be constructed using normal expressions, we can do precisely that.

; The intention is to construct a Syntax object out of two
; Syntax variables, e.g. {{x1 * y2} - {y1 * x2}} from x and y
(data (Template a)
      (Template {Syntax -> Syntax -> a}))

(defn runTemplate : {(Template Syntax) -> Macro}
  [[(Template f)]
   (DefMacro (syntax-parser
               [(_ u v) (f #'u #'v)]))])

(instance (Functor Template) ...)
(instance (Applicative Template) ...)  ; for liftA2 below

; (runTemplate (tU "1") u v) ;=> #'u1
(defn tU : {String -> (Template Syntax)}
  [[suffix]
   (Template (λ [u v] (suffix-id u suffix)))])

; (runTemplate (tV "2") u v) ;=> #'v2
(defn tV : {String -> (Template Syntax)}
  [[suffix]
   (Template (λ [u v] (suffix-id v suffix)))])

; (runTemplate (op #'+ (tU "1") (tV "2")) u v) ;=> #'{u1 + v2}
(defn op : {Syntax -> Template Syntax -> Template Syntax -> Template Syntax}
  [[o] (liftA2 (λ [x y] #'{x o y}))])

Now that we have a library for creating macros based on a Template, we can now rewrite our previous example in a much more readable way, by defining a local Template and using runTemplate to turn it into a local macro:

(defn cross-product : {Vec -> Vec -> Vec}
  [[(Vec x1 y1 z1) (Vec x2 y2 z2)]
   (let [[t- (op "-")]
         [t* (op "*")]
         [t (t- {(tU "1") t* (tV "1")}
                {(tV "2") t* (tU "2")})]
         [submul (runTemplate t)]]
     (Vec (submul y z)
          (submul z x)
          (submul x y)))])

This time, the version where all the macros are at the top-level has to run some Hackett code at compile-time, and has to be careful to define the macros in the right order and in the right phase, so having the compiler do that for us is a lot more valuable.

lexi-lambda commented 6 years ago

This is a really interesting proposal. Thank you for writing it up!

I do think that many of the things you’ve written make a lot of sense. I think it’s extremely likely that we want to be able to write macros in Hackett, not Racket, though I’m not sure if it’s an immediate concern of mine. Still, I don’t think it should theoretically be all that hard! As you describe, the right first step is to make something like a Syntax type that corresponds to Racket syntax objects, plus a Datum ADT that corresponds to the result of syntax-e. In fact, Datum probably ought to be parameterized over the type of the leaves, so it’s possible to have syntax-e : {Syntax -> (Datum Syntax)} and syntax->datum : {Syntax -> (Fix Datum)}.

That said, I’m a little less convinced about the notion of automatically rewriting bindings and expressions to be macro bindings and macro invocations based exclusively on type information. That might be more feasible in a language with a simpler type inference scheme, but I think it would be potentially tricky to do right with Hackett.

More to the point, however, Racket has notion of phases. Local macros are defined with let-syntax, which Hackett can absolutely support, and top-level macros are defined with define-syntax, which should possibly be called defsyntax to be consistent with Hackett’s naming scheme. It doesn’t seem like there is much to gain by making that distinction implicit rather than explicit (and I imagine there are many things to lose—it would almost certainly make things dramatically more complicated).

Your first example of a local macro is essentially exactly the same with let-syntax. Assuming Hackett provided a version of let-syntax that properly propagated the type information, I think this would work just fine:

(defn cross-product : {Vec -> Vec -> Vec}
  [[(Vec x1 y1 z1) (Vec x2 y2 z2)]
   (let-syntax ([submul (syntax-parser
                          [(submul u:id v:id)
                           #`(- (* #,(suffix-id #'u "1") #,(suffix-id #'v "1"))
                                (* #,(suffix-id #'v "2") #,(suffix-id #'u "2")))])])
     (Vec (submul y z)
          (submul z x)
          (submul x y)))])

Your section on macro combinators is a little more interesting. My gut instinct is that you really do want to have a syntax/parse-like DSL for pattern-matching on and reassembling syntax, but I also think you’re right in that you (1) want it to be typed and (2) want “syntax classes” to be first-class. I think the fact that syntax/parse syntax classes aren’t first-class is unfortunate.

We can do that, though, since Racket’s phasing system means we can theoretically just (require (for-syntax hackett)) and use all of Hackett at compile-time! If Syntax is an ordinary datatype, like we described above, that means all of the things you described are well within reach, though they might look a little different. I think this is an area that could benefit from a lot of experimentation, and putting some basic primitives (like let-syntax, Syntax, Datum, syntax-e, syntax->datum, and datum->syntax) into Hackett would make it possible to experiment with these things in userspace code, which is a nicer place to play with unknowns than in the core language.

SuzanneSoy commented 6 years ago

Concerning "local macros which look like lambda functions", I have built something along these lines as part of my type-expander library for Typed Racket (see here for the part of the docs which covers this aspect).

@gelisam's example…

((DefMacro (syntax-parser
             [(list) #'Nil]
             [(list x xs ...) #'{x :: (list xs ...)}]))
 1 2 3)

… would be similar to a type-level macro, which builds the type for a list of length 3 whose elements must be 1, 2 and 3:

#lang type-expander
(require (for-syntax syntax/parse))
(define-type l123
  ((Letrec ([lst (Λ (self . args)
                   (syntax-parse #'(self . args)
                     [(lst) #'Null]
                     [(lst x xs ...) #'(Pairof x (lst xs ...))]))])
     lst)
   1 2 3))

;; At the REPL:
(:type l123)
=> (List One 2 3)

Allowing expressions of the form (x 1 2 3), where x is an expression which produces an "anonymous macro" (just as (λ args body) produces an anonymous function) requires a slightly different behaviour than that of Racket's macro expander, especially if (x 1 2 3) is itself allowed to produce an "anonymous macro" within the larger expression ((x 1 2 3) 4 5 6).

Unfortunately, I haven't written enough type expanders to actually use this feature, so I really cannot say whether such a thing is a good idea or not. As @lexi-lambda says, this can most of the time be rewritten using let-syntax around the whole expression, instead of writing an "anonymous macro" in the function position of an application-like form (i.e. x in (x 1 2 3)).

I think it's a feature that sounds nice on paper, but is of little use in practice. I'd be interested to know if this does get implemented elsewhere, though, as I lack enough usage data to have a strong opinion on this feature's usefulness.

gelisam commented 6 years ago

This proposal is inspired by a presentation on Typer I recently saw at McGill University's 2017 Programming Languages Workshop. There doesn't seem to be any information on the internet about this academic programming language yet.

Now there is: Typer: An infix statically typed LISP

gelisam commented 6 years ago

I'm surprised to see that this issue is still open, since you have rejected my proposal. You counter-proposed:

putting some basic primitives (like let-syntax, Syntax, Datum, syntax-e, syntax->datum, and datum->syntax) into Hackett would make it possible to experiment with these things in userspace code, which is a nicer place to play with unknowns than in the core language.

followed by userspace experimentations towards a typed syntax/parse-like DSL for pattern-matching on and reassembling syntax. Is this issue still open because you want to track the completion of those primitives? If so, I'd recommend splitting them off into other, shorter issues so contributors can get to the core idea without first having to go through my proposal's huge wall of text. I can create those other issues if you want.

lexi-lambda commented 6 years ago

I’m afraid I haven’t had much time lately to work on Hackett, which is why some maintainership duties have slipped, but I’m hoping that will change in a couple of months. Anyway, I did actually start working on a prototype implementation of types for syntax things (see this tweet and this other one), but there were some subtleties I never quite worked out. I do want to take the time to watch the video you linked, though.