microsoft / knossos-ksc

Compiler with automatic differentiation
Other
45 stars 10 forks source link

Tuple pattern matching #185

Open toelli-msft opened 4 years ago

toelli-msft commented 4 years ago

We've discussed occasionally whether we should add tuple pattern matching. I think it's a good idea and would make a lot of our code neater. Finding that it is also supported in C++ 17 has convinced me that we should just go ahead and do it!

https://skebanga.github.io/structured-bindings/

awf commented 4 years ago

You mean to the IR? It's quite a sugary change? We're trying to reduce sugar in the IR. If we want a lisp-like surface, we should just translate Scheme, Racket, or Clojure.

toelli-msft commented 4 years ago

Ah I see. Yes, that would go against our wish to minimise our core IR.

awf commented 4 years ago

This came up again in the context of #274 -- let's discuss more :)

awf commented 4 years ago

From #274: We can, without much trouble, just allow existing let to pattern match:

(let ((a1 a2 a3 ... an) t) body)

And it would make some of the tupling/detupling in our AD look better.

toelli-msft commented 4 years ago

This change has more impact across the rest of the codebase than I expected. To take a random handful of functions that will need updating to accommodate this change:

simonpj commented 4 years ago

Typechecking too! And optimisation. It's a far reaching change.

awf commented 4 years ago

This is assuming we don't just do it as a preprocess? And that we can't just do that, because we need it done deeply for dup? So what's the time estimate? It feels of the order of removing vector sizes?

toelli-msft commented 4 years ago

Yes, removing vector sizes seems like a good order-of-magnitude equivalent.

We could start by implementing it as a preprocess. That would take between a couple of hours and half a day. A preprocess wouldn't be sufficient to allow SUL-based AD to work though. SUL-based AD needs actual untupling in the AST.

simonpj commented 4 years ago

Thinking of it as pre-processing step makes this only a programmer convenience, which is not a high priority. I would say not worth the bother.

The main point that I can see is that it allows us to define a single-use variant of ksc: a bound variable is used exactly once (or maybe at most once) in its scope. So we can say

let (x1, x2) = dup x in ...x1...x2...

If we didn't have pattern-matching let we couldn't do that. We'd have to say

let t = dup x in
let x1 = fst t in
let x2 = snd t in 
...x1...x2....

and now t is used twice, which defeats the whole object of the exercise.

TL;DR: I think there is only a point in putting a pattern matching let into ksc if we carry it right through, as part of the core language, understood by the optimiser, rewriter, code generator, etc. But let's keep its as simple as poss: just one level of pattern match.

We could then desugar away multi-arg functions

f (x, y) = blah

So the additional complexity does allow a small simplification too. is desugared to

f t = let (x,y) = t in blah
toelli-msft commented 4 years ago

Ah yes, it's nice to think of this as giving us "Single Use Form" as opposed to "Single Use Language". If we have tuple unpacking we can embed the Single Use Language inside KS. I will look it more detail about what exactly needs to be done.

awf commented 3 years ago

I believe this introduces a parse ambiguity:

(let (l r) body) ; single binding

(let ((l r) (f 3)) body) ; multiple bindings l=r, f=3, OR single tuple-binding (l, r) = f(3)

(let ((l1 r1) (l2 r2) (l3 r3)) body) ; multiple bindings

Potential fixes:

  1. Insist on the "tuple" pseudo-keyword on the lhs.
    This is fine, but tuple isn't a keyword, it's a primitive. OTOH, disallowing "tuple" as a variable name seems fairly reasonable. So the tuple-bound form becomes
    
    (let ((tuple l1 r1) (l2 r2)) body) ; tuple (l1 r1) bound to call (l2 r2)

(let (((tuple l1 r1) (l2 r2)) (x 3) ((tuple p q) t)) body) ; multiple bindings


2. Require even single bindings to be nested lists, i.e. replace
```lisp
(let (l r) body) ; single binding    

with

(let ((l r)) body) ; single binding    
  1. Remove the sugar allowing multiple bindings. That is, replace
    (let ((a (f 1))
      (b (f 2))
      (c (f 3)))
    body)

    with

    (let (a (f 1))
    (let (b (f 2))
    (let (c (f 3))
    body)))

    I kind of prefer this, as most autogenerated ks won't use the sugared form, and a profusion of lets makes it easier to see bindings, and the pretty printer already left-aligns lets. It also removes some really unnecessary sugar in place of a needed construct (for SUF).

awf commented 3 years ago

@toelli-msft @simonpj

simonpj commented 3 years ago

I'm fine with requiring multiple lets for multiple bindings.

toelli-msft commented 3 years ago

Interesting, I hadn't realised that. Yes it's ambiguous and what happens will be determined by the operational semantics of the Parsec parser. broken does not work; the works... functions do work. It seems like it tries to parse the first list as a list of binders and if it can then it assumes it's a tuple-unpacking let. If it can't parse the first list as a list of binders then it tries again to parse it, this time as a binder and as expression.

(def broken Float ((x : Float) (y : Float))
     (let ((a x) (bar y))
       (add a bar)))

(def works Float ((x : Float) (y : Float))
     (let ((a z) (bar y))
       (add a z)))

(def works_also Float ((x : Float) (y : Float))
     (let ((a 1.0) (bar y))
       (add a bar)))

(def works_also2 Float ((x : Float) (y : Float))
     (let ((a (neg x)) (bar y))
       (add a bar)))
awf commented 3 years ago

Thanks Tom. Which of the above options 1-3 do you prefer? Simon and I seem to be plumping for 3.

toelli-msft commented 3 years ago

I still prefer (let ((tuple x y) t) ...), firstly because it seems strange to me to require a special form (tuple) for constructing tuples but not take advantage of it in the pattern match syntax, and secondly because (KINAUFL notwithstanding) I do hand-edit a lot of complex .ks files (like gmm.ks and adbench-lstm.ks) and the sugar for multiple bindings saves a lot of closing parens that would otherwise be awkward.

awf commented 3 years ago

So,

  1. (tuple a b c) constructs. But up here in syntax land, tuple is not distinguishable from any other "n-arg" function. From up here, it's a function call, not a keyword. There's also (vec 1 2 3), which is a function call.
  2. Too many close parens... I don't buy that too much. I could see an objection to too many lets,, but when pretty printing files like gmm and bert, I found I was preferring to see the chain of lets, and the closing parens are in my view less onerous than typing out the let. Most LISP style guides say something like Place all trailing parentheses on a single line, which leads to code like this, which I find quite readable.
    ; This example calculates (the number of positive elements, and their product) in a vector.
    (def fold_example (Tuple Integer Float) (v : Vec Float)
     (fold (lam (s_vi : Tuple (Tuple Integer Float) Float)
                (let ((s vi) s_vi)
                (let ((s_num s_prod) s)
                  (if (gt vi 0.0) 
                    (tuple (add s_num 1) (mul s_prod vi))
                    (tuple s_num s_prod)))))
           (tuple 0 1.0)
           v))

    As it happens, the above is from syntax-primer.ks, because when updating the Python parser to take tuple-bound lets, I resolved the ambiguity using method 3.