[WIP] Store Syntax.node children in array

Kha commented 5 years ago

I've done a first pass through the frontend, which could certainly be improved in some places. Unfortunately, it currently fails with an assertion violation from Array.iterateAux. I haven't taken a close look yet, but it looks like partial leads to some confusion of arguments.

leodemoura commented 5 years ago

I haven't taken a close look yet, but it looks like partial leads to some confusion of arguments.

I fixed a similar bug yesterday. If you send me a small repro, I will investigate.

Kha commented 5 years ago

I'm not sure it's even the same issue, but the following lines crash the array_test on this branch:

modified   tests/compiler/array_test.lean
@@ -21,4 +21,6 @@ do
  IO.println (toString a2),
  let a2 := a.pop,
  IO.println a2,
+ a3 ← a.mmap (λ x, pure x),
+ IO.println a3,
  pure 0

leodemoura commented 5 years ago

I'm not sure it's even the same issue, but the following lines crash the array_test on this branch:

Are you sure you have the latest version? array_test.lean is working for me. I fixed a very similar bug yesterday.

Kha commented 5 years ago

It doesn't work for me :( . I've rebase this branch and added the changed test.

leodemoura commented 5 years ago

It doesn't work for me :( . I've rebase this branch and added the changed test.

My mistake, I missed your changes to array_test.lean. I managed to reproduce the crash. I will investigate.

leodemoura commented 5 years ago

I found the issue. There is a mismatch between @miterateAux._main (the definition sent to the kernel) and @miterateAux._main._unsafe_rec (the definition sent to the compiler).

#check @miterateAux._main
#check @miterateAux._main._unsafe_rec

We use a function mk_aux_definition to add them to the environment. This function captures all arguments defined in the surrounding context. Before, I added partial def, the compilation strategy would never use extra arguments from the surrounding context. Now, this is not true anymore. In the example above, partial_rec will add [Inhabited \beta] to justify the base case :(

leodemoura commented 5 years ago

I pushed a fix for the _unsafe_rec issue. It should fix the problem. I will test more later. I have to go to a meeting now.

Kha commented 5 years ago

The branch is working now. The timings are indistinguishable from master.

leodemoura commented 5 years ago

The branch is working now. The timings indistinguishable from master.

I think core.lean is not good for benchmarking. First, there are too many components being executed, and many of them are implemented in C++. We already have many components written in Lean, and non-trivial interactions between them. Another problem is that the compiler is also quite new, and often generates bad code and/or something that we did not expect. My fear is that as soon as we have the whole frontend in Lean, the performance will be bad, and it will be very hard to spot where the problem is, and how to fix it without major refactoring. Note that, we have already found and fixed a few performance and memory consumption issues, but they were not easy to find. Artificial benchmarks generators (e.g., https://github.com/leanprover/lean4/blob/master/tests/playground/gen.lean) are great for spotting performance problems. Recall that we found memory leaks and exposed issues with the coroutine abstractions using it. We should create new generators that exercise different components of the new frontend (e.g., expander). In the meantime, could you please push the new array primitives you added to master. I want to profile array manipulating programs written in Lean, and make sure the compiler is producing descent code.

leodemoura commented 5 years ago

The views are still being created in this branch, right? What about the 0-cost views we have discussed at Zulip? Here is the relevant fragment from Zulip

It would be great if we could use this thread to describe alternative designs and motivate the current one. I will start by describing an idiom we should avoid. Suppose we have a constructor

node : Kind -> List Syntax -> Syntax
and we write patterns such as

match n with
| node kind1 [c1, c2, c3, c4] := ...
| node kind2 [c1, c2] := ...
| node kind3 [c1, c2, c3, c4, c5] := ...
| ...

The generated code will be bloated with many nested List.casesOn. Another problem is that we have many levels of indirection. We can avoid the problem by using Array Syntax instead of List Syntax. If we use Arrays we can't easily extract the children, but we can write 0-cost views such as

@[inline] def withKind1Args {a : Type} (args : Array Syntax) (fn : Syntax -> Syntax -> Syntax -> Syntax -> a) : a :=
fn (args.get 0) (args.get 1) (args.get 2) (args.get 3)

If the default value for Syntax is Syntax.missing, then this approach should work well, and we can write the match above as:

match n with
| node kind1 as := withKind1Args as $ fun c1 c2 c3 c4, ...
| node kind2 as := withKind2Args as $ fun c1 c2, ...
| node kind3 as := withKind3Args as $ fun c1 c2 c3 c4 c5, ...
| ...

This is way faster than the previous approach using Lists. We can make it even faster by using a new primitive Tuple type. Tuples would be like arrays but without support for increasing their size dynamically. That is, we don't have the m_capacity field and save 8 bytes in the object header.

leodemoura commented 5 years ago

BTW, I have modified List.toArray in the master branch. Now, the resultant array size and capacity will be equal to the List length. For example, List.toArray [1, ..., 7] produces an array with capacity 7 instead of 14.

Kha commented 5 years ago

I've pushed some very simple expander tests with and without views to master and this branch. This branch is just slightly slower without views, and about 10% slower with views, most probably because the many combinator still uses lists as tagged above. Depending on the test, the views are ~10%-50% slower than no views.

Kha commented 5 years ago

What about the 0-cost views we have discussed at Zulip?

I agree that 0-cost views are probably the only practical solution, but I'd really like to keep the named structure fields for my sanity. I believe we should be able to generate the same 0-cost code with view structures if we make the following adjustments:

views are shallow: all fields are of type Syntax
HasView instances should be inlined; this should not be an issue anymore with shallow views
viewed data should not escape the current function, thus compiling down to zero-allocation Array.gets (though join points may (currently) prevent this)
review should only be called on constructor terms, thus removing the allocation of the view

Kha commented 5 years ago

The shallow view would still lose quite a bit of type information compared to the current views. We've previously discussed a more lightweight model where fields could be typed by a type like

-- "syntax with a view"
-- NOTE: does not assert the actual kind of its content, which may be `Syntax.missing`
def VSyntax (k : NodeKind) := Syntax
@[inline] def VSyntax.view {k} [HasView k v] : VSyntax k -> v := HasView.view
...

Now we have the same amount of typedness as before, though certainly many more view and review calls (textually), and we sadly lose the ability to do nested matches on views.

leodemoura commented 5 years ago

I've pushed some very simple expander tests with and without views to master and this branch. This branch is just slightly slower without views, and about 10% slower with views, most probably because the many combinator still uses lists as tagged above. Depending on the test, the views are ~10%-50% slower than no views.

I am playing with the tests. Here are some preliminatry comments.

A hand-written "to node2" transformation is more than 10x faster:

partial def toNode2 : Syntax → Syntax
| (Syntax.rawNode ⟨kind, as, scopes⟩) := Syntax.rawNode ⟨⟨`node2⟩, as.map toNode2, scopes⟩
| other                               := other

The view based version doesn't follow the view-review idiom used at expander.lean.

def testSimple := test [(`node, λ stx, pure $ Syntax.mkNode ⟨`node2⟩ $ let v := view node stx in v.children)]

I found the generated code quite confusing too. It is not what I expected when I saw the file. Here is the generated code for node.HasView.

>> node.HasView'._lambda_1
λ (_x_1 : _obj),
let _x_2 : _obj := Syntax.asNode._main _x_1
in @Option.casesOn _x_2 (let _x_3 : _obj := node.HasView'._lambda_1._closed_1 in _x_3)
   (let _x_4 : _obj := _proj.0 _x_2,
        _ : _void := _inc _x_4,
        _ : _void := _dec _x_2,
        _x_5 : _obj := _proj.1 _x_4,
        _ : _void := _inc _x_5,
        _ : _void := _dec _x_4
    in @List.casesOn _x_5 (let _x_6 : _obj := node.HasView'._lambda_1._closed_1 in _x_6)
         (let _x_7 : _obj := _proj.0 _x_5,
              _ : _void := _inc _x_7,
              _ : _void := _dec _x_5,
              _x_8 : _obj := Syntax.asNode._main _x_7
          in @Option.casesOn _x_8
               (let _x_9 : _obj := Lean.Parser.detailIdentPartEscaped.HasView'._lambda_2._closed_2 in _x_9)
               (let _x_10 : _obj := _proj.0 _x_8,
                    _ : _void := _inc _x_10,
                    _ : _void := _dec _x_8,
                    _x_11 : _obj := _proj.1 _x_10,
                    _ : _void := _inc _x_11,
                    _ : _void := _dec _x_10,
                    _x_12 : _obj := List.map._main._at.Lean.Parser.identUnivSpec.HasView'._spec_1 _x_11
                in _x_12)))

The view version creates bigger trees since review leaf {} creates an extra node containing foo. Not sure whether it impacts the performance numbers or not.

leodemoura commented 5 years ago

I agree that 0-cost views are probably the only practical solution, but I'd really like to keep the named structure fields for my sanity.

People have been writing Lisp macros for a long time without using views. I agree their syntax object is simpler since they do not have tokens, but we can easily create a 0-cost view that does ignore tokens. Example, for an if-then-else node, we create

@[inline] def withIfArgs {a : Type} (args : Array Syntax) (fn : Syntax -> Syntax -> Syntax  -> a) : a :=
fn (args.get 1) (args.get 3) (args.get 5)

We may also create the "update" function that "reuses" the tokens from a given source.

@[inline] def updateIf {a : Type} (src : Syntax) (c t e : Syntax) : Syntax :=
Syntax.rawNode src.Kind [src.get 0, c, src.get 2, t, src.get 4, e]

We can automatically generate these simple functions.

leodemoura commented 5 years ago

I believe we should be able to generate the same 0-cost code with view structures if we make the following adjustments:

I like the low tech solution I described in the previous message.

It doesn't rely on any fancy optimization in the compiler.
It doesn't rely on type classes, and we don't have to fix the type class resolution procedure to get descent compilation times.

leodemoura commented 5 years ago

The shallow view would still lose quite a bit of type information compared to the current views.

I am fine with this. For me, it is much more important to make sure I can predict the performance/shape of the generated code. I feel like that I can only achieve this with the current approach by using trace.compiler, and I get surprised over and over again with unexpected hidden costs.

leodemoura commented 5 years ago

I pushed a proof of concept for the low tech views. https://github.com/leanprover/lean4/blob/master/tests/playground/lowtech_expander.lean Here is the transformer for if (not c) then t else e ==> if c then e else t.

def flipIf : Transformer :=
λ n, withIf n $ λ c t e,
 isNot c (pure n.val) $ λ c',
 pure (updateIf n c' e t)

and here is the generated code:

λ (_x_1 : _obj) (_x_2 : UInt8),
  let _x_3 : _obj := _proj.1 _x_1,
      _ : _void := _inc _x_3,
      _x_4 : _obj := Syntax.Inhabited,
      _x_5 : _obj := 1,
      _x_6 : _obj := Array.get ◾ _x_4 _x_3 _x_5,
      _ : _void := _inc _x_1,
      _x_7 : _obj := Option._cnstr.1.0.0 _x_1
  in @Syntax.casesOn _x_6
       (let _ : _void := _dec _x_1, _ : _void := _dec _x_3, _x_8 : _obj := Except._cnstr.1.0.0 _x_7 in _x_8)
       (let _x_9 : _obj := _proj.0 _x_6,
            _ : _void := _inc _x_9,
            _x_10 : _obj := _proj.1 _x_6,
            _ : _void := _inc _x_10,
            _ : _void := _dec _x_6,
            _x_11 : _obj := notKind,
            _x_12 : UInt8 := Lean.Name.decEq _x_9 _x_11,
            _ : _void := _dec _x_9
        in @Bool.casesOn _x_12
             (let _ : _void := _dec _x_1,
                  _ : _void := _dec _x_10,
                  _ : _void := _dec _x_3,
                  _x_13 : _obj := Except._cnstr.1.0.0 _x_7
              in _x_13)
             (let _ : _void := _dec _x_7,
                  _x_14 : _obj := _proj.0 _x_1,
                  _ : _void := _inc _x_14,
                  _x_15 : _obj := _proj.2 _x_1,
                  _ : _void := _inc _x_15,
                  _x_16 : _obj := _reset.3 _x_1,
                  _x_17 : _obj := 3,
                  _x_18 : _obj := Array.get ◾ _x_4 _x_3 _x_17,
                  _x_19 : _obj := 5,
                  _x_20 : _obj := Array.get ◾ _x_4 _x_3 _x_19,
                  _x_21 : _obj := Array.get ◾ _x_4 _x_10 _x_5,
                  _ : _void := _dec _x_10,
                  _x_22 : _obj := Array.set ◾ _x_3 _x_5 _x_21,
                  _x_23 : _obj := Array.set ◾ _x_22 _x_17 _x_20,
                  _x_24 : _obj := Array.set ◾ _x_23 _x_19 _x_18,
                  _x_25 : _obj := _reuse.1.0.0.0 _x_16 _x_14 _x_24 _x_15,
                  _x_26 : _obj := Option._cnstr.1.0.0 _x_25,
                  _x_27 : _obj := Except._cnstr.1.0.0 _x_26
              in _x_27))
       (let _ : _void := _dec _x_6,
            _ : _void := _dec _x_1,
            _ : _void := _dec _x_3,
            _x_28 : _obj := Except._cnstr.1.0.0 _x_7
        in _x_28)
       (let _ : _void := _dec _x_6,
            _ : _void := _dec _x_1,
            _ : _void := _dec _x_3,
            _x_29 : _obj := Except._cnstr.1.0.0 _x_7
        in _x_29)

leodemoura commented 5 years ago

I pushed a small experiment where I use the new [init] attribute to generate unique Nat values for identifying SyntaxNodeKinds: https://github.com/leanprover/lean4/commit/8004fcf2f1fce958933786c886e3b82bb276959e There are two benefits:

We can use an array to map SyntaxNodeKind to transformers and elaborators.
Testing whether two kinds are equal or not is fast.

There is one disadvantage: we cannot store Syntax objects directly into .olean files. The issue is that the SyntaxNodeKinds may be different. However, we can fix this by having a fixKinds procedure that uses the name field to fix the id field. This trick is equivalent to the work performed by the linker in a compiler. It is also similar to the approach we wanted to use for storing low-level bytecode. Here is the proof of concept https://github.com/leanprover/lean4/commit/b75706ffff0fd61a4e127a00ba731d375e159827

Kha commented 5 years ago

Thanks, this is very helpful. I'm still a little uneasy when I think of how more complex code like this would be written in this style, and in particular how many "type" confusions one would inevitably encounter. How would one even debug which with combinator out of 5 nested ones failed to match?

leodemoura commented 5 years ago

I'm still a little uneasy when I think of how more complex code like this would be written in this style, and in particular how many "type" confusions one would inevitably encounter. How would one even debug which with combinator out of 5 nested ones failed to match?

I wrote the skeleton for letTransformer: https://github.com/leanprover/lean4/commit/d73d262aa1f54f0aefd34556b021f2ec94d23b0f

I have two versions: 1- Using a <?> combinator, and Syntax.case "kind-matching" function. The idiom would be: given a n : Syntax,

(n.case kind_1 $ fun (n : SyntaxNode), withKind1 n $ fun <args>,  <code for this case>)
<?>
...
<?>
(n.case kind_m $ fun (n : SyntaxNode), withKindM n $ fun <args>, <code for this case>)

The generated code is reasonable, but it relies on may compiler transformations to get efficient code. For example, I am currently investigating a missing optimization exposed by this idiom.

2- Using a n.isNode and if-then-else chains. Given a n : Syntax we would write:

n.isNode $ fun (k : SyntaxNodeKind) (n : SyntaxNode), 
if k == kind_1 then withKind1 n $ fun <args>, <code for this case>
else if k == kind_2 then withKind2 n $ fun <args>, <code for this case>
...
else withKindM n $ fun <args>, <code for this case>

leodemoura commented 5 years ago

How would one even debug which with combinator out of 5 nested ones failed to match?

I am not sure whether this is a problem or not. The code above looks reasonable to me, and I would debug it like any other piece of code. Are you concerned with mistakes such as: type in a letLhsId-node is optional, but a macro writer may forget this fact and treat it as regular term node? I am not sure this is a big problem. Racket is untyped, and it didn't prevent anybody from writing very complicated macros.

If debugging the transformer becomes a big problem, we can add a "trace" to TransformerM. If we use EState instead of ExceptT, we will probably produce more efficient code anyway.

leodemoura commented 5 years ago

I inspected the generated code, and the approach using if-then-else is much more compiler friendly than the one using the <?> combinator. The code pattern

(n.case kind1 ...)
<?>
(n.case kind2 ...)

generates something like the following

let result_1 := match n with <code for first case>
in match result_1 with
    | Except.ok none := match n with <code for second case>
    | other := ...

So, each <?> combinator application forces the compiler to perform a float match transformation. Since each case is probably not a small piece of code, the compiler generates a new join point for each <?> application.

Kha commented 5 years ago

One aspect we should not forget is how we want such code to look eventually, e.g. using syntax quotations. Should we use them everywhere possible? What do they need to look like to achieve that? Should we implement them using the current views and then use them to switch to low-tech views, or should we switch before that?

Kha commented 5 years ago

It's not completely thought out yet, but I could imagine the code to look something like this:

def letTransformer' : Transformer :=
λ stx, syntax_match stx with
  | `(let $n:ident : $t := $_ in $_) := noExpansion
  | `(let $n:ident := $v in $b) := pure `(let $n:ident : _ := $v in $b)
  | `(let $n:ident $($b:binder)+ $t:optType := $v in $b) := noExpansion -- TODO
  | `(let $p:pattern := $v in $b) := noExpansion --TODO

which may compile down to something very close to your code given a sufficiently smart syntax_match transformer... (except for SyntaxNode, which I don't feel is the right abstraction level for syntax quotations)?

For reference, this is what I could imagine the full code to look like:

def getOptType (stx : Syntax) : Syntax :=
syntax_match stx with
| `(optType|) := `(_)
| `(optType|: $t) := t

def letTransformer' : Transformer :=
λ stx, syntax_match stx with
  | `(let $n:ident : $t := $_) := noExpansion
  | `(let $n:ident := $v) := pure `(let $n:ident : _ := $v)
  | `(let $n:ident $($b:binder)+ $t:optType := $v) :=
    pure `(let $n:ident : Π $($b:binder)+, $(getOptType t) := λ $($b:binder)+, $v)
  | `(let $p:pattern := $v in $b) := pure `(match $v with $p:pattern := $b)

leodemoura commented 5 years ago

I agree that a syntax_match macro would be awesome. That being said, it must have a clear runtime cost model and should be compatible with memory reuse. Note that the two idioms above will reuse the memory cells for the Syntax and the array (if not shared).

leodemoura commented 5 years ago

BTW, we will need something similar for Expr. The Lean 3 approach using match is too expensive. Recall the open issue we have there. So, I am wondering whether syntax_match macro is an idiom that we can reuse for different types: Expr, Syntax

Kha commented 5 years ago

Yes, we should be able to reuse the model for Expr. The biggest difference is that we don't need to propagate macro scopes for Expr, i.e. it should be even simpler.

Kha commented 4 years ago

Implemented by the new new parser

leanprover / lean4

[WIP] Store Syntax.node children in array #12