IAssassin code generation bug when a lambda expression closes over 31 free variables

WillClinger commented 9 years ago

In current IAssassin versions of Larceny:

(begin
 (define (foo a b c d e f g h i j k l m n o p q r s t u v w x y z
              a27 a28 a29 a30 a31)
   (lambda ()
     (list a b c d e f g h i j k l m n o p q r s t u v w x y z
           a27 a28 a29 a30 a31)))

 ((foo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
       27 28 29 30 31)))

evaluates to

(1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
 17  18  19  20  21  22  23  24  25  26  27  28  29  30 (31))

When one more argument is added to both foo and to the lambda expression, the output is correct.

This is a bug in the code generator, but I'm not sure whether it's a bug in Twobit or in the IAssassin back end. Twobit produces this MacScheme machine code:

        .entry      0,#t
        save/stores 32,(0),(0)
        lambda      *,0
        setreg      3
        setglbl     foo
        const/setreg 31,3
        global      foo
        setstk      32
        const       1
        setstk      31
        const       2
        setstk      30
        const       3
        setstk      29
        const       4
        setstk      28
        const       5
        setstk      27
        const       6
        setstk      26
        const       7
        setstk      25
        const       8
        setstk      24
        const       9
        setstk      23
        const       10
        setstk      22
        const       11
        setstk      21
        const       12
        setstk      20
        const       13
        setstk      19
        const       14
        setstk      18
        const       15
        setstk      17
        const       16
        setstk      16
        const       17
        setstk      15
        const       18
        setstk      14
        const       19
        setstk      13
        const       20
        setstk      12
        const       21
        setstk      11
        const       22
        setstk      10
        const       23
        setstk      9
        const       24
        setstk      8
        const       25
        setstk      7
        const       26
        setstk      6
        const       27
        setstk      5
        const       28
        setstk      4
        const       29
        setstk      3
        const       30
        setstk      2
        reg         3,.T126
        setstk      1
        const       ()
        setreg      31
        stack       1
        op2         cons,31
        setreg      31
        load        30,2
        load        29,3
        load        28,4
        load        27,5
        load        26,6
        load        25,7
        load        24,8
        load        23,9
        load        22,10
        load        21,11
        load        20,12
        load        19,13
        load        18,14
        load        17,15
        load        16,16
        load        15,17
        load        14,18
        load        13,19
        load        12,20
        load        11,21
        load        10,22
        load        9,23
        load        8,24
        load        7,25
        load        6,26
        load        5,27
        load        4,28
        load        3,29
        load        2,30
        load        1,31
        stack       32
        setrtn/invoke 31
        .align      4
L1006
        .cont       
        load        0,0
        setreg      2
        pop         32
        invoke      0
        .end        
        .entry      1,#f
        .proc       
        args=       31
        save/stores 32,(0 1),(0 2)
        movereg     31,1
        reg/op1/setreg car,1,ebx
        setstk      1
        reg/op1/setreg cdr,1,ebx
        load        1,1
        stack       2
        setstk      32
        reg         2,|.b\|1|
        setstk      31
        reg         3,|.c\|1|
        setstk      30
        reg         4,|.d\|1|
        setstk      29
        reg         5,|.e\|1|
        setstk      28
        reg         6,|.f\|1|
        setstk      27
        reg         7,|.g\|1|
        setstk      26
        reg         8,|.h\|1|
        setstk      25
        reg         9,|.i\|1|
        setstk      24
        reg         10,|.j\|1|
        setstk      23
        reg         11,|.k\|1|
        setstk      22
        reg         12,|.l\|1|
        setstk      21
        reg         13,|.m\|1|
        setstk      20
        reg         14,|.n\|1|
        setstk      19
        reg         15,|.o\|1|
        setstk      18
        reg         16,|.p\|1|
        setstk      17
        reg         17,|.q\|1|
        setstk      16
        reg         18,|.r\|1|
        setstk      15
        reg         19,|.s\|1|
        setstk      14
        reg         20,|.t\|1|
        setstk      13
        reg         21,|.u\|1|
        setstk      12
        reg         22,|.v\|1|
        setstk      11
        reg         23,|.w\|1|
        setstk      10
        reg         24,|.x\|1|
        setstk      9
        reg         25,|.y\|1|
        setstk      8
        reg         26,|.z\|1|
        setstk      7
        reg         27,|.a27\|1|
        setstk      6
        reg         28,|.a28\|1|
        setstk      5
        reg         29,|.a29\|1|
        setstk      4
        reg         30,|.a30\|1|
        setstk      3
        reg         1,.T93
        setstk      1
        const       ()
        setreg      31
        stack       1
        op2         cons,31
        setreg      31
        load        30,3
        load        29,4
        load        28,5
        load        27,6
        load        26,7
        load        25,8
        load        24,9
        load        23,10
        load        22,11
        load        21,12
        load        20,13
        load        19,14
        load        18,15
        load        17,16
        load        16,17
        load        15,18
        load        14,19
        load        13,20
        load        12,21
        load        11,22
        load        10,23
        load        9,24
        load        8,25
        load        7,26
        load        6,27
        load        5,28
        load        4,29
        load        3,30
        load        2,31
        load        1,32
        pop         32
        .align      4
L1001
        .proc       
        .proc-doc   #(foo #f 31 #f #f (a b c d e f g h i j k l m n o p q r s t u v w x y z a27 a28 a29 a30 a31))
        save/stores 2,(0 1),(0 2)
        movereg     31,1
        reg/op1/setreg car,1,ebx
        setstk      1
        reg/op1/setreg cdr,1,ebx
        const       ()
        setreg      31
        stack       1
        op2         cons,31
        setreg      31
        load        1,2
        lambda      *,31
        pop         2
        return      
        .end        
        .entry      2,#f
        .proc       
        args=       0
        const/setreg (),3
        lexical     0,31,|.a31\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,30,|.a30\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,29,|.a29\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,28,|.a28\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,27,|.a27\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,26,|.z\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,25,|.y\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,24,|.x\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,23,|.w\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,22,|.v\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,21,|.u\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,20,|.t\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,19,|.s\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,18,|.r\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,17,|.q\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,16,|.p\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,15,|.o\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,14,|.n\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,13,|.m\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,12,|.l\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,11,|.k\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,10,|.j\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,9,|.i\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,8,|.h\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,7,|.g\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,6,|.f\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,5,|.e\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,4,|.d\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,3,|.c\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,2,|.b\|3|
        reg/op2/setreg cons,ebx,3,3
        lexical     0,1,|.a\|3|
        op2         cons,3
        return      
        .end

Explanation:

With the current IAssassin (IA32) version of Larceny, the Twobit compiler generates code for an abstract MacScheme machine with 32 registers, of which 31 are used for arguments. The lambda instruction creates a procedure that closes over the values in some number of those registers. When closing over less than 31 values, the semantics is straightforward. When closing over 32 or more, reg31 holds a list of all values to close over past the first 30 (not counting the contents of reg0). When closing over exactly 31 registers, there are two reasonable semantics. Evidently Twobit is using one of the two reasonable semantics, but the IAssassin assembler is using the other.

I don't know which is right. I'll have to consult the MacScheme machine documentation.

pnkfelix commented 9 years ago

Its quite possible that I made a mistake in my reading of:

http://www.ccs.neu.edu/home/lth/larceny/notes/note13-malcode.html

(Update: an earlier version of this comment had some mistakes its reasoning, namely a fencepost error in my thinking about what REGr was supposed to denote.)

In particular, we have:

lambda  x,n,doc

        If n < r, loads RESULT with the procedure formed from the 
        code x and the values of registers REG0-REGn.  If n >= r,
        loads RESULT with the procedure formed from the values of
        registers REG0-REG{r-1} and the first n-r+1 values taken
        from the list in REGr.  The code x consists of a bytevector
        of pure code and a vector of constants.  The documentation
        is used only for debugging.

So, okay, there's a threshold value r. That sounds like it could correspond to that magic number 31 in @WillClinger's description. From the text above, it seems like it should corresponds to the number of (software) registers on the target, since it talks about loading a list into REGr, which would imply to me that r should be 31.

On further investigation, we find:

Implementation-specific parameters.

The highest numbered general register is REGr, where r = R-1.

See the end of this file for implementation notes, notably the 
values of R, MINOFFSET, MAXOFFSET, MINIMM, MAXIMM, MAXSLOT, and MAXRIB.

So, hmm, the value of r is actually R-1 according to the docs

But then what is R at the bottom of the file:

Implementation notes.

Minimal parameter values.

    R                  ?    (Certainly at least 3, or op3 won't work)
...

SPARC implementation.

    R                 32
Standard-C implementation.

    R                 32
...

This makes it seem like R is the number of software registers, while r is R - 1 (i.e. the index of the final software register, i.e. the magic number 31).

(Second update: still more errors corrected.)

Anyway, looking again at the documentation for the lambda macscheme instruction, note that the documented ranges are n < r for the non-list case and n >= r for the list case.

~~So that implies to me that IAssassin is following the documented semantics, even though that may not be what Twobit actually implements.~~ (That was wrong.)

IAssassin is not following the documented semantics; it appears to be assuming that when n == r that it should copy the value directly from REGr, rather than reading it as the first element of a list stored in REGr. That explains the semantics observed in this ticket: we are creating the singleton list and storing it in REGr, but then IAssassin is treating that list itself as the value of the final free variable.

(Incidentally, is there are good reason to use n < r/n >= r rather than n <= r/n > r ? It seems like the latter semantics would avoid a cons in the case where n == r... but maybe the former semantics has advantages elsewhere.)

(I would be curious to know if this same bug exhibits itself on the Petit/C or Petit/NASM backend, especially since the IAssassin backend was largely directly ported from the Petit/NASM backend, apart from a few opcodes that needed special treatment due to its non-Petit nature.)

WillClinger commented 9 years ago

Fixed by changeset 9241f06ec697cf729fee3eca212e84a4a02920e8

IAssassin was incorrect. Comparing the IAssassin code to the documentation is confusing because the documentation uses r to mean R-1, while the IAssassin code uses r to mean something completely different: the operand n of the $lambda instruction.

The bug does not show up in Petit Larceny (Standard-C) on Linux. I haven't checked the Petit/NASM backend, so I'll leave this ticket open until that's been checked.

WillClinger commented 9 years ago

By the way, I ran across this bug while adding R7RS define-library support to lib/R6RS/r6rs-expander.sch. Adding or subtracting print statements in unexecuted code could cause an unrelated part of the expander to try to call a value cell instead of calling the procedure held within that cell.

So I suspected a compiler bug, but my test case contained over 2000 lines of code (excluding comments). I found the bug by examining a 26416-line listing of the generated MacScheme machine assembly code. That was kinda fun.

pnkfelix commented 9 years ago

@WillClinger ah, I see where I went wrong in my analysis of the documentation above.

WillClinger commented 9 years ago

Felix wrote:

After all, is there are good reason to use n < r/n >= r rather than n <= r/n > r ? It seems like the latter semantics would avoid a cons in the case where n == r...

The reason is simplicity of code generation. With the semantics Lars and I documented in 1999, there are only two cases: n < r and n >= r. Avoiding one cons in the n = r case would add a third case to some parts of Twobit. In particular: the code generated for procedure calls never passes a bare argument in the last register; for procedure calls with r=R-1 or more arguments, that last register always contains a list of arguments. The code generated for lambda expressions can therefore reuse the part of the code generator that evaluates arguments for a procedure call.

lars-t-hansen commented 9 years ago

The Fence back-end gets this right, probably because it was forked off the SPARC back-end: src/asm/Fence/pass5p2.sch:emit-init-procedure-slots. (The ARM back-end, built on Fence, in any case only has hardware registers so the bug would have shown up in real code pretty quickly.)

WillClinger commented 9 years ago

The IAssassin code generation bug has been fixed, so leaving this ticket open is misleading. If Petit/Nasm has a similar bug, we should open a new ticket for that.

I'm afraid I'll forget to check, so I won't close this ticket right now. I am, however, downgrading it to minor and moving the milestone.

WillClinger commented 7 years ago

The IAssassin bug has been fixed, and issue #758 has been created to remind us to check for this bug in the Petit/Nasm version in case we go back to using that version, so I'm closing this.

larcenists / larceny

IAssassin code generation bug when a lambda expression closes over 31 free variables #688