hughperman / pure-lang

Automatically exported from code.google.com/p/pure-lang

Strange segfault #87

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Compiling a specific line of code that seems harmless---in fact the structure 
of it is:

   case something of
      res::int
           = complex expression if res == 1; // works fine
           = something else;
           = same complex expression as before if res == 5; // this is the triggering line
    end;

keeps giving me a segfault...but only when I'm "using dict,system". This is 
bizarre, the trigger code (a line just added to trees23.pure) has nothing to do 
with the system library, and it compiles just fine earlier in the case 
expression, and in fact the same structure is repeated several times in the 
file and the others all compile fine. Doing "using system,dict" instead doesn't 
crash, but then the interpreter crashes the first time it tries to display any 
result.

I don't know how to produce a minimal test case, that doesn't rely on the 
details of the library files I'm using right now. But I'm hoping from a 
backtrace and the error message you might be able to identify the issue, 
without having the specific setup to reproduce it.

Here's as far as I got towards finding a minimal case: in the context, the 
following line compiles fine:
       .. with last [] = if r1===nil then (bin l1 y1 c1) 0 ()
                      else (1);
               // other clauses for last
          end;

but this crashes:
       .. with last [] = if r1===nil then (bin l1 y1 c1) 0 ()
                      else (1 when
                        ten = 10;
                      end);
               // other clauses for last
          end;

Bizarre. As I said, many other occurrences of essentially the same pattern in 
this file give no trouble. I did retype the line from scratch several times in 
case there was some stray invisible character.

I reconfigured and rebuilt Pure using this configure line:
../configure --with-libgmp-prefix=/usr/local --enable-debug --without-elisp  
--prefix=/opt  CC=clang CXX=clang++

on FreeBSD 9.ish, x86_64.

Then with the following environment:
PURELIB=/usr/home/jim/repo/remote/pure-lang/pure/BUILD/../lib
PURE_STACK=24000
PURE_INCLUDE=/home/jim/dev/pure/unspoiled
PURE_ESCAPE=:

running: gdb /usr/home/jim/repo/remote/pure-lang/pure/BUILD/pure

Then at the gdb command line, typing
(gdb)  run -w -i -g --enable=list-opt --enable=trees23
gets me to the Pure interpreter. At the prompt I type:
> using dict, system;

that fails with the error:
Assertion failed: (act_env().xmap.find(xmap_key(tag, idx)) != 
act_env().xmap.end()), function vref, file ../interpreter.cc, line 15302.

  Program received signal SIGABRT, Aborted.

(When I ran with the non-debug build, I was instead getting:
  Segmentation fault: 11 (core dumped)
but running gdb on the pure.core file didn't seem to show anything useful.)

At this point I did a `bt` in gdb. The result is attached.

Original issue reported on code.google.com by dubious...@gmail.com on 6 Aug 2012 at 2:27

Attachments:

GoogleCodeExporter commented 8 years ago
It looks like some stack is just getting too deep or something. I tried 
switching the order of the triggering code block (call it Block2) and another 
block that was compiling fine earlier (call it Block1). After the switch, 
Block2 compiles fine, but now it's Block1 that has to be commented out to 
prevent the crash.

Original comment by dubious...@gmail.com on 6 Aug 2012 at 2:40

GoogleCodeExporter commented 8 years ago
First, the third rule in your case statement should never be used anyway if the 
second rule doesn't have a guard on it.
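
To make the rule-ordering point concrete, here is a small C++ model of first-match selection over guarded rules. It is only a sketch of the semantics described above, not code from the Pure interpreter; `Rule`, `first_match` and `report_shape` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// A rule's right-hand side is left out; only the guard matters here.
// An empty guard stands for an unguarded rule, which always applies.
struct Rule {
    std::function<bool(int)> guard;
};

// Rules are tried in order; the first one whose guard is absent or holds wins.
std::optional<std::size_t> first_match(const std::vector<Rule>& rules, int res) {
    for (std::size_t i = 0; i < rules.size(); ++i) {
        if (!rules[i].guard || rules[i].guard(res)) return i;
    }
    return std::nullopt;
}

// The shape sketched in the original report: guarded, unguarded, guarded.
std::vector<Rule> report_shape() {
    return {
        Rule{[](int r) { return r == 1; }},
        Rule{nullptr},
        Rule{[](int r) { return r == 5; }},
    };
}

// With res == 5 the unguarded second rule (index 1) is chosen,
// so the third rule is never reached.
void check_report_shape() {
    assert(first_match(report_shape(), 5).value() == 1);
}
```

In this model the third rule can only fire if the second one also carries a guard that fails for `res == 5`.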

Second, PURE_STACK=24000 won't help you much because the value is most likely 
much larger than your system's default C stack size. Try a smaller value like 
4096 and see whether that gives you an orderly stack_fault exception.
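
One way to check the stack-size point is to compare the requested value with the process's soft stack limit. The sketch below assumes that PURE_STACK is given in kilobytes and that the POSIX `getrlimit` call is available; `fits_in_c_stack` and `current_stack_limit` are hypothetical helper names, not part of Pure.

```cpp
#include <cassert>
#include <sys/resource.h>

// Assumption: PURE_STACK is a size in kilobytes.
// Returns true if that many kilobytes fit within the given soft limit (bytes).
bool fits_in_c_stack(unsigned long pure_stack_kb, rlim_t soft_limit_bytes) {
    if (soft_limit_bytes == RLIM_INFINITY) return true;
    return static_cast<rlim_t>(pure_stack_kb) * 1024 <= soft_limit_bytes;
}

// Soft limit of the C stack for the current process, in bytes.
rlim_t current_stack_limit() {
    struct rlimit lim{};
    int rc = getrlimit(RLIMIT_STACK, &lim);
    assert(rc == 0);
    (void)rc;  // silence the unused-variable warning when NDEBUG is set
    return lim.rlim_cur;
}
```

For example, with an 8 MB soft limit (a common default, but an assumption here), `fits_in_c_stack(4096, 8UL * 1024 * 1024)` holds while `fits_in_c_stack(24000, 8UL * 1024 * 1024)` does not.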

Third, I see that you used clang to compile Pure. Can you reproduce the bug 
when compiling Pure with gcc?

Other than that, I'd really need a reasonably small test case so that I can 
reproduce this bug. The gdb backtrace looks like there might be a bug in the 
code generator somewhere, but it's hard to tell without knowing exactly which 
Pure code triggers this. If you can't come up with a small witness, I'd at 
least need the set of library scripts that you're using and detailed 
instructions and/or a test script showing how to trigger the bug.

Original comment by aggraef@gmail.com on 6 Aug 2012 at 8:12

GoogleCodeExporter commented 8 years ago
Guards are present on all the real clauses. Also, none of this ever gets 
executed; it crashes during compilation. A stack value of 4096 doesn't change 
the behavior at all. (Is that setting also honored while compiling, by the way?)

Results building with gcc 4.2.1 with --enable-debug seem to be the same. The 
backtrace looks similar:
Assertion failed: (act_env().xmap.find(xmap_key(tag, idx)) != 
act_env().xmap.end()), function vref, file ../interpreter.cc, line 15302.

Program received signal SIGABRT, Aborted.
[Switching to Thread 804407400 (LWP 103190/pure)]
0x0000000803c1133c in thr_kill () from /lib/libc.so.7
(gdb) bt
#0  0x0000000803c1133c in thr_kill () from /lib/libc.so.7
#1  0x0000000803ca624b in abort () from /lib/libc.so.7
#2  0x0000000803c8f9d5 in __assert () from /lib/libc.so.7
#3  0x00000008009d6734 in interpreter::vref (this=0x7fffffffb8f0, tag=909, 
idx=2 '\002', p=@0x7ffffffe0920) at ../interpreter.cc:15302
#4  0x00000008009f8558 in interpreter::codegen (this=0x7fffffffb8f0, 
x=@0x7ffffffe1720, quote=false) at ../interpreter.cc:14670
#5  0x00000008009fa378 in interpreter::codegen (this=0x7fffffffb8f0, 
x=@0x7ffffffe21e0, quote=false) at ../interpreter.cc:14829
#6  0x00000008009fa312 in interpreter::codegen (this=0x7fffffffb8f0, 
x=@0x7ffffffe2cb0, quote=false) at ../interpreter.cc:14829
#7  0x00000008009fa312 in interpreter::codegen (this=0x7fffffffb8f0, 
x=@0x7ffffffe3780, quote=false) at ../interpreter.cc:14829
#8  0x00000008009fa312 in interpreter::codegen (this=0x7fffffffb8f0, 
x=@0x7ffffffe4250, quote=false) at ../interpreter.cc:14829
#9  0x00000008009fa312 in interpreter::codegen (this=0x7fffffffb8f0, 
x=@0x7ffffffe4d20, quote=false) at ../interpreter.cc:14829
#10 0x00000008009fa312 in interpreter::codegen (this=0x7fffffffb8f0, 
x=@0x7ffffffe5250, quote=false) at ../interpreter.cc:14829
#11 0x0000000800a00495 in interpreter::toplevel_codegen (this=0x7fffffffb8f0, 
x=@0x7ffffffe55f0, rp=0x806c26340) at ../interpreter.cc:14197
#12 0x0000000800a01380 in interpreter::try_rules (this=0x7fffffffb8f0, 
pm=0x806c689c0, s=0x806c81670, failedbb=0x8080de420, reduced=@0x7ffffffe7ca0, 
    tmps=@0x7ffffffe7a80) at ../interpreter.cc:16741
...

I'll work on trying to get a minimal test case.

Original comment by dubious...@gmail.com on 6 Aug 2012 at 8:43

GoogleCodeExporter commented 8 years ago
Yeah, it really looks like it's a bug in the code generator. A minimal test 
case will be very helpful, but I understand that it may be difficult to produce 
in this case. It should be good enough to have the set of library scripts that 
you used along with instructions on how to reproduce the bug.

Original comment by aggraef@gmail.com on 6 Aug 2012 at 9:20

GoogleCodeExporter commented 8 years ago
Here's a pretty minimal test case. Running the following in the interpreter, 
using the libraries from current hg tip (so none of my other local library 
changes), still gives me the segfault, before ever returning to the prompt. 
Commenting out either of block1 or block2 makes the code compile fine. Note 
that when I have this in a separate file and use it, the crash isn't triggered 
until the interpreter tries to display some result. So I was doing ``using 
badfile; 0;`` to test for the crash.

This dummy code isn't supposed to make sense. But it should be legal, and even 
if it weren't it shouldn't crash, right?

public kons g1 g2;

bar y = case y of
      res::int
          // call this block1
        = foo y with
            foo _ = snag when
                        kons snag = g1 y;
                      end;
          end if res == 0;
          // call this block2
        = foo y with
            foo _ = snag when
                        kons snag = g2 y;
                      end;
          end if res == 1;
    end;

Original comment by dubious...@gmail.com on 6 Aug 2012 at 9:34

GoogleCodeExporter commented 8 years ago
Also when entering that code directly into the interpreter, it seems I need to 
add a new line "0;" at the end to trigger the crash.

Original comment by dubious...@gmail.com on 6 Aug 2012 at 9:41

GoogleCodeExporter commented 8 years ago
Even more minimal:
$ ./run-pure --norc -n
Pure 0.56 (x86_64-unknown-freebsd9.0) Copyright (c) 2008-2012 by Albert Graef
(Type 'help' for help, 'help copying' for license information.)

> bar y = case y of 0 = foo y with foo y = x when [x] = y end end; 1 = baz y 
with baz y = x when [x] = y end end end;
> 0;
Segmentation fault: 11 (core dumped)

Original comment by dubious...@gmail.com on 6 Aug 2012 at 10:00

GoogleCodeExporter commented 8 years ago
Cool, many thanks for the short example! I can reproduce this, and I'll have a 
look at it asap.

Original comment by aggraef@gmail.com on 6 Aug 2012 at 10:17

GoogleCodeExporter commented 8 years ago
In case it's useful: appending a "when dummy = 0 end" to the end of either of 
the case rules (or both) makes everything good again. That is, this doesn't 
crash:
> bar y = case y of 0 = foo y with foo y = x when [x] = y end end when dummy = 
0 end; 1 = baz y with baz y = x when [x] = y end end end;

Original comment by dubious...@gmail.com on 6 Aug 2012 at 10:31

GoogleCodeExporter commented 8 years ago
Just for the record, I now boiled the bug witness down to:

bar y = case y of 0 = a with a = 0 end; 1 = b with b = 1 end end;
0;

This still craps out with the same assertion.

This is a tough one. It seems that the 'case' environment is to blame here. The 
problem is not really in the code generator but already in the frontend, more 
precisely in the FMap data structures needed to handle all the lambda lifting 
stuff.

Right now 'case' is handled analogously to a lambda there, so it lacks the 
subenvironments necessary to tell apart the local function bindings of the 
different case rules. (Wrapping up the function environments in an extra 'when' 
clause works around this, which explains the behaviour you mentioned in comment 
#9.) I'll probably have to handle the 'case' environment in a fashion similar 
to 'with' clauses to repair this defect.

Original comment by aggraef@gmail.com on 14 Aug 2012 at 9:07

GoogleCodeExporter commented 8 years ago
Well, the bug is in the code generator after all. The FMaps are created 
properly all right, but it seems that the code generator clobbers some of the 
traversal pointers during code generation, so that the traversal of the 
sub-FMaps in a 'case' environment gets messed up. This code has become a real 
mess, maybe I need to rewrite it.

Original comment by aggraef@gmail.com on 14 Aug 2012 at 11:35

GoogleCodeExporter commented 8 years ago
Scratch that, it seems that my original suspicion from comment #10 was right. 
For the code generator to work properly, each 'case' rule needs its own root in 
the FMap forest so that the call to FMap::select() in try_rules() does the 
right thing. This should be easy to fix, so please stay tuned...
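
The idea of one root per 'case' rule can be pictured with a toy model in C++. This is not Pure's actual FMap code: `EnvNode`, `EnvForest`, `add_root`, `resolves` and `witness_forest` are hypothetical names, and only `select` echoes the call mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one environment: the local function names
// bound by a rule's 'with' block.
struct EnvNode {
    std::set<std::string> locals;
};

// Hypothetical stand-in for the forest: one root per 'case' rule, plus a
// cursor that says which root later lookups should use.
class EnvForest {
public:
    // Called once per rule while the rules are analysed; returns the root index.
    std::size_t add_root(std::set<std::string> locals) {
        roots_.push_back(EnvNode{std::move(locals)});
        return roots_.size() - 1;
    }
    // Called before code is generated for rule i.
    void select(std::size_t i) {
        assert(i < roots_.size());
        current_ = i;
    }
    // True if `name` is a local function of the currently selected rule.
    bool resolves(const std::string& name) const {
        return !roots_.empty() && roots_[current_].locals.count(name) > 0;
    }
private:
    std::vector<EnvNode> roots_;
    std::size_t current_ = 0;
};

// Mirrors the reduced witness: rule 0 binds `a`, rule 1 binds `b`.
EnvForest witness_forest() {
    EnvForest f;
    f.add_root({"a"});
    f.add_root({"b"});
    return f;
}
```

In this model, `b` resolves only after `select(1)`. If the cursor stays on root 0 while the second rule is compiled, the lookup fails, which is loosely analogous to the failed lookup that the `vref` assertion reports.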

Original comment by aggraef@gmail.com on 14 Aug 2012 at 12:04

GoogleCodeExporter commented 8 years ago
This issue was closed by revision a898740681a8.

Original comment by aggraef@gmail.com on 14 Aug 2012 at 12:27

GoogleCodeExporter commented 8 years ago
Ok, this should be fixed now, can you please give it a whirl?

Original comment by aggraef@gmail.com on 14 Aug 2012 at 12:28

GoogleCodeExporter commented 8 years ago
Hi Albert, thanks for fixing this. The real code that was triggering this is 
now working fine, no problems. Looks like you got it.

Original comment by dubious...@gmail.com on 16 Aug 2012 at 7:44

GoogleCodeExporter commented 8 years ago

Original comment by aggraef@gmail.com on 16 Aug 2012 at 9:08