ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language
http://www.ponylang.io

Tuple message send causes segfault (possibly autoboxing) #2237

Closed by sgebbie 6 years ago

sgebbie commented 7 years ago

Summary

When a tuple containing a Bool is sent as a message to an actor behaviour, the program segfaults, which suggests a problem with autoboxing and unboxing. The crash is a SIGSEGV at run time, caused by an access to memory address 0x1 or 0x0 depending on the value of the Bool.

Code Snippet

Including other sends

The following code compiles but fails at run time with a segmentation fault. The message sends without a Bool succeed; the message send with the Bool crashes.

$ cat trigger-tuple-autobox.pony

primitive Yes
primitive No
type Agreement is (Yes | No)

type Change is (
      ( U32, String , None      )
    | ( U32, String , String    )
    | ( U32, String , Bool      )
    | ( U32, String , Agreement )
)

actor Main

    let _env: Env

    new create(env: Env) =>
        _env = env
        go(1)

    be go(state: U32) =>
        match state
        | 1 => this.accept((1, "sending tuple with None (succeeds - no autobox)", None))
        | 2 => this.accept((2, "sending tuple with Class (succeeds - no autobox)", "hello world"))
        | 3 => this.accept((3, "sending tuple with Primitive (succeeds - no autobox)", Yes))
        | 4 => this.accept((4, "sending tuple with Bool (segfault - possible autobox)", false))
        | 5 => _env.out.print("FiN")
        end

    be accept(c: Change) =>
        match c
        | (let state: U32, let msg: String, let arg: Any) =>
            _env.out.print("got tuple: " + msg)
            this.go(state+1)
        end

Simpler code snippet

The following code is simpler but still crashes.

Interestingly, if (U32, String) in Tuple is changed to (U32, None), the generated executable does not crash.

type Tuple is (
      ( U32, String )
    | ( U32, Bool   )
)

actor Main

    new create(env: Env) =>
        this.accept((4, false))

    be accept(t: Tuple) =>
        None
Debug Output

When the line:

        | 4 => this.accept((4, "sending tuple with Bool (segfault - possible autobox)", false))

sends true as the last element of the tuple, the fault address is 0x1; when it sends false, the fault address is 0x0. The fault address tracking the value of the Bool is what leads me to suspect a problem with unboxing.

Bool = true : fault address 0x1

(lldb) target create "./tuple-autobox"
Current executable set to './tuple-autobox' (x86_64).
(lldb) run
Process 15543 launched: './tuple-autobox' (x86_64)
got tuple: sending tuple with None (no box)
got tuple: sending tuple with Class (no box)
got tuple: sending tuple with Primitive (no box)
Process 15543 stopped
* thread #2: tid = 15548, 0x0000000000402637 tuple-autobox`Main_Dispatch + 631, name = 'tuple-autobox', stop reason = signal SIGSEGV: invalid address (fault address: 0x1)
    frame #0: 0x0000000000402637 tuple-autobox`Main_Dispatch + 631
tuple-autobox`Main_Dispatch:
->  0x402637 <+631>: cmpq   %rax, (%rdx)
    0x40263a <+634>: movq   %rdx, 0x48(%rsp)
    0x40263f <+639>: movq   %rsi, 0x40(%rsp)
    0x402644 <+644>: je     0x4026ae                  ; <+750>
(lldb)

Bool = false : fault address 0x0

$ lldb ./tuple-autobox 
(lldb) target create "./tuple-autobox"
Current executable set to './tuple-autobox' (x86_64).
(lldb) run ./tuple-autobox
Process 15617 launched: './tuple-autobox' (x86_64)
got tuple: sending tuple with None (no box)
got tuple: sending tuple with Class (no box)
got tuple: sending tuple with Primitive (no box)
Process 15617 stopped
* thread #2: tid = 15621, 0x00000000004027b7 tuple-autobox`Main_Dispatch + 631, name = 'tuple-autobox', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x00000000004027b7 tuple-autobox`Main_Dispatch + 631
tuple-autobox`Main_Dispatch:
->  0x4027b7 <+631>: cmpq   %rax, (%rdx)
    0x4027ba <+634>: movq   %rdx, 0x48(%rsp)
    0x4027bf <+639>: movq   %rsi, 0x40(%rsp)
    0x4027c4 <+644>: je     0x40282e                  ; <+750>
(lldb) 

Pony Version

$ ponyc -version
0.19.0-cf45938 [release]
compiled with: llvm 3.9.1 -- cc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

sgebbie commented 7 years ago

I just tested this with the latest version of ponyc and the problem still occurs:

0 12:37 stewart@flatfoot:~/opt/pony/bugs/tuple-autobox/simpler$ cat trigger-tuple-autobox-bare.pony
type Tuple is (
          ( U32, String      )
        | ( U32, Bool      )
)

actor Main

  new create(env: Env) =>
                this.accept((4, false))

        be accept(t: Tuple) =>
                None
0 12:38 stewart@flatfoot:~/opt/pony/bugs/tuple-autobox/simpler$ ~/opt/pony/bleed/build/release/ponyc -version
0.19.1-0b561fd [release]
compiled with: llvm 3.9.1 -- cc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
0 12:38 stewart@flatfoot:~/opt/pony/bugs/tuple-autobox/simpler$ ~/opt/pony/bleed/build/release/ponyc 
Building builtin -> /home/stewart/opt/pony/bleed/packages/builtin
Building . -> /home/stewart/opt/pony/bugs/tuple-autobox/simpler
Generating
 Reachability
 Selector painting
 Data prototypes
 Data types
 Function prototypes
 Functions
 Descriptors
Optimising
Writing ./simpler.o
Linking ./simpler
Warning: environment variable $CC undefined, using cc as the linker
0 12:38 stewart@flatfoot:~/opt/pony/bugs/tuple-autobox/simpler$ ./simpler 
Segmentation fault (core dumped)

sgebbie commented 7 years ago

I thought it would be useful to check this against a newer version.

The bug is still present in:

$ ~/opt/pony/bleed/build/release/ponyc --version
0.19.3-9a12f91 [release]
compiled with: llvm 3.9.1 -- cc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609

SeanTAllen commented 6 years ago

Backtrace with the latest ponyc:

(lldb) run
Process 21153 launched: './issue-2237' (x86_64)
Process 21153 stopped
* thread #2: tid = 0x2cb5a, 0x0000000100001538 issue-2237`Main_Dispatch + 248, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100001538 issue-2237`Main_Dispatch + 248
issue-2237`Main_Dispatch:
->  0x100001538 <+248>: cmpq   %rax, (%rsi)
    0x10000153b <+251>: jne    0x100001551               ; <+273>
    0x10000153d <+253>: leaq   0x2226c(%rip), %rdx
    0x100001544 <+260>: movl   $0x1, %ecx
(lldb) bt
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
* thread #2: tid = 0x2cb5a, 0x0000000100001538 issue-2237`Main_Dispatch + 248, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000100001538 issue-2237`Main_Dispatch + 248
    frame #1: 0x0000000100001440 issue-2237`StdStream_Trace + 32
    frame #2: 0x0000000100003352 issue-2237`ponyint_actor_run(ctx=0x00000001097ff448, actor=0x00000001097fe800, batch=100) + 162 at actor.c:226
    frame #3: 0x000000010001a16b issue-2237`run(sched=0x00000001097ff400) + 251 at scheduler.c:794
    frame #4: 0x0000000100019819 issue-2237`run_thread(arg=0x00000001097ff400) + 57 at scheduler.c:835
    frame #5: 0x00007fff9102d99d libsystem_pthread.dylib`_pthread_body + 131
    frame #6: 0x00007fff9102d91a libsystem_pthread.dylib`_pthread_start + 168
    frame #7: 0x00007fff9102b351 libsystem_pthread.dylib`thread_start + 13

sgebbie commented 6 years ago

This issue and the more recently logged issue appear to be duplicates: https://github.com/ponylang/ponyc/issues/2741

I've been placing my recent findings on #2741. Should I close this one?

SeanTAllen commented 6 years ago

Seems reasonable, yes, as long as there is a reference back to this one.

sgebbie commented 6 years ago

I'm closing this in favour of: https://github.com/ponylang/ponyc/issues/2741.

#2741 appears to be a duplicate, but has more recent investigation notes.