ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language
http://www.ponylang.io
BSD 2-Clause "Simplified" License

Application SEGV when sending a union of tuple #2741

Open sgebbie opened 6 years ago

sgebbie commented 6 years ago

When using:

$ ponyc -v
0.22.3-aff82e4 [release]
compiled with: llvm 5.0.1 -- cc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Defaults: pic=false ssl=openssl_0.9.0

The following short program consistently crashes.

$ cat crasher.pony

actor Crasher
  be go(value: TimeOp) => None

actor Main

  new create(env: Env) =>
    let w:Crasher = Crasher

    let ctc: TimeStamp = (5,7)
    let cto: TimeOp = (11, ctc)
    w.go(cto)

class val TimeChunk
type TimeStamp is (I64, I64)

type TimeOp is (
    (I32 , TimeChunk)
  | (I32 , TimeStamp)
)

The LLDB output shows that values in the tuple are being used as addresses to dereference (note the fault address 0x5, which matches the first element of ctc):

$ lldb ./crasher
(lldb) target create "./crasher"
Current executable set to './crasher' (x86_64).
(lldb) run
Process 11224 launched: './crasher' (x86_64)
Process 11224 stopped
* thread #2: tid = 11228, 0x0000000000402e73 crasher`Main_tag_create_oo(this=0x00007ffff71c9800, env=0x00007ffff71c9000) + 451 at crasher.pony:11, name = 'crasher', stop reason = signal SIGSEGV: invalid address (fault address: 0x5)
    frame #0: 0x0000000000402e73 crasher`Main_tag_create_oo(this=0x00007ffff71c9800, env=0x00007ffff71c9000) + 451 at crasher.pony:11
   8   
   9        let ctc: TimeStamp = (5,7)
   10       let cto: TimeOp = (11, ctc)
-> 11       w.go(cto)
   12  
   13   class val TimeChunk
   14   type TimeStamp is (I64, I64)
(lldb) 

This might be related to: https://github.com/ponylang/ponyc/issues/2237 which is still an open issue.

SeanTAllen commented 6 years ago

Can verify on OSX

sgebbie commented 6 years ago

Hi,

I've been working with LLVM 5.0.1 because it fixes some other bugs that I bumped into. So, I haven't tried to confirm that the bug also happens with LLVM 3.9.1.

Stewart.

On Mon, Jun 04, 2018 at 12:51:00PM -0700, Sean T Allen wrote:

I can't reproduce this on OSX with LLVM 3.9.1 and latest master. It might have been fixed. Can you try building from source and checking, @sgebbie?

sgebbie commented 6 years ago

I can verify the bug on OS X with LLVM 5.0.1

sigma:~ stewart$ cd tmp/crasher/
sigma:crasher stewart$ ~/opt/pony/bleed/build/release/ponyc 
Building builtin -> /Users/stewart/opt/pony/bleed/packages/builtin
Building . -> /Users/stewart/tmp/crasher
Generating
 Reachability
 Selector painting
 Data prototypes
 Data types
 Function prototypes
 Functions
 Descriptors
Optimising
Writing ./crasher.o
Linking ./crasher
sigma:crasher stewart$ ./crasher 
Segmentation fault: 11
sigma:crasher stewart$ ~/opt/pony/bleed/build/release/ponyc --version
0.22.5-e06ad495 [release]
compiled with: llvm 5.0.1 -- Apple LLVM version 9.1.0 (clang-902.0.39.2)
Defaults: pic=false ssl=openssl_0.9.0
sigma:crasher stewart$ 
sgebbie commented 6 years ago

Generating asm output from ponyc and correlating this with lldb I can see that the crash takes place on line 105 cmpq %rax, (%rsi) in the listing below. This seems to have something to do with GC tracing during a message send to an actor.

But, again, from the listing, it is clear that the constant 5 has been loaded into a representation of the tuple in line 90 movq $5, 16(%rbx) and is then read back out for the failing comparison in line 103 movq 16(%rbx), %rsi.

Jumping to conclusions, I would guess that the logic here is trying to look at the type of the object being passed. So, either some form of autoboxing is being muddled up, or some code for fetching the type ID is reading from the wrong offset. This is all a complete stab in the dark, based on the fact that the surrounding code references pony_trace(un)known, which could be related to gentrace.c:trace_tuple, none of which I quite got to grips with :)

  83         callq   pony_sendv_single
  84         xorl    %esi, %esi
  85         movq    %r12, %rdi
  86         callq   pony_alloc_small
  87         movq    %rax, %rbx
  88         movq    $.L__unnamed_4, (%rbx)
  89         movq    $139957, 24(%rbx)
  90         movq    $5, 16(%rbx)
  91         movl    $11, 8(%rbx)
  92         xorl    %edi, %edi
  93         xorl    %esi, %esi
  94         callq   pony_alloc_msg
  95         movq    %rax, %r15
  96         movq    %rbx, 16(%r15)
  97         movq    %r12, %rdi
  98         callq   pony_gc_send
  99         movl    $2, %edx
 100         movq    %r12, %rdi
 101         movq    %rbx, %rsi
 102         callq   pony_traceunknown
 103         movq    16(%rbx), %rsi
 104         movl    $.L__unnamed_5, %eax
 105         cmpq    %rax, (%rsi)
 106         jne     .LBB2_3
 107         movl    $.L__unnamed_5, %edx
 108         movl    $1, %ecx
 109         movq    %r12, %rdi
 110         callq   pony_traceknown
 111 .LBB2_3:
 112         movl    $2, %edx
 113         movq    %r12, %rdi
 114         movq    %rbx, %rsi
 115         callq   pony_traceunknown
sgebbie commented 6 years ago

Slowly trying to work through the code. I realise that this is still somewhat broad, but I think that I can narrow this down to code generated by gentrace in between the calls to pony_gc_send and pony_send_done in gencall.c:gen_send_message.

Again, my sense is that the tuple has been packed into a compact representation, but that the code then later tries to load an actual data value from the tuple as though it were internal type information.

sgebbie commented 6 years ago

Being hopeful, I retested after the fix for #2735, but the bug still occurs with:

ponyc -v
0.22.6-20ea5ff [release]
compiled with: llvm 5.0.1 -- cc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Defaults: pic=false ssl=openssl_0.9.0
SeanTAllen commented 6 years ago

Thanks @sgebbie

sgebbie commented 6 years ago

https://github.com/ponylang/ponyc/issues/2237 appears to be a duplicate of this bug, but has been closed because this issue (#2741) has more recent notes.

sgebbie commented 6 years ago

Looking at the older #2237, it seems that the code snippet was simpler and probably easier to debug. So, for reference, here is the simpler code that causes the same type of failure:

type Tuple is (
    (U32, String)
  | (U32, U32)
)

actor Main
  new create(env: Env) =>
    this.accept((0x39BF, 0x856D))

  be accept(t: Tuple) =>
    None

The magic numbers (0x39BF, 0x856D) make it easier to spot the values in the assembler listing.

jemc commented 6 years ago

I wonder if this is related to #2808...

SeanTAllen commented 4 years ago

This still happens as of Pony 0.37.0.

SeanTAllen commented 2 years ago

Note that if TimeOp isn't a union, this doesn't segfault.

SeanTAllen commented 2 years ago

This certainly looks like a boxing issue: it only reproduces with a union of tuples where one of the tuples contains a class. If all the elements are machine words, or all are classes, there is no problem.