sgebbie opened this issue 6 years ago
Can verify on OSX
Hi,
I've been working with LLVM 5.0.1 because it fixes some other bugs that I bumped into. So, I haven't tried to confirm that the bug also happens with LLVM 3.9.1.
Stewart.
On Mon, Jun 04, 2018 at 12:51:00PM -0700, Sean T Allen wrote:
I can't reproduce this on OSX with LLVM 3.9.1 and latest master. It might have been fixed. Can you try building from source and checking, @sgebbie?
I can verify the bug on OS X with LLVM 5.0.1:
sigma:~ stewart$ cd tmp/crasher/
sigma:crasher stewart$ ~/opt/pony/bleed/build/release/ponyc
Building builtin -> /Users/stewart/opt/pony/bleed/packages/builtin
Building . -> /Users/stewart/tmp/crasher
Generating
Reachability
Selector painting
Data prototypes
Data types
Function prototypes
Functions
Descriptors
Optimising
Writing ./crasher.o
Linking ./crasher
sigma:crasher stewart$ ./crasher
Segmentation fault: 11
sigma:crasher stewart$ ~/opt/pony/bleed/build/release/ponyc --version
0.22.5-e06ad495 [release]
compiled with: llvm 5.0.1 -- Apple LLVM version 9.1.0 (clang-902.0.39.2)
Defaults: pic=false ssl=openssl_0.9.0
sigma:crasher stewart$
Generating asm output from ponyc and correlating it with lldb, I can see that the crash takes place on line 105 of the listing below, cmpq %rax, (%rsi). This seems to have something to do with GC tracing during a message send to an actor. But, again from the listing, it is clear that the constant 5 has been stored into a representation of the tuple on line 90, movq $5, 16(%rbx), and is then read back out for the failing comparison on line 103, movq 16(%rbx), %rsi. In other words, the comparison on line 105 dereferences the value 5 as if it were a pointer.
But, jumping to conclusions, I would guess that the logic here is trying to look at the type of the object being passed. So either some form of autoboxing is being muddled up, or maybe some code for fetching the type ID is just reading from the wrong offset. This is all just a stab in the dark, based on the fact that the surrounding code calls pony_traceunknown and pony_traceknown, which could be related to gentrace.c:trace_tuple, all of which I didn't quite get to grips with :)
83 callq pony_sendv_single
84 xorl %esi, %esi
85 movq %r12, %rdi
86 callq pony_alloc_small
87 movq %rax, %rbx
88 movq $.L__unnamed_4, (%rbx)
89 movq $139957, 24(%rbx)
90 movq $5, 16(%rbx)
91 movl $11, 8(%rbx)
92 xorl %edi, %edi
93 xorl %esi, %esi
94 callq pony_alloc_msg
95 movq %rax, %r15
96 movq %rbx, 16(%r15)
97 movq %r12, %rdi
98 callq pony_gc_send
99 movl $2, %edx
100 movq %r12, %rdi
101 movq %rbx, %rsi
102 callq pony_traceunknown
103 movq 16(%rbx), %rsi
104 movl $.L__unnamed_5, %eax
105 cmpq %rax, (%rsi)
106 jne .LBB2_3
107 movl $.L__unnamed_5, %edx
108 movl $1, %ecx
109 movq %r12, %rdi
110 callq pony_traceknown
111 .LBB2_3:
112 movl $2, %edx
113 movq %r12, %rdi
114 movq %rbx, %rsi
115 callq pony_traceunknown
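Read as a C struct, the stores on listing lines 88 to 91 describe a 32-byte object whose word at offset 16 is exactly the one loaded on line 103. The sketch below is a hypothetical reconstruction: the names BoxedTuple, load_word and build_like_listing are invented, only the offsets and values come from the listing, and it assumes an LP64 x86-64 target (8-byte pointers, 8-byte-aligned uint64_t).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C view of the object built on listing lines 88-91.
 * Names are invented; offsets and values are taken from the listing.
 * Assumes x86-64 / LP64 (8-byte pointers, 8-byte-aligned uint64_t). */
typedef struct BoxedTuple {
  const void *desc; /* offset 0:  movq $.L__unnamed_4, (%rbx) */
  uint32_t elem0;   /* offset 8:  movl $11, 8(%rbx)           */
  uint64_t elem1;   /* offset 16: movq $5, 16(%rbx)           */
  uint64_t elem2;   /* offset 24: movq $139957, 24(%rbx)      */
} BoxedTuple;

/* Compile-time checks that the struct matches the listing's offsets. */
_Static_assert(offsetof(BoxedTuple, elem0) == 8, "elem0 at offset 8");
_Static_assert(offsetof(BoxedTuple, elem1) == 16, "elem1 at offset 16");
_Static_assert(offsetof(BoxedTuple, elem2) == 24, "elem2 at offset 24");

/* Equivalent of line 103 (movq 16(%rbx), %rsi): load the 8-byte word
 * at the given byte offset from the start of the object. */
static uint64_t load_word(const void *base, size_t offset) {
  uint64_t word;
  memcpy(&word, (const char *)base + offset, sizeof word);
  return word;
}

/* Builds the object the way lines 88-91 do; desc stands in for the
 * address of the .L__unnamed_4 descriptor. */
static BoxedTuple build_like_listing(const void *desc) {
  BoxedTuple box = { desc, 11, 5, 139957 };
  return box;
}
```

Calling load_word on such an object at offset 16 yields 5, and line 105 then uses that value as an address (cmpq %rax, (%rsi)), which is where the segfault occurs.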
Slowly trying to work through the code. I realise that this is still somewhat broad, but I think I can narrow this down to code generated by gentrace between the calls to pony_gc_send and pony_send_done in gencall.c:gen_send_message.
Again, my sense is that the tuple has been packed into a compact representation, but that the code later tries to load one of the tuple's actual data values as though it were internal type information.
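A minimal C model of this hypothesis, for a union such as (U32, String) | (U32, U32): the second slot of a boxed tuple holds either an object pointer or a plain integer, and only the box's own descriptor says which. Everything here (Descriptor, Object, TupleBox and both functions) is hypothetical and far simpler than ponyc's runtime; trace_checking_box_type illustrates one possible shape of a fix, not the one ponyc adopted.

```c
#include <stdint.h>

/* Hypothetical model, not ponyc's runtime types: every object starts
 * with a pointer to its type descriptor. */
typedef struct Descriptor { const char *name; } Descriptor;
typedef struct Object { const Descriptor *desc; } Object;

static const Descriptor STRING_DESC = { "String" };
static const Descriptor U32_STRING_DESC = { "(U32, String)" };
static const Descriptor U32_U32_DESC = { "(U32, U32)" };

/* A boxed value of type (U32, String) | (U32, U32). The second slot is
 * an Object pointer or a plain integer, depending on which tuple type
 * was boxed; only box->desc records which one it is. */
typedef struct TupleBox {
  const Descriptor *desc;
  uint32_t first;
  uintptr_t second;
} TupleBox;

/* What listing lines 103-105 appear to do: assume the slot holds an
 * object pointer and read its descriptor. Only safe for a
 * (U32, String) box; for a (U32, U32) box it reads memory at an
 * address such as 5 or 0x856D. */
static int trace_assuming_pointer(const TupleBox *box) {
  const Object *obj = (const Object *)box->second;
  return obj->desc == &STRING_DESC;
}

/* Safer variant: check which tuple type is boxed before treating any
 * element as a pointer. Returns how many objects would be traced. */
static int trace_checking_box_type(const TupleBox *box) {
  if (box->desc != &U32_STRING_DESC)
    return 0; /* (U32, U32): only machine words, nothing to trace */
  const Object *obj = (const Object *)box->second;
  return obj->desc == &STRING_DESC ? 1 : 0;
}
```

Both functions agree on a (U32, String) box; only the second is safe to call on a (U32, U32) box such as (0x39BF, 0x856D).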
Just being hopeful, I retested following the fix for #2735, but the bug still occurs with:
ponyc -v
0.22.6-20ea5ff [release]
compiled with: llvm 5.0.1 -- cc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Defaults: pic=false ssl=openssl_0.9.0
Thanks @sgebbie
https://github.com/ponylang/ponyc/issues/2237 appears to be a duplicate of this bug, but has been closed because this issue (#2741) has more recent notes.
Looking at the older #2237, it seems that the code snippet was simpler and probably easier to debug. So, for reference, here is the simpler code that causes the same type of failure:
type Tuple is (
  (U32, String)
  | (U32, U32)
)

actor Main
  new create(env: Env) =>
    this.accept((0x39BF, 0x856D))

  be accept(t: Tuple) =>
    None
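The magic numbers (0x39BF, 0x856D) make it easier to spot the values in the assembler listing. In ponyc's asm output, as in the listing above (e.g. movq $139957, 24(%rbx)), immediates are printed in decimal, so look for $14783 and $34157.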
I wonder if this is related to #2808...
This still happens as of Pony 0.37.0.
Note that if TimeOp isn't a union, this doesn't segfault.
This certainly looks like a boxing issue: the type needs to be a union of tuples in which one of the tuple types contains a class. If every element is a machine word, or every element is a class, there is no problem.
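The condition described above can be stated as a small predicate. This only restates the reported observation and is not ponyc's logic; ElemKind and union_mixes_kinds are hypothetical names, and the element kinds of all tuple members in the union are passed as one flattened array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element classification for this sketch. */
typedef enum { MACHINE_WORD, CLASS_REF } ElemKind;

/* Restates the observation: a union of tuples crashed only when it
 * mixed machine-word elements with class references. All machine
 * words, or all classes, did not crash. */
static bool union_mixes_kinds(const ElemKind *elems, size_t count) {
  bool has_word = false;
  bool has_class = false;
  for (size_t i = 0; i < count; i++) {
    if (elems[i] == MACHINE_WORD)
      has_word = true;
    else
      has_class = true;
  }
  return has_word && has_class;
}
```

For the (U32, String) | (U32, U32) repro, the flattened kinds are word, class, word, word, so the predicate is true.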
When using:
The following short program consistently crashes.
$ cat crasher.pony
The LLDB output shows that values in the tuple seem to be used as addresses to dereference:
This might be related to: https://github.com/ponylang/ponyc/issues/2237 which is still an open issue.