Closed pdtwonotes closed 7 years ago
See issue #1000 where an interlock problem specific to ARM was causing a segfault.
On IRC @pdtwonotes you said that #1505 is where this started to happen to you, yes?
If that is the case then the problem already existed it was just being hidden or less likely due to a timing issue.
Some more clues:
In setting up the new actor, the generated code calls pony_create
which creates the actor. Then it wants to send a message - I am assuming this is the actual create
method invocation, which by the way has 9 parameters. So it calls pony_alloc_msg
in the runtime library. This in turn calls ponyint_pool_alloc
. That call (ponyint_pool_alloc
) does not return.
Rebuilding pony with config=debug
provides more information, due to the assertions buried in the runtime code. The first line is the actual failure.
tokara: src/libponyrt/mem/pool.c:545: pool_pull: Assertion `top->length == global->count' failed.
Process 5943 stopped
* thread #2: tid = 5947, 0x76cebaf6 libc.so.6, name = 'tokara', stop reason = signal SIGABRT
frame #0: 0x76cebaf6 libc.so.6
-> 0x76cebaf6: blmi 0x76ddb0fe
libc.so.6`__errno_location:
0x76cebafa <+2>: mrc p15, #0x0, r0, c13, c0, #0x3
0x76cebafe <+6>: add r3, pc
0x76cebb00 <+8>: ldr r3, [r3]
(lldb) bt
* thread #2: tid = 5947, 0x76cebaf6 libc.so.6, name = 'tokara', stop reason = signal SIGABRT
* frame #0: 0x76cebaf6 libc.so.6
frame #1: 0x00058628 tokara`handle_message(ctx=<unavailable>, actor=<unavailable>, msg=<unavailable>) + 240 at actor.c:103
I narrowed the suspects down to transaction d715a6bae "Optimise count zeros operations in the runtime (#1443)" since it had to do with counting in the heap, and the assertion failure also had to do with counting in the heap. And the time of the change also happened during a period when I was not regularly rebuilding my ARM compiler.
So as an experiment I undid just that one transaction, involving the three files platform.h, pool.c, and heap.c and rebuilt ponyc on ARM. The problem went away.
I do not know enough about either Pony's heap structures or ARM instruction fencing to suggest a proper fix but the bug is somewhere in that transaction. I have changed the issue title to more properly reflect the cause.
@pdtwonotes Thanks for investigating that. The change in pool.c
is indeed wrong on 32 bits systems. I'll make a fix right away.
I still get the assertion failure. I will check again that I am testing it correctly.
Edit: I rebuilt ponyc on my i686 machine using config=debug and it also gets the assertion failure. It just never got the looping on actor startup.
More: Removing the original commit from November fixes the assertion failure on both ARM and i686. The math involving word length was clearly a problem, but that is not sufficient to fix this, it seems.
Can you try reverting only one part of the change, to see which function is causing the problem?
It would be a complete shot in the dark. I do not know how these things interact:
The source code in the RT library has very few comments explaining what is going on.
- A change in the builtin function that counts bits in a word. Left-to-right or right-to-left?
It's FFS (Find First Set) and CTZ (Count Trailing Zeros) so it starts from the least-significant bit.
- Whether a +1 is added to the resulting count. Fencepost errors?
This is because ffs(x)
returns ctz(x) + 1
for a non-zero x
.
- What are these bitmasks in the first place? A map to used/allocated pages in the heap?
It's a map to used slots in a chunk of heap memory.
- What was the purpose of the change?
Speed. FFS checks for 0 while CTZ doesn't, and slots
isn't supposed to be 0 here.
The source code in the RT library has very few comments explaining what is going on.
That's true, it's an area that can be improved. In the meantime, you can take a look at @sylvanc's thesis, it contains a lot of explanations on the runtime and its implementation.
For your tests, you can try leaving heap.c
in its current state (with the call to __pony_ctz
) but reverting pool.c
to use ponyint_next_pow2
and __pony_ffsl
. I think you'll have to edit platform.h
to have the definitions of both __pony_ctz
and __pony_ffsl
.
Reverting just pool.c (and the small change to platform.h) as you described does appear to fix the problem. I tested it on ARM.
Something else is wrong in that change then. I'll look into it. Thanks for testing all that.
@Praetonus I don't have time to test this out but I think I see the error.
+ return ((size_t)64 - __pony_clzl(size)) - POOL_MIN_BITS;
should be
+ return ((size_t)64 - __pony_ctzl(size)) - POOL_MIN_BITS;
in commit d715a6b because you want to count trailing zeros when replacing ffs
as you did in heap.c
.
This combined with your change in PR #1525 to use the appropriate bit count depending on ILP32
should resolve this issue.
I should note though, that I've only got a shallow understanding of the code/changes so I may be misunderstanding the change from ffs
to clz
.
So, ummm, I thought some more about this (which I probably should have done prior to posting my previous comment) and my previous comment was completely wrong. Please ignore. Also, apologies for the unnecessary noise.
The only thing that jumps out to me at this point is the use of __pony_clzl
even on ILP32
(with the changes from PR #1525). it seems that __pony_clz
should be used on ILP32
instead. Not sure if that makes a difference as I'm unable to find information on the behavior of __pony_clzl
on ILP32
.
The only thing that jumps out to me at this point is the use of pony_clzl even on ILP32 (with the changes from PR #1525). it seems that __pony_clz should be used on ILP32 instead. Not sure if that makes a difference as I'm unable to find information on the behavior of pony_clzl on ILP32.
The difference on Linux/OSX is that __pony_clz
takes an int
while __pony_clzl
takes a long
. Since size_t
on ILP32 is the same size as long
, I think clzl
is the correct version to use.
I found another problem with the original change. If the size is a power of two, the resulting index will be off by one. I'll push a fix for that as well.
This latest fix seems to work. Tested on ARM - i will do i686 tomorrow
Test on i686 gives different results. My program goes into a CPU loop on startup. But if I run it with GDB to find where, instead I get a SegFault, again in the message processing during startup of the main server actor, but in a different place:
Thread 4 "tokara" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb4b68b40 (LWP 4509)]
0x080bdf65 in ponyint_messageq_push (q=0xb7b69c05, m=0xb31ffc20)
at src/libponyrt/actor/messageq.c:75
75 atomic_store_explicit(&prev->next, m, memory_order_release);
(gdb) bt
#0 0x080bdf65 in ponyint_messageq_push (q=0xb7b69c05, m=0xb31ffc20)
at src/libponyrt/actor/messageq.c:75
#1 0x080be8c4 in pony_sendv (ctx=0xb7b6ad44, to=0xb7b69c01, m=0xb31ffc20)
at src/libponyrt/actor/actor.c:342
#2 0x080c7521 in ponyint_cycle_block (ctx=0xb7b6ad44, actor=0xb7b65000,
gc=0xb7b65074) at src/libponyrt/gc/cycle.c:889
#3 0x080be54e in ponyint_actor_run (ctx=0xb7b6ad44, actor=0xb7b65000,
batch=100) at src/libponyrt/actor/actor.c:198
#4 0x080bcd1f in run (sched=0xb7b6ad00) at src/libponyrt/sched/scheduler.c:290
#5 0x080bcdf1 in run_thread (arg=0xb7b6ad00)
at src/libponyrt/sched/scheduler.c:341
#6 0xb7d58295 in start_thread (arg=0xb4b68b40) at pthread_create.c:333
#7 0xb7c5feee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:114
In pool.c pool_pull, line 515, there is a while loop that never terminates. (on x86)
I have adjusted my environment so I can do all three platforms from a shared copy of the repo, to make sure they have identical code. With the latest fix from Praetonus it works fine on x86_64 and ARM, but not on i686. This is odd, becuase both ARM and i686 are 32-bits. So the difference must lie elsewhere. Ideas: timing? LSP32 defined wrong? ARM and x86_64 are running Ubuntu MATE 16.04.1 and i686 is running Ubuntu Server, 16.04.1, so those differences should be minor.
ILP32
is defined as
#if defined(__LP64__)
# define PLATFORM_IS_LP64
#elif defined(_WIN64)
# define PLATFORM_IS_LLP64
#else
# define PLATFORM_IS_ILP32
#endif
which seems right to me.
That unterminating loop is very strange. Can you try running it in a debugger and look at the value of index
in pool_get
when you hit that loop? The full backtrace could also contain useful data. Thanks.
If I run it without gdb, it goes into a loop. If I run it under GDB, instead it gets the segment fault. There was always a rule for debugging that if behavior is different with the debugger, then there is uninitialized memory somewhere.
So I tried using the GDB 'attach' command to remotely attach to the looping program. In that case, I get this stack:
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
0xb7795c31 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7795c31 in __kernel_vsyscall ()
#1 0xb75143fb in pthread_join (threadid=3048360768, thread_return=0x0)
at pthread_join.c:90
#2 0x080bd99a in pony_thread_join (thread=3048360768)
at src/libponyrt/platform/threads.c:213
#3 0x080bcac7 in ponyint_sched_shutdown ()
at src/libponyrt/sched/scheduler.c:353
#4 0x080bcdd9 in ponyint_sched_start (library=false)
at src/libponyrt/sched/scheduler.c:430
#5 0x080bc387 in pony_start (library=false) at src/libponyrt/sched/start.c:128
#6 0x080b9a0c in main ()
I tried stppping through the creation of the actor, with a breakpoint on pool_get. index values vary between 0 and 33. From the messages coming from the debugger it would appear two threads are busy in the memory allocation code, as sometimes the breakpoint is hit one one thread, and sometimes in another. (CPU is an Intel Atom with 4 threads in two cores). Eventually this stepping gets to the SegFault in messageq_push
noted previously.
Those index values aren't normal. A pool index shouldn't be higher than 15 in the current implementation. Can you add assert(index < POOL_COUNT);
at the beginning of ponyint_pool_alloc
(pool.c:605
) and see if that assertion fires?
Sure enough, it does fire. Here is the bt and you can see that pony_alloc_msg
is being passed the 33 as well, directly from Main_tag_create_oo
at line 78, which happens to be where it makes the call to create another actor.
#0 0xb7fdac31 in __kernel_vsyscall ()
#1 0xb7ba4ea9 in __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#2 0xb7ba6407 in __GI_abort () at abort.c:89
#3 0xb7b9dd07 in __assert_fail_base (
fmt=0xb7cd8088 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=0x80de9a1 "index < POOL_COUNT",
file=0x80de950 "src/libponyrt/mem/pool.c", line=605,
function=0x80de9cc <__PRETTY_FUNCTION__.4686> "ponyint_pool_alloc")
at assert.c:92
#4 0xb7b9dd8b in __GI___assert_fail (
assertion=0x80de9a1 "index < POOL_COUNT",
file=0x80de950 "src/libponyrt/mem/pool.c", line=605,
function=0x80de9cc <__PRETTY_FUNCTION__.4686> "ponyint_pool_alloc")
at assert.c:101
#5 0x080c1051 in ponyint_pool_alloc (index=33) at src/libponyrt/mem/pool.c:605
#6 0x080be466 in pony_alloc_msg (index=33, id=6)
at src/libponyrt/actor/actor.c:326
#7 0x080758c6 in Main_tag_create_oo (this=0xb7b69800, env=0xb7b69400)
at main.pony:78
#8 0x0804d527 in Main_Dispatch ()
#9 0x080bde77 in handle_message (ctx=0xb7b6ac44, actor=0xb7b69800,
msg=0xb7b6ab80) at src/libponyrt/actor/actor.c:103
#10 0x080bdffb in ponyint_actor_run (ctx=0xb7b6ac44, actor=0xb7b69800,
batch=100) at src/libponyrt/actor/actor.c:158
#11 0x080bc91f in run (sched=0xb7b6ac00) at src/libponyrt/sched/scheduler.c:290
#12 0x080bc9f1 in run_thread (arg=0xb7b6ac00)
at src/libponyrt/sched/scheduler.c:341
#13 0xb7d58295 in start_thread (arg=0xb5369b40) at pthread_create.c:333
#14 0xb7c5feee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:114
Here is the compiled code from just before the call to pony_alloc_msg
and you can clearly see it putting a 33 on the stack.
0x08075840 <+4288>: call 0x80be350 <pony_create>
0x08075845 <+4293>: mov $0x21,%ecx
0x0807584a <+4298>: mov $0x6,%edx
0x0807584f <+4303>: mov 0xbc(%esp),%esi
0x08075856 <+4310>: mov 0xb8(%esp),%edi
0x0807585d <+4317>: mov 0xb0(%esp),%ebx
0x08075864 <+4324>: mov 0xac(%esp),%ebp
0x0807586b <+4331>: mov %eax,0x98(%esp)
0x08075872 <+4338>: mov 0xa0(%esp),%eax
0x08075879 <+4345>: mov %eax,0x94(%esp)
0x08075880 <+4352>: mov 0x9c(%esp),%eax
0x08075887 <+4359>: movl $0x21,(%esp)
0x0807588e <+4366>: movl $0x6,0x4(%esp)
0x08075896 <+4374>: mov %edx,0x90(%esp)
0x0807589d <+4381>: mov %ecx,0x8c(%esp)
0x080758a4 <+4388>: mov %eax,0x88(%esp)
0x080758ab <+4395>: mov %esi,0x84(%esp)
0x080758b2 <+4402>: mov %edi,0x80(%esp)
0x080758b9 <+4409>: mov %ebx,0x7c(%esp)
0x080758bd <+4413>: mov %ebp,0x78(%esp)
0x080758c1 <+4417>: call 0x80be453 <pony_alloc_msg>
You can see the Pony source code here at line 78. The definition of the actor it is creating is here.
The compiler is using ponyint_pool_index
to compute that value so we're back to that function. Can you add assert(index < POOL_COUNT);
just before returning the value at the end of ponyint_pool_index
and see what the results are? In particular, what is the value of size
?
I turns out that the symbol ILP32 is false on this computer, which uname -m
identifies as i686
. It is indeed a 32-bit processor. This makes ponyint_pool_index
assign the wrong value to bits
, resulting in an index
value that is out of range.
Verified with this test code:
#include <stdio.h>
int main( int agc, const char *arg[] ) {
#ifdef ILP32
printf("32");
#else
printf("64");
#endif
}
It prints "64".
I notice that this place in pool.c is the only place in the Pony code that uses ILP32
. Other places use PLATFORM_IS_ILP32
which is defined in platform.h
You're totally right, it's a typo on my side. I'll fix that.
My Main actor creates another actor. The first line of the other actor is never executed, but the library code goes into a loop. This only happens on ARM - the same code works correctly on X86_64. I interrupt the loop with ^C to use the debugger to see the stack: