Closed beta-ziliani closed 2 years ago
@beta-ziliani it may be related to https://github.com/ysbaddaden/gc/issues/11#issuecomment-979162619 ; something changed in crystal at some point and this GC doesn't work anymore.
This is unrelated with #11. The GC is working, but there is a bug while allocating:
0x000055555562ac71 in GC_LocalAllocator_allocateSmall (self=0x555555849410, size=160, atomic=0) at src/local_allocator.c:121
121 void *GC_LocalAllocator_allocateSmall(LocalAllocator *self, size_t size, int atomic) {
I have to set MIN=10000
for the bug to trigger. It's forcing the allocation of lots of small objects (hundreds of the same size, then it grows by 1 byte, and so on). Removing the division indeed makes the program pass (neat, we can see the GC allocating and collecting), but we now allocate one object of a size, then another of 6 bytes larger.
GC: malloc object=0x68cfea0 size=217 actual=240 atomic=1 ptr=0x68cfeb0
==17586== Conditional jump or move depends on uninitialised value(s)
==17586== at 0x1DEFE3: LocalAllocator_findNextHole (local_allocator.c:31)
==17586== by 0x1DEFE3: LocalAllocator_tryAllocateSmall (local_allocator.c:115)
==17586== by 0x1DEFE3: GC_LocalAllocator_allocateSmall (local_allocator.c:126)
==17586== by 0x1DD713: GC_malloc_with_atomic (immix.c:74)
==17586== by 0x1DD713: GC_malloc (immix.c:95)
==17586== by 0x17F6D5: *GC::malloc<UInt64>:Pointer(Void) (immix.cr:22)
==17586== by 0x14890D: __crystal_malloc64 (gc.cr:24)
==17586== by 0x19BFB4: *Fiber::new<String, &Proc(Nil)>:Fiber (fiber.cr:88)
==17586== by 0x157CEB: *spawn:name<String, &Proc(Nil)>:Fiber (concurrent.cr:61)
==17586== by 0x17F77A: *GC::collect:Nil (immix.cr:40)
==17586== by 0x158885: GC_collect (immix.cr:4)
==17586== by 0x1DE052: GlobalAllocator_tryCollect (global_allocator.c:147)
==17586== by 0x1DE052: GC_GlobalAllocator_nextBlock (global_allocator.c:164)
==17586== by 0x1DEFEC: LocalAllocator_initCursor (local_allocator.c:10)
==17586== by 0x1DEFEC: GC_LocalAllocator_allocateSmall (local_allocator.c:135)
==17586== by 0x1DD7A3: GC_malloc_with_atomic (immix.c:74)
==17586== by 0x1DD7A3: GC_malloc_atomic (immix.c:99)
==17586== by 0x17F495: *GC::malloc_atomic<UInt64>:Pointer(Void) (immix.cr:26)
==17586== Uninitialised value was created by a heap allocation
==17586== at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17586== by 0x1DD4AC: GC_init (immix.c:40)
==17586== by 0x17F705: *GC::init:Nil (immix.cr:12)
==17586== by 0x1DD23A: *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 (main.cr:35)
==17586== by 0x155D95: main (main.cr:127)
==17586==
==17586== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
Invalid memory access (signal 11) at address 0x1ffe801ff8
Aaaand... this is again caused by starting the collector loop in a Fiber (see the backtrace), which is indeed a very bad moment to create this fiber, because it requires to allocate memory, which can pass sometimes (if there are some memory left), but... what if we filled it?
There is still a bug: self->next
is uninitialized when LocalAllocator_findNextHole()
tries to access it, when the memory is filled, and there is no more block (yet). I don't recall the immix algorithm, and the block, chunk, and hole terms are totally foreign to me, now :sob:
Maybe the fiber is completely wrong, and Immix GC itself should start a thread (completely distinct from Crystal), and switch execution to it, but if I recall correctly I switched fibers to make sure that the current context was saved on the stack (again, if I recall correctly).
I think understand what's the issue with the Fiber (not the details but the why): we can't do anything in GC.init
except calling external functions that don't expect the Crystal runtime, because Crystal calls it before starting up the "runtime" (main user code) which initializes constants, globals, and everything.
Delaying is the good solution but we can't delay it until we start collecting (as HEAD is currently doing), otherwise we'd get a Fiber.new
which is trying to allocate, but we're collecting, so memory may be full, so we must do it while the runtime is starting up.
Just doing @@collector = spawn { collector_loop }
fixes the spawn issue (trying to collect won't try to allocate memory anymore), except that the program eventually hangs in a forever sleep
(I assume @@pending
isn't properly set).
The following commits fixed the issue:
There are more issues in Crystal and/or the GC, but this reproducible issue is fixed.
The following loop core dumps (and works fine with libgc).
crystal 1.4.0 LLVM 12.0.1 Default target: x86_64-pc-linux-gnu
lib/immix
built withmake -B CUSTOM="-DNDEBUG"