wren-lang / wren

The Wren Programming Language. Wren is a small, fast, class-based concurrent scripting language.
http://wren.io
MIT License
6.86k stars 550 forks source link

enabling WREN_DEBUG_GC_STRESS reveals memory issues #869

Open ruby0x1 opened 3 years ago

ruby0x1 commented 3 years ago

Ran into some weird memory issues on the latest Wren (which may have been there a while). I figured I would post as it may be more efficient if we have several perspectives looking for the cause, or if any of the changes over time ring a bell as to a potential origin. @mhermier @munificent


if you set #define WREN_DEBUG_GC_STRESS 1 in wren_common.h, it's expected to run normally (even if slowly) (WREN_DEBUG_TRACE_GC enabled to see the GC). However it starts failing very early in the core module, with Assert failed in hashObject(): Only immutable objects can be hashed. The first error is seemingly random, and the objects that it has when hashing are of course bogus.

Another thing is during a collect, where the object it's accessing is invalid memory. ASAN catches it as expected, and shows that the value was already freed by GC previously (second block of info. note: in the first and second block the path taken to wrenCollectGarbage isn't relevant, only the initial allocation path is).


Something else that's interesting, sometimes the bytesAllocated before GC is higher, leading to misleading amounts collected, but also implies memory is allocated during a collect somehow...

GC 10410 before, 10409 after (1 collected)
GC 10413 before, 10414 after (4294967295 collected)

ASAN output

==22884==ERROR: AddressSanitizer: heap-use-after-free on address 0x125d86002554 at pc 0x7ff7f94440c1 bp 0x00d375af0a20 sp 0x00d375af0a68
READ of size 1 at 0x125d86002554 thread T0
    #0 0x7ff7f94440c0 in wrenGrayObj wren\vm\wren_value.c:1026
    #1 0x7ff7f94209fc in wrenCollectGarbage wren\vm\wren_vm.c:165
    #2 0x7ff7f94205b7 in wrenReallocate wren\vm\wren_vm.c:256
    #3 0x7ff7f943a68b in wrenValueBufferFill wren\vm\wren_value.c:36
    #4 0x7ff7f943a8bc in wrenValueBufferWrite wren\vm\wren_value.c:36
    #5 0x7ff7f9883a67 in addConstant wren\vm\wren_compiler.c:492
    #6 0x7ff7f988ad3c in emitConstant wren\vm\wren_compiler.c:1251
    #7 0x7ff7f987e6a2 in literal wren\vm\wren_compiler.c:2341
    #8 0x7ff7f987a395 in parsePrecedence wren\vm\wren_compiler.c:2758
    #9 0x7ff7f9881487 in expression wren\vm\wren_compiler.c:2772
    #10 0x7ff7f9887006 in finishArgumentList wren\vm\wren_compiler.c:1886
    #11 0x7ff7f9887a3e in methodCall wren\vm\wren_compiler.c:1940
    #12 0x7ff7f9887413 in namedCall wren\vm\wren_compiler.c:2016
    #13 0x7ff7f987de6f in name wren\vm\wren_compiler.c:2307
    #14 0x7ff7f987a395 in parsePrecedence wren\vm\wren_compiler.c:2758
    #15 0x7ff7f9881487 in expression wren\vm\wren_compiler.c:2772
    #16 0x7ff7f988824a in finishBlock wren\vm\wren_compiler.c:1691
    #17 0x7ff7f9888189 in finishBody wren\vm\wren_compiler.c:1717
    #18 0x7ff7f988fc1b in method wren\vm\wren_compiler.c:3312
    #19 0x7ff7f988dcf6 in classDefinition wren\vm\wren_compiler.c:3404
    #20 0x7ff7f98814c0 in definition wren\vm\wren_compiler.c:3536
    #21 0x7ff7f987f228 in wrenCompile wren\vm\wren_compiler.c:3606
    #22 0x7ff7f94233ac in compileInModule wren\vm\wren_vm.c:512
    #23 0x7ff7f94303c4 in wrenCompileSource wren\vm\wren_vm.c:1566
    #24 0x7ff7f943025d in wrenInterpret wren\vm\wren_vm.c:1545
    #25 0x7ff7f98517b9 in wrenInitializeCore wren\vm\wren_core.c:1461
    #26 0x7ff7f9420108 in wrenNewVM wren\vm\wren_vm.c:101

0x125d86002554 is located 4 bytes inside of 33-byte region [0x125d86002550,0x125d86002571)
freed by thread T0 here:
    #0 0x7ff7f8edb954 in free 
    #1 0x7ff7f941fbe8 in defaultReallocate wren\vm\wren_vm.c:36
    #2 0x7ff7f942064b in wrenReallocate wren\vm\wren_vm.c:261
    #3 0x7ff7f9444fa7 in wrenFreeObj wren\vm\wren_value.c:1350
    #4 0x7ff7f94212b1 in wrenCollectGarbage wren\vm\wren_vm.c:209
    #5 0x7ff7f94205b7 in wrenReallocate wren\vm\wren_vm.c:256
    #6 0x7ff7f984b945 in wrenByteBufferFill wren\vm\wren_utils.c:6
    #7 0x7ff7f984bbfc in wrenByteBufferWrite wren\vm\wren_utils.c:6
    #8 0x7ff7f988508f in emitByte wren\vm\wren_compiler.c:1192
    #9 0x7ff7f9881628 in emitOp wren\vm\wren_compiler.c:1204
    #10 0x7ff7f988940a in loadLocal wren\vm\wren_compiler.c:1554
    #11 0x7ff7f9888dbb in loadVariable wren\vm\wren_compiler.c:2027
    #12 0x7ff7f98889cb in loadThis wren\vm\wren_compiler.c:2044
    #13 0x7ff7f987de4b in name wren\vm\wren_compiler.c:2306
    #14 0x7ff7f987a395 in parsePrecedence wren\vm\wren_compiler.c:2758
    #15 0x7ff7f9881487 in expression wren\vm\wren_compiler.c:2772
    #16 0x7ff7f988824a in finishBlock wren\vm\wren_compiler.c:1691
    #17 0x7ff7f9888189 in finishBody wren\vm\wren_compiler.c:1717
    #18 0x7ff7f988fc1b in method wren\vm\wren_compiler.c:3312
    #19 0x7ff7f988dcf6 in classDefinition wren\vm\wren_compiler.c:3404
    #20 0x7ff7f98814c0 in definition wren\vm\wren_compiler.c:3536
    #21 0x7ff7f987f228 in wrenCompile wren\vm\wren_compiler.c:3606
    #22 0x7ff7f94233ac in compileInModule wren\vm\wren_vm.c:512
    #23 0x7ff7f94303c4 in wrenCompileSource wren\vm\wren_vm.c:1566
    #24 0x7ff7f943025d in wrenInterpret wren\vm\wren_vm.c:1545
    #25 0x7ff7f98517b9 in wrenInitializeCore wren\vm\wren_core.c:1461
    #26 0x7ff7f9420108 in wrenNewVM wren\vm\wren_vm.c:101

previously allocated by thread T0 here:
    #0 0x7ff7f8edbc8b in realloc 
    #1 0x7ff7f941fc05 in defaultReallocate wren\vm\wren_vm.c:40
    #2 0x7ff7f942064b in wrenReallocate wren\vm\wren_vm.c:261
    #3 0x7ff7f9442452 in allocateString wren\vm\wren_value.c:714
    #4 0x7ff7f944239b in wrenNewStringLength wren\vm\wren_value.c:751
    #5 0x7ff7f988be7e in readString wren\vm\wren_compiler.c:932
    #6 0x7ff7f98805a4 in nextToken wren\vm\wren_compiler.c:1062
    #7 0x7ff7f987a215 in parsePrecedence wren\vm\wren_compiler.c:2741
    #8 0x7ff7f9881487 in expression wren\vm\wren_compiler.c:2772
    #9 0x7ff7f988824a in finishBlock wren\vm\wren_compiler.c:1691
    #10 0x7ff7f9888189 in finishBody wren\vm\wren_compiler.c:1717
    #11 0x7ff7f988fc1b in method wren\vm\wren_compiler.c:3312
    #12 0x7ff7f988dcf6 in classDefinition wren\vm\wren_compiler.c:3404
    #13 0x7ff7f98814c0 in definition wren\vm\wren_compiler.c:3536
    #14 0x7ff7f987f228 in wrenCompile wren\vm\wren_compiler.c:3606
    #15 0x7ff7f94233ac in compileInModule wren\vm\wren_vm.c:512
    #16 0x7ff7f94303c4 in wrenCompileSource wren\vm\wren_vm.c:1566
    #17 0x7ff7f943025d in wrenInterpret wren\vm\wren_vm.c:1545
    #18 0x7ff7f98517b9 in wrenInitializeCore wren\vm\wren_core.c:1461
    #19 0x7ff7f9420108 in wrenNewVM wren\vm\wren_vm.c:101
mhermier commented 3 years ago

I know a potential use after free after wrenEnsureStack that does not zero the stack when growing. But that does not explains why it would trip on value hashing.

I think there are 2 different issues. One in the heap and probably one in the hash map implementation, not zeroing data correctly. The fact that we trip on a string, is probably an artefact of stack configuration.

I would start by making default stack frame insanely big by default, to be sure not to trip on potential stack corruption just to be safe, and force zeroing the stack growing part.

With that secured it should be possible to isolate the if the hash map has an error. When isolated/fixed, I would ensure if it's not wrenEnsureStack that cause all the mess. And then going down again if there is not a stack corruption.

ruby0x1 commented 3 years ago

The hash assert is because it's given an invalid object (sometimes a valid looking object of the wrong type like map, and sometimes what looks like garbage), seems to me. So the hash code in the switch seems fine, it's the input that is wrong.

Worth a try either way, just clarifying.

mhermier commented 3 years ago

Fixed:

diff --git a/src/vm/wren_compiler.c b/src/vm/wren_compiler.c
index d54b930d..51c80ee2 100644
--- a/src/vm/wren_compiler.c
+++ b/src/vm/wren_compiler.c
@@ -3640,6 +3640,7 @@ void wrenMarkCompiler(WrenVM* vm, Compiler* compiler)
 {
   wrenGrayValue(vm, compiler->parser->current.value);
   wrenGrayValue(vm, compiler->parser->previous.value);
+  wrenGrayValue(vm, compiler->parser->next.value);

   // Walk up the parent chain to mark the outer compilers too. The VM only
   // tracks the innermost one.
ruby0x1 commented 3 years ago

OH, of course 👍 That's a good find thanks!

mhermier commented 3 years ago

Was trivial once I isolated the issue to nextToken(). The first 2 lines advance the token, and my brain clicked because I was toying with the gc compiler hook 2 minutes before, and went "wait, next token is not grayed"....

ruby0x1 commented 3 years ago

Yea makes sense. I was distracted by the string because it was from trim in wren core, I thought maybe it was somehow an old issue triggered by enough constants or something. But really, I wasn't able to debug properly immediately so I'm glad to have other eyes on it!

mhermier commented 3 years ago

I'm currently valgrind verifying tests that times out, but there is one test that seems to be annoying. It seems enabling gc makes test/api/call_wren_call_root.wren fails oddly.

ruby0x1 commented 3 years ago

I currently still get the use after free (GC trying to collect an already collected object), but this is now further along inside some code I was already suspicious of, so they may be unrelated.

mhermier commented 3 years ago

Found a nasty bug in 'PUSH()' macro, thanks to the definition of WREN_DEBUG_GC_STRESS. It seems that *fiber->stackTop++ = value opens a window for initialized data (visible when used with newly created value, like PUSH(importModule(vm, fn->constants.data[READ_SHORT()]));)

ruby0x1 commented 3 years ago

Interesting, @avivbeeri mentioned some strange behavior with import order, but I haven't seen any myself.

avivbeeri commented 3 years ago

Could it be related to the particular chains of import usage that occur in DOME, compared to other embedded applications? I don't know if the import pattern is similar for Luxe, but if they are radically different it might explain why you don't run into the same issues.

I often end up with issues that are "fixed" by shuffling the import order around, which is very frustrating, as it usually happens towards the end of a project, and is very hard to diagnose.

ruby0x1 commented 3 years ago

There's two classes of issue though, one is import order where there's expected behaviour (like a value being null, because you imported the value at the time it was null) and unexpected behaviour (like corruption issues or errors in unrelated code from the import order)

avivbeeri commented 3 years ago

And DOME seems to hit both :P

ruby0x1 commented 3 years ago

We'd rather confirm the first fix is good for the corruption part since it's quite likely involved in the 0.4.0 prerelease version

mhermier commented 3 years ago
#define PUSH(value)  do { *fiber->stackTop = (value); fiber->stackTop++; } while (false)

seems to fix it, but it should probably be commented. The issue is that stackTop might be incremented prior to value evaluation, opening the window to not initialized data, if value evaluation cause a GC.

ruby0x1 commented 3 years ago

Ooh, that's clearer thanks @mhermier. I'll test this a bunch with my issues when I can.

mhermier commented 3 years ago

Knowing the issue, and writing a good diff are sometimes 2 different things... Problem is that the macros in the VM can have quite some side effects, and probably should be transformed to inline functions to avoid them...

ruby0x1 commented 3 years ago

And, I just found the cause of my issue, which isn't related to Wren but my code using the API directly.

The context is a json-like parser that converts the data to Wren objects, nothing surprising. The issue: GC is being triggered (I am parsing a lot of data rapidly) while parsing. The crash was inside wrenBlackenObjects in the GC, specifically while iterating a list.

image

I looked closely at the list section, when I make a list it's doing:

ObjList* list = wrenNewList(vm, count);
for each value {
  list->elements.data[idx] = ...
}

The real issue is that wrenNewList doesn't initialize the values (expected!) since this function isn't accessible from the API or from Wren*. The buffer for list->elements.data is entirely uninitialized memory! The contents of the list are allocated, but left uninitialized, meaning that memory is just waiting to be accessed by GC. When GC comes around, it accesses bogus memory.

//input data
list = [4, {"some":"stuff"}, 5]

wrenNewList -> elements uninitialized

i = 0 | elements[0] = NUM_VAL(4)
i = 1 | elements[1] = wrenNewMap (...) //triggers a GC, accesses elements[2]!
      | elements[2] = uninitialized

*The only way it's used in Wren is List.filled which initializes the values to a single value (which is not evaluated dynamically).


Summary and fix: if you use wrenNewList, make sure you initialize the elements:

ObjList* list = wrenNewList(vm, count);
for(int i = 0; i < count; ++i) {
  list->elements.data[i] = NULL_VAL;
}
munificent commented 3 years ago

Summary and fix: if you use wrenNewList, make sure you initialize the elements:

ObjList* list = wrenNewList(vm, count);
for(int i = 0; i < count; ++i) {
  list->elements.data[i] = NULL_VAL;
}

+1. This is an easy mistake to make. It's easy to think that nothing is looking at the object yet, when the GC could kick in at almost any time and get its grubby paws on it.