iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[Godot] Program crash when frequent reload / unload bytecode. #13119

Open RechieKho opened 1 year ago

RechieKho commented 1 year ago

What happened?

Hi. I am currently writing a GDExtension to embed the IREE runtime into the Godot game engine. Here is the project source code. At commit f36fc7e, I found that whenever I frequently reload / unload bytecode (i.e. release iree_vm_module_t and iree_vm_context_t), the program crashes. I also created an issue on that repo.

As @stellaraccident (a very cool person!) suggested that I use ASan, I finally managed to get some useful debug information.

Here is the ASan output:

=================================================================
==63659==ERROR: AddressSanitizer: heap-use-after-free on address 0x625001ba591c at pc 0x7f4fc918ae13 bp 0x7ffcc4909d00 sp 0x7ffcc4909cf8
READ of size 4 at 0x625001ba591c thread T0
    #0 0x7f4fc918ae12 in __flatbuffers_soffset_read_from_pe /home/rechie/Documents/cxx/iree.gd/thirdparty/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89
    #1 0x7f4fc918df3e in iree_vm_BytecodeModuleDef_exported_functions /home/rechie/Documents/cxx/iree.gd/build/thirdparty/iree/runtime/src/iree/schemas/bytecode_module_def_reader.h:693
    #2 0x7f4fc9190a41 in iree_vm_bytecode_module_lookup_function /home/rechie/Documents/cxx/iree.gd/thirdparty/iree/runtime/src/iree/vm/bytecode/module.c:286
    #3 0x7f4fc943ffe7 in iree_vm_module_lookup_function_by_name /home/rechie/Documents/cxx/iree.gd/thirdparty/iree/runtime/src/iree/vm/module.c:287
    #4 0x7f4fc942b520 in iree_vm_context_run_function /home/rechie/Documents/cxx/iree.gd/thirdparty/iree/runtime/src/iree/vm/context.c:73
    #5 0x7f4fc942ca4d in iree_vm_context_release_modules /home/rechie/Documents/cxx/iree.gd/thirdparty/iree/runtime/src/iree/vm/context.c:264
    #6 0x7f4fc942d5d2 in iree_vm_context_destroy /home/rechie/Documents/cxx/iree.gd/thirdparty/iree/runtime/src/iree/vm/context.c:352
    #7 0x7f4fc942d87f in iree_vm_context_release /home/rechie/Documents/cxx/iree.gd/thirdparty/iree/runtime/src/iree/vm/context.c:380
    #8 0x7f4fc914b307 in IREEModule::unload() /home/rechie/Documents/cxx/iree.gd/src/iree_module.cpp:17
    #9 0x7f4fc914b645 in IREEModule::load(godot::String const&) /home/rechie/Documents/cxx/iree.gd/src/iree_module.cpp:42
    #10 0x7f4fc9156f09 in void godot::call_with_variant_args_ret_helper<godot::___UnexistingClass, godot::Error, godot::String const&, 0ul>(godot::___UnexistingClass*, godot::Error (godot::___UnexistingClass::*)(godot::String const&), godot::Variant const**, godot::Variant&, GDExtensionCallError&, IndexSequence<0ul>) (/home/rechie/Documents/cxx/iree.gd/sample/extension/iree-gd/libiree-gd.linux.debug.so+0x12af09)
    #11 0x7f4fc915462d in void godot::call_with_variant_args_ret_dv<godot::___UnexistingClass, godot::Error, godot::String const&>(godot::___UnexistingClass*, godot::Error (godot::___UnexistingClass::*)(godot::String const&), void const* const*, int, godot::Variant&, GDExtensionCallError&, std::vector<godot::Variant, std::allocator<godot::Variant> > const&) (/home/rechie/Documents/cxx/iree.gd/sample/extension/iree-gd/libiree-gd.linux.debug.so+0x12862d)
    #12 0x7f4fc9152644 in godot::MethodBindTR<godot::Error, godot::String const&>::call(void*, void const* const*, long, GDExtensionCallError&) const (/home/rechie/Documents/cxx/iree.gd/sample/extension/iree-gd/libiree-gd.linux.debug.so+0x126644)
    #13 0x7f4fc9259bd5 in godot::MethodBind::bind_call(void*, void*, void const* const*, long, void*, GDExtensionCallError*) /home/rechie/Documents/cxx/iree.gd/thirdparty/godot-cpp/src/core/method_bind.cpp:96
    #14 0x4572c67  (/home/rechie/bin/godot4+0x4572c67)
    #15 0x4597795  (/home/rechie/bin/godot4+0x4597795)
    #16 0x45e2a7d  (/home/rechie/bin/godot4+0x45e2a7d)
    #17 0x45e3caa  (/home/rechie/bin/godot4+0x45e3caa)
    #18 0x45e50d1  (/home/rechie/bin/godot4+0x45e50d1)
    #19 0x45e5287  (/home/rechie/bin/godot4+0x45e5287)
    #20 0x1e67d69  (/home/rechie/bin/godot4+0x1e67d69)
    #21 0x1c72255  (/home/rechie/bin/godot4+0x1c72255)
    #22 0x1c72b07  (/home/rechie/bin/godot4+0x1c72b07)
    #23 0x1c431ab  (/home/rechie/bin/godot4+0x1c431ab)
    #24 0x43ce431  (/home/rechie/bin/godot4+0x43ce431)
    #25 0x45e1039  (/home/rechie/bin/godot4+0x45e1039)
    #26 0x1c62c98  (/home/rechie/bin/godot4+0x1c62c98)
    #27 0x1dadabd  (/home/rechie/bin/godot4+0x1dadabd)
    #28 0x1d91cc9  (/home/rechie/bin/godot4+0x1d91cc9)
    #29 0x45e1039  (/home/rechie/bin/godot4+0x45e1039)
    #30 0x1c16bf6  (/home/rechie/bin/godot4+0x1c16bf6)
    #31 0x1c17bdf  (/home/rechie/bin/godot4+0x1c17bdf)
    #32 0x43ce431  (/home/rechie/bin/godot4+0x43ce431)
    #33 0x45e1039  (/home/rechie/bin/godot4+0x45e1039)
    #34 0x10b4d2a  (/home/rechie/bin/godot4+0x10b4d2a)
    #35 0x2d249d0  (/home/rechie/bin/godot4+0x2d249d0)
    #36 0x2baa98c  (/home/rechie/bin/godot4+0x2baa98c)
    #37 0x2c026b5  (/home/rechie/bin/godot4+0x2c026b5)
    #38 0x2c05760  (/home/rechie/bin/godot4+0x2c05760)
    #39 0x2c05ef4  (/home/rechie/bin/godot4+0x2c05ef4)
    #40 0x2c1d004  (/home/rechie/bin/godot4+0x2c1d004)
    #41 0xe8d753  (/home/rechie/bin/godot4+0xe8d753)
    #42 0x434ded3  (/home/rechie/bin/godot4+0x434ded3)
    #43 0x434f052  (/home/rechie/bin/godot4+0x434f052)
    #44 0xe8fd9e  (/home/rechie/bin/godot4+0xe8fd9e)
    #45 0xe03981  (/home/rechie/bin/godot4+0xe03981)
    #46 0x7f4fcf638d09 in __libc_start_main ../csu/libc-start.c:308
    #47 0xe23a0d  (/home/rechie/bin/godot4+0xe23a0d)

0x625001ba591c is located 28 bytes inside of 8208-byte region [0x625001ba5900,0x625001ba7910)
freed by thread T0 here:
    #0 0x7f4fcfa25b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
    #1 0x44d8e20  (/home/rechie/bin/godot4+0x44d8e20)
    #2 0x7f4fc914b99c in IREEModule::load(godot::String const&) /home/rechie/Documents/cxx/iree.gd/src/iree_module.cpp:45
    #3 0x7f4fc9156f09 in void godot::call_with_variant_args_ret_helper<godot::___UnexistingClass, godot::Error, godot::String const&, 0ul>(godot::___UnexistingClass*, godot::Error (godot::___UnexistingClass::*)(godot::String const&), godot::Variant const**, godot::Variant&, GDExtensionCallError&, IndexSequence<0ul>) (/home/rechie/Documents/cxx/iree.gd/sample/extension/iree-gd/libiree-gd.linux.debug.so+0x12af09)
    #4 0x7f4fc915462d in void godot::call_with_variant_args_ret_dv<godot::___UnexistingClass, godot::Error, godot::String const&>(godot::___UnexistingClass*, godot::Error (godot::___UnexistingClass::*)(godot::String const&), void const* const*, int, godot::Variant&, GDExtensionCallError&, std::vector<godot::Variant, std::allocator<godot::Variant> > const&) (/home/rechie/Documents/cxx/iree.gd/sample/extension/iree-gd/libiree-gd.linux.debug.so+0x12862d)
    #5 0x7f4fc9152644 in godot::MethodBindTR<godot::Error, godot::String const&>::call(void*, void const* const*, long, GDExtensionCallError&) const (/home/rechie/Documents/cxx/iree.gd/sample/extension/iree-gd/libiree-gd.linux.debug.so+0x126644)
    #6 0x7f4fc9259bd5 in godot::MethodBind::bind_call(void*, void*, void const* const*, long, void*, GDExtensionCallError*) /home/rechie/Documents/cxx/iree.gd/thirdparty/godot-cpp/src/core/method_bind.cpp:96
    #7 0x4572c67  (/home/rechie/bin/godot4+0x4572c67)

previously allocated by thread T0 here:
    #0 0x7f4fcfa25e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x4b8fdf2  (/home/rechie/bin/godot4+0x4b8fdf2)
    #2 0x41b58ab2  (<unknown module>)

SUMMARY: AddressSanitizer: heap-use-after-free /home/rechie/Documents/cxx/iree.gd/thirdparty/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89 in __flatbuffers_soffset_read_from_pe
Shadow bytes around the buggy address:
  0x0c4a8036cad0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a8036cae0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a8036caf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a8036cb00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a8036cb10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c4a8036cb20: fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4a8036cb30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4a8036cb40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4a8036cb50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4a8036cb60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4a8036cb70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==63659==ABORTING

Steps to reproduce your issue

Note: This requires Godot 4.0.2.

  1. Clone IREE.gd.
  2. Extract this zip into the sample/ directory at the root of the repo.
  3. Build it following the README.md.
  4. Run Godot 4 and open the sample project.
  5. Repeatedly load and unload the bytecode in sample/bytecodes by setting load_path of sample/resources/new-iree-module.tres (here is the video demo).

What component(s) does this issue relate to?

Runtime

Version information

For the iree-dist tools: candidate-20230415.490. For the runtime: a custom fork that fixes a CMake linker error.

Additional context

No response

benvanik commented 1 year ago

Nice progress!

It's tricky to go dig into user projects so reproducers that don't require doing that will always be useful in getting assistance. It also helps to isolate issues in the IREE codebase from issues in user codebases. There are definitely bugs in IREE but it's harder for us to evaluate in-situ :)

In this case I suspect the issue is in your code: AFAICT you are not retaining your file data. Line 45 of iree_module.cpp gets the data from somewhere and passes a reference to it into iree_vm_bytecode_module_create, but it neither retains the data nor sets up a release for it (it passes iree_allocator_null instead), so whatever decides to unload the file can do so whenever it wants. You're probably dropping the data (directly or indirectly) before the unload(), which still needs that data, hence the explosion when trying to access it.
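
The "set up the release" idea can be sketched in plain C++ without any IREE headers. Everything below is a hypothetical stand-in: `ReleaseHook`, `HookedModule`, `FileData`, `Trace`, and `load_then_unload` are illustrative names, not IREE or Godot APIs, and the real deallocator argument of iree_vm_bytecode_module_create may be shaped differently. The point is only the ordering: the module reads the bytes during teardown and hands the buffer back afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Records what happened during teardown so the ordering can be checked.
struct Trace {
  bool released = false;             // set once the buffer has been freed
  bool read_before_release = false;  // true if teardown read live bytes
  int last_byte = -1;                // last byte seen during teardown
};

// Hypothetical release hook: a callback plus opaque user data.
struct ReleaseHook {
  void (*release)(void* user_data);
  void* user_data;
};

// Heap-allocated file contents whose lifetime is handed to the module.
struct FileData {
  std::vector<std::uint8_t> bytes;
  Trace* trace;
};

// Called by the module once it no longer needs the bytes.
void release_file_data(void* user_data) {
  FileData* file = static_cast<FileData*>(user_data);
  file->trace->released = true;
  delete file;
}

// Hypothetical module that borrows the bytes and owns only the hook.
class HookedModule {
 public:
  HookedModule(const std::uint8_t* data, std::size_t size, ReleaseHook hook,
               Trace* trace)
      : data_(data), size_(size), hook_(hook), trace_(trace) {
    assert(trace != nullptr);
  }

  ~HookedModule() {
    // Teardown still reads the bytes (for example to look up an export)...
    trace_->read_before_release = !trace_->released;
    trace_->last_byte = size_ ? data_[size_ - 1] : 0;
    // ...and only then gives the buffer back to whoever supplied it.
    if (hook_.release) hook_.release(hook_.user_data);
  }

  HookedModule(const HookedModule&) = delete;
  HookedModule& operator=(const HookedModule&) = delete;

 private:
  const std::uint8_t* data_;
  std::size_t size_;
  ReleaseHook hook_;
  Trace* trace_;
};

// Loads and unloads once, returning what teardown observed.
Trace load_then_unload(std::vector<std::uint8_t> contents) {
  Trace trace;
  FileData* file = new FileData{std::move(contents), &trace};
  {
    HookedModule module(file->bytes.data(), file->bytes.size(),
                        ReleaseHook{&release_file_data, file}, &trace);
    // The loader keeps no handle of its own; the module decides when to free.
  }
  return trace;
}
```

With this shape, nothing outside the module can free the buffer early, because the only path to `delete` runs through the hook that the module invokes at the end of its own destructor.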

stellaraccident commented 1 year ago

Sorry, forgot to post my response from the Discord thread: What is keeping your bytecode_data alive? That looks like a temporary, and it needs to be kept valid for the life of the module. The use-after-free happens on context destroy, when it tries to find the module's destructor export, which requires access to the bytecode that was only on the stack at load time.

We don't make a copy of the bytecode. It can be quite large (and in production is often backed by an mmap), so the caller must arrange to keep it live.
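
One way for a caller to "keep it live" is to store the bytes next to the module in the same object, so that they cannot be freed first. The sketch below assumes a hypothetical `BorrowingModule` that stands in for a runtime module holding a pointer into caller-owned bytecode; `LoadedModule` and `load_and_unload` are likewise illustrative names and not part of IREE or Godot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a runtime module that borrows its bytecode:
// it stores a pointer and a size, never a copy.
class BorrowingModule {
 public:
  BorrowingModule(const std::uint8_t* data, std::size_t size,
                  int* teardown_sum)
      : data_(data), size_(size), teardown_sum_(teardown_sum) {
    assert(teardown_sum != nullptr);
  }

  // Teardown reads the borrowed bytes again, much like the destructor export
  // lookup in the trace above, so the buffer must still be alive here.
  ~BorrowingModule() {
    int sum = 0;
    for (std::size_t i = 0; i < size_; ++i) sum += data_[i];
    *teardown_sum_ = sum;
  }

  BorrowingModule(const BorrowingModule&) = delete;
  BorrowingModule& operator=(const BorrowingModule&) = delete;

 private:
  const std::uint8_t* data_;
  std::size_t size_;
  int* teardown_sum_;
};

// Owns both the bytes and the module. C++ destroys members in reverse
// declaration order, so module_ is torn down before bytes_ is freed.
class LoadedModule {
 public:
  LoadedModule(std::vector<std::uint8_t> bytes, int* teardown_sum)
      : bytes_(std::move(bytes)),
        module_(bytes_.data(), bytes_.size(), teardown_sum) {}

 private:
  std::vector<std::uint8_t> bytes_;  // declared first, destroyed last
  BorrowingModule module_;           // declared last, destroyed first
};

// Loads and unloads once; returns the sum the module computed in teardown.
int load_and_unload(std::vector<std::uint8_t> bytes) {
  int teardown_sum = -1;
  {
    LoadedModule loaded(std::move(bytes), &teardown_sum);
  }  // unload: the module goes first, then the bytes it borrowed
  return teardown_sum;
}
```

The declaration order of the two members is what carries the guarantee here; swapping them would reintroduce the use-after-free that ASan reported.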

https://discord.com/channels/689900678990135345/1097386769143562284/1097541566194798733

stellaraccident commented 1 year ago

I wonder if we could name the parameter to hint at this better, e.g. caller_owned_archive_contents? That would signal that it is special without having to study the docs.