nod-ai / TheRock

The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm
Apache License 2.0

WIP Use local manifest that builds against development head #9

Closed sogartar closed 3 months ago

sogartar commented 5 months ago

This build fails with

/home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/device/rocm/rocdevice.cpp:1844:50: error: ‘HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES’ was not declared in this scope; did you mean ‘HSA_AMD_AGENT_INFO_MEMORY_WIDTH’?
 1844 |                               (hsa_agent_info_t) HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES,
      |                                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                  HSA_AMD_AGENT_INFO_MEMORY_WIDTH
/home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/device/rocm/rocdevice.cpp:1850:43: error: ‘HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU’ was not declared in this scope
 1850 |   if (hsa_flag_isset64(memory_properties, HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU)) {
      |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/device/rocm/rocdevice.cpp:1850:7: error: ‘hsa_flag_isset64’ was not declared in this scope
 1850 |   if (hsa_flag_isset64(memory_properties, HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU)) {
      |       ^~~~~~~~~~~~~~~~

probably due to https://github.com/ROCm/ROCR-Runtime not being synced with the private source.
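For context, the lines the compiler rejects query a per-agent property mask and test a single flag in it. The sketch below shows that kind of flag test in isolation. `flag_isset64`, `kPropertyIsApu`, and the byte-array layout are assumptions made for illustration; the real `hsa_flag_isset64` and `HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU` are defined by ROCR-Runtime and may differ.

```cpp
#include <cstdint>

// Hypothetical flag index; the real value is defined by ROCR-Runtime.
constexpr std::uint32_t kPropertyIsApu = 0;

// Hypothetical stand-in: returns true if bit `bit` is set in a 64-bit
// property mask stored as 8 bytes, lowest-numbered bits in the first byte.
inline bool flag_isset64(const std::uint8_t* mask, std::uint32_t bit) {
  if (bit >= 64) return false;            // outside the 64-bit mask
  const std::uint32_t index = bit / 8;    // byte that holds the bit
  const std::uint32_t sub_bit = bit % 8;  // position inside that byte
  return (mask[index] & (1u << sub_bit)) != 0;
}
```

Simply deleting such a check changes which code path the device setup takes, so a stub that returns a fixed answer is only suitable for getting the build through.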

stellaraccident commented 5 months ago

Yeah, probably missing patches in a dep. I would just comment those things out to try.

But it would also be good to peek behind the firewall and see what stash of patches we are missing.

sogartar commented 5 months ago

The build passes when I remove the code that uses the missing declarations, but I then get this error

CMake Error: Error processing file: /home/nmeganat/boian/ws/TheRock/build/cmake_install.cmake

from

cmake --install build --component amdgpu-runtime

sogartar commented 5 months ago

Never mind, the install error was my mistake.

sogartar commented 5 months ago

The sanity test

dlopen-hip libamdhip64.so

fails with

HIP VERSION: 393e532
free(): double free detected in tcache 2
Aborted (core dumped)

stellaraccident commented 5 months ago

We should get a stack trace and report. This must be happening on dlclose and I would not be completely shocked to see instability on such a thing.
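To tell whether the abort happens inside `dlclose` or only later at process exit, a reproducer can log around each step. This is a minimal sketch, assuming a POSIX system with `dlfcn.h`; `load_and_unload` is a hypothetical helper and is not the source of the `dlopen-hip` tool used in this thread.

```cpp
#include <dlfcn.h>

#include <cstdio>

// Loads a shared library and unloads it again, logging each step to stderr.
// Returns true only if both dlopen and dlclose report success.
bool load_and_unload(const char* path) {
  void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    const char* err = dlerror();
    std::fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
    return false;
  }
  std::fprintf(stderr, "loaded %s\n", path);

  // dlclose drops this reference. The library may be unloaded here, or its
  // teardown may only run when the process exits.
  if (dlclose(handle) != 0) {
    const char* err = dlerror();
    std::fprintf(stderr, "dlclose failed: %s\n", err ? err : "unknown error");
    return false;
  }
  std::fprintf(stderr, "closed %s\n", path);
  return true;
}
```

If the "closed" line is printed before the abort, the failure is in exit-time teardown rather than in `dlclose` itself. On older glibc versions the program may need to be linked with `-ldl`.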

sogartar commented 5 months ago

It does a double free during library unloading. With address sanitizer:

LD_PRELOAD=/usr/lib/llvm-15/lib/clang/15.0.7/lib/linux/libclang_rt.asan-x86_64.so ASAN_OPTIONS=detect_leaks=0 ./build/Debug/dlopen-hip build/Debug/staging_install/runtime_dynamic/lib/libamdhip64.so
HIP VERSION: 393e532
=================================================================
==3709619==ERROR: AddressSanitizer: attempting double-free on 0x604000022410 in thread T0:
    #0 0x7f05bd8de982 in operator delete(void*, unsigned long) (/usr/lib/llvm-15/lib/clang/15.0.7/lib/linux/libclang_rt.asan-x86_64.so+0xde982) (BuildId: 03f6e5fb88d7ac33c0c90c7822c039793c28fca2)
    #1 0x7f05ba3922cc in __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>>::deallocate(std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>*, unsigned long) /usr/include/c++/11/ext/new_allocator.h:145:19
    #2 0x7f05ba391d18 in std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>>>::deallocate(std::allocator<std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>>&, std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>*, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:496:23
    #3 0x7f05ba391250 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::_M_put_node(std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>*) /usr/include/c++/11/bits/stl_tree.h:565:34
    #4 0x7f05ba3900a5 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::_M_drop_node(std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>*) /usr/include/c++/11/bits/stl_tree.h:632:13
    #5 0x7f05ba38f456 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>*) /usr/include/c++/11/bits/stl_tree.h:1891:16
    #6 0x7f05ba391581 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::clear() /usr/include/c++/11/bits/stl_tree.h:1254:10
    #7 0x7f05ba390700 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::_M_erase_aux(std::_Rb_tree_const_iterator<std::pair<unsigned long const, unsigned long>>, std::_Rb_tree_const_iterator<std::pair<unsigned long const, unsigned long>>) /usr/include/c++/11/bits/stl_tree.h:2498:7
    #8 0x7f05ba38f66c in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::erase(unsigned long const&) /usr/include/c++/11/bits/stl_tree.h:2512:19
    #9 0x7f05ba38eb24 in std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::erase(unsigned long const&) /usr/include/c++/11/bits/stl_map.h:1069:26
    #10 0x7f05ba38b419 in amd::SvmBuffer::Remove(unsigned long) /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/platform/memory.cpp:1511:19
    #11 0x7f05ba38b6cb in amd::SvmBuffer::free(amd::Context const&, void*) /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/platform/memory.cpp:1539:9
    #12 0x7f05ba3adec5 in roc::Device::~Device() /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/device/rocm/rocdevice.cpp:262:25
    #13 0x7f05ba3ae40f in roc::Device::~Device() /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/device/rocm/rocdevice.cpp:312:1
    #14 0x7f05ba328d43 in amd::Device::tearDown() /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/device/device.cpp:512:28
    #15 0x7f05ba3a19b5 in amd::Runtime::tearDown() /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/platform/runtime.cpp:94:19
    #16 0x7f05ba3a1ade in amd::hipTearDown() /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/platform/runtime.cpp:126:22
    #17 0x7f05be3dc24d in _dl_fini elf/dl-fini.c:142:9
    #18 0x7f05bd445494 in __run_exit_handlers stdlib/exit.c:113:8
    #19 0x7f05bd44560f in exit stdlib/exit.c:143:3
    #20 0x7f05bd429d96 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:74:3
    #21 0x7f05bd429e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #22 0x5629a2f7c1a4 in _start (/home/nmeganat/boian/ws/TheRock/build/Debug/dlopen-hip+0x11a4) (BuildId: eb1b2c1b24144806b4b78e5191024dfd867e58da)

0x604000022410 is located 0 bytes inside of 48-byte region [0x604000022410,0x604000022440)
freed by thread T0 here:
    #0 0x7f05bd8de982 in operator delete(void*, unsigned long) (/usr/lib/llvm-15/lib/clang/15.0.7/lib/linux/libclang_rt.asan-x86_64.so+0xde982) (BuildId: 03f6e5fb88d7ac33c0c90c7822c039793c28fca2)
    #1 0x7f05ba3922cc in __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>>::deallocate(std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>*, unsigned long) /usr/include/c++/11/ext/new_allocator.h:145:19
    #2 0x7f05ba391d18 in std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>>>::deallocate(std::allocator<std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>>&, std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>*, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:496:23
    #3 0x7f05ba391250 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::_M_put_node(std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>*) /usr/include/c++/11/bits/stl_tree.h:565:34
    #4 0x7f05ba3900a5 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::_M_drop_node(std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>*) /usr/include/c++/11/bits/stl_tree.h:632:13
    #5 0x7f05ba38f456 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>*) /usr/include/c++/11/bits/stl_tree.h:1891:16
    #6 0x7f05ba38ea6f in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::~_Rb_tree() /usr/include/c++/11/bits/stl_tree.h:984:17
    #7 0x7f05ba392809 in std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::~map() /usr/include/c++/11/bits/stl_map.h:302:7
    #8 0x7f05bd445494 in __run_exit_handlers stdlib/exit.c:113:8

previously allocated by thread T0 here:
    #0 0x7f05bd8ddd1d in operator new(unsigned long) (/usr/lib/llvm-15/lib/clang/15.0.7/lib/linux/libclang_rt.asan-x86_64.so+0xddd1d) (BuildId: 03f6e5fb88d7ac33c0c90c7822c039793c28fca2)
    #1 0x7f05ba392352 in __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>>::allocate(unsigned long, void const*) /usr/include/c++/11/ext/new_allocator.h:127:41
    #2 0x7f05ba391d47 in std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>>>::allocate(std::allocator<std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>>&, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:464:28
    #3 0x7f05ba39127c in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::_M_get_node() /usr/include/c++/11/bits/stl_tree.h:561:39
    #4 0x7f05ba3900df in std::_Rb_tree_node<std::pair<unsigned long const, unsigned long>>* std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::_M_create_node<std::pair<unsigned long, unsigned long>>(std::pair<unsigned long, unsigned long>&&) /usr/include/c++/11/bits/stl_tree.h:611:34
    #5 0x7f05ba38f4c9 in std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, unsigned long>>, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned long>, std::_Select1st<std::pair<unsigned long const, unsigned long>>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long>>>::_M_emplace_unique<std::pair<unsigned long, unsigned long>>(std::pair<unsigned long, unsigned long>&&) /usr/include/c++/11/bits/stl_tree.h:2384:33
    #6 0x7f05ba38eaf7 in _ZNSt3mapImmSt4lessImESaISt4pairIKmmEEE6insertIS2_ImmEEENSt9enable_ifIXsrSt16is_constructibleIS4_JT_EE5valueES2_ISt17_Rb_tree_iteratorIS4_EbEE4typeEOSB_ /usr/include/c++/11/bits/stl_map.h:817:33
    #7 0x7f05ba38b386 in amd::SvmBuffer::Add(unsigned long, unsigned long) /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/platform/memory.cpp:1506:20
    #8 0x7f05ba38b6a4 in amd::SvmBuffer::malloc(amd::Context&, unsigned long, unsigned long, unsigned long, amd::Device const*) /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/platform/memory.cpp:1534:6
    #9 0x7f05ba3b0174 in roc::Device::init() /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/device/rocm/rocdevice.cpp:605:66
    #10 0x7f05ba328c07 in amd::Device::init() /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/device/device.cpp:488:28
    #11 0x7f05ba3a169c in amd::Runtime::init() /home/nmeganat/boian/ws/TheRock/repo/sources/clr/rocclr/platform/runtime.cpp:75:56
    #12 0x7f05ba0519fd in hip::init(bool*) /home/nmeganat/boian/ws/TheRock/repo/sources/clr/hipamd/src/hip_context.cpp:45:26
    #13 0x7f05ba068481 in void std::__invoke_impl<void, void (&)(bool*), bool*>(std::__invoke_other, void (&)(bool*), bool*&&) /usr/include/c++/11/bits/invoke.h:61:36
    #14 0x7f05ba0673f9 in std::__invoke_result<void (&)(bool*), bool*>::type std::__invoke<void (&)(bool*), bool*>(void (&)(bool*), bool*&&) /usr/include/c++/11/bits/invoke.h:96:40
    #15 0x7f05ba06576b in void std::call_once<void (&)(bool*), bool*>(std::once_flag&, void (&)(bool*), bool*&&)::'lambda'()::operator()() const /usr/include/c++/11/mutex:776:17
    #16 0x7f05ba06742c in std::once_flag::_Prepare_execution::_Prepare_execution<void std::call_once<void (&)(bool*), bool*>(std::once_flag&, void (&)(bool*), bool*&&)::'lambda'()>(void (&)(bool*))::'lambda'()::operator()() const /usr/include/c++/11/mutex:712:64
    #17 0x7f05ba067441 in std::once_flag::_Prepare_execution::_Prepare_execution<void std::call_once<void (&)(bool*), bool*>(std::once_flag&, void (&)(bool*), bool*&&)::'lambda'()>(void (&)(bool*))::'lambda'()::_FUN() /usr/include/c++/11/mutex:712:16
    #18 0x7f05bd499ee7 in __pthread_once_slow nptl/pthread_once.c:116:7

SUMMARY: AddressSanitizer: double-free (/usr/lib/llvm-15/lib/clang/15.0.7/lib/linux/libclang_rt.asan-x86_64.so+0xde982) (BuildId: 03f6e5fb88d7ac33c0c90c7822c039793c28fca2) in operator delete(void*, unsigned long)
==3709619==ABORTING

stellaraccident commented 5 months ago

Can you file an issue, include our simple reproducer, and ask that they have a test for library unload on their side?
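The AddressSanitizer report shows the `std::map` being freed by its own destructor from the exit handlers, and then being touched again by `amd::SvmBuffer::Remove` during `amd::hipTearDown`. One common way to avoid that class of ordering problem is to keep such a table in a function-local static that is never destroyed. The sketch below illustrates the idiom only; `AllocationTable` and `Table()` are hypothetical names, and this is not a proposed patch to CLR.

```cpp
#include <cstddef>
#include <map>

// Hypothetical stand-in for an address -> size tracking table.
class AllocationTable {
 public:
  void Add(unsigned long addr, unsigned long size) { entries_[addr] = size; }
  // Returns true if an entry was erased.
  bool Remove(unsigned long addr) { return entries_.erase(addr) > 0; }
  std::size_t size() const { return entries_.size(); }

 private:
  std::map<unsigned long, unsigned long> entries_;
};

// Heap-allocated on first use and intentionally never deleted, so no
// destructor runs at exit and late teardown code can still call Remove().
AllocationTable& Table() {
  static AllocationTable* table = new AllocationTable();
  return *table;
}
```

The trade-off is that the table is never freed, which matters if the library is loaded and unloaded repeatedly within one process.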

sogartar commented 5 months ago

Regarding the missing symbols related to HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES: I am not sure we can just remove that piece. It is used during Device creation.

sogartar commented 5 months ago

@stellaraccident I kind of cargo culted the patches with one fix. They seem reasonable. Do you see a reason not to include them now?

sogartar commented 3 months ago

This is stale. Closing.