tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

ASan error from watcher #9420

Open yan-zaretskiy opened 3 weeks ago

yan-zaretskiy commented 3 weeks ago

ASan reports the heap-use-after-free error coming from watcher:

=================================================================
==385391==ERROR: AddressSanitizer: heap-use-after-free on address 0x5120000f01c0 at pc 0x7f2cc47c4751 bp 0x7ffebf076170 sp 0x7ffebf076168
WRITE of size 1 at 0x5120000f01c0 thread T0
    #0 0x7f2cc47c4750 in std::__1::char_traits<char>::assign[abi:ue170006](char&, char const&) /usr/lib/llvm-17/bin/../include/c++/v1/__string/char_traits.h:189:73
    #1 0x7f2cc47c4750 in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__assign_short(char const*, unsigned long) /usr/lib/llvm-17/bin/../include/c++/v1/string:2028:7
    #2 0x7f2cc47c4750 in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::assign(char const*) /usr/lib/llvm-17/bin/../include/c++/v1/string:2649:25
    #3 0x7f2cc47c4750 in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::operator=[abi:ue170006](char const*) /usr/lib/llvm-17/bin/../include/c++/v1/string:1122:60
    #4 0x7f2cc47c4750 in tt::watcher_attach(tt::tt_metal::Device*) /home/ubuntu/dev/tt-metal/build/../tt_metal/impl/debug/watcher_server.cpp:1085:44
    #5 0x7f2cc4565bb9 in tt::DevicePool::initialize_device(tt::tt_metal::Device*) const /home/ubuntu/dev/tt-metal/build/../tt_metal/impl/device/device_pool.cpp:115:5
    #6 0x7f2cc4568d93 in tt::DevicePool::init_firmware_on_active_devices() const /home/ubuntu/dev/tt-metal/build/../tt_metal/impl/device/device_pool.cpp:223:15
    #7 0x55559448f58e in tt::DevicePool::initialize(std::__1::vector<int, std::__1::allocator<int>>, unsigned char, unsigned long, std::__1::vector<unsigned int, std::__1::allocator<unsigned int>> const&, bool) /home/ubuntu/dev/tt-metal/build/../tt_metal/impl/device/device_pool.hpp
    #8 0x55559452ec84 in CommonFixture::SetUp() /home/ubuntu/dev/tt-metal/build/../tests/tt_metal/tt_metal/unit_tests_common/common/common_fixture.hpp:90:9
    #9 0x55559498997a in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe8797a) (BuildId: ab735720930ae913)
    #10 0x555594961ed9 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe5fed9) (BuildId: ab735720930ae913)
    #11 0x555594945c2b in testing::Test::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe43c2b) (BuildId: ab735720930ae913)
    #12 0x5555949466b1 in testing::TestInfo::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe446b1) (BuildId: ab735720930ae913)
    #13 0x555594946dbc in testing::TestSuite::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe44dbc) (BuildId: ab735720930ae913)
    #14 0x555594954ddc in testing::internal::UnitTestImpl::RunAllTests() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe52ddc) (BuildId: ab735720930ae913)
    #15 0x55559498fc2a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe8dc2a) (BuildId: ab735720930ae913)
    #16 0x555594964249 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe62249) (BuildId: ab735720930ae913)
    #17 0x5555949549a4 in testing::UnitTest::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe529a4) (BuildId: ab735720930ae913)
    #18 0x555594931b60 in RUN_ALL_TESTS() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe2fb60) (BuildId: ab735720930ae913)
    #19 0x555594931b3c in main (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe2fb3c) (BuildId: ab735720930ae913)
    #20 0x7f2cc30aa082 in __libc_start_main /build/glibc-LcI20x/glibc-2.31/csu/../csu/libc-start.c:308:16
    #21 0x5555943aa94d in _start (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0x8a894d) (BuildId: ab735720930ae913)

0x5120000f01c0 is located 0 bytes inside of 304-byte region [0x5120000f01c0,0x5120000f02f0)
freed by thread T0 here:
    #0 0x555594482d5d in operator delete(void*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0x980d5d) (BuildId: ab735720930ae913)
    #1 0x5555946b6379 in void std::__1::__libcpp_operator_delete[abi:ue170006]<void*>(void*) /usr/lib/llvm-17/bin/../include/c++/v1/new:278:3
    #2 0x5555946b6379 in void std::__1::__do_deallocate_handle_size[abi:ue170006]<>(void*, unsigned long) /usr/lib/llvm-17/bin/../include/c++/v1/new:302:10
    #3 0x5555946b6379 in std::__1::__libcpp_deallocate[abi:ue170006](void*, unsigned long, unsigned long) /usr/lib/llvm-17/bin/../include/c++/v1/new:318:14
    #4 0x5555946b6379 in std::__1::allocator<char>::deallocate[abi:ue170006](char*, unsigned long) /usr/lib/llvm-17/bin/../include/c++/v1/__memory/allocator.h:130:13
    #5 0x5555946b6379 in std::__1::allocator_traits<std::__1::allocator<char>>::deallocate[abi:ue170006](std::__1::allocator<char>&, char*, unsigned long) /usr/lib/llvm-17/bin/../include/c++/v1/__memory/allocator_traits.h:288:13
    #6 0x5555946b6379 in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::~basic_string() /usr/lib/llvm-17/bin/../include/c++/v1/string:1096:7
    #7 0x5555946b6379 in RunTest(WatcherFixture*, tt::tt_metal::Device*, debug_sanitize_which_riscv) /home/ubuntu/dev/tt-metal/build/../tests/tt_metal/tt_metal/unit_tests_common/watcher/test_assert.cpp:178:1
    #8 0x5555946bf711 in std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)>::operator()(WatcherFixture*, tt::tt_metal::Device*) const /usr/lib/llvm-17/bin/../include/c++/v1/__functional/function.h:1168:12
    #9 0x5555946bf711 in WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*)::'lambda'()::operator()() const /home/ubuntu/dev/tt-metal/build/../tests/tt_metal/tt_metal/unit_tests_common/common/watcher_fixture.hpp:80:13
    #10 0x5555946bf711 in decltype(std::declval<WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*)::'lambda'()&>()()) std::__1::__invoke[abi:ue170006]<WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*)::'lambda'()&>(WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*)::'lambda'()&) /usr/lib/llvm-17/bin/../include/c++/v1/__type_traits/invoke.h:340:25
    #11 0x5555946bf711 in void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ue170006]<WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*)::'lambda'()&>(WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*)::'lambda'()&) /usr/lib/llvm-17/bin/../include/c++/v1/__type_traits/invoke.h:415:5
    #12 0x5555946bf711 in std::__1::__function::__alloc_func<WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*)::'lambda'(), std::__1::allocator<WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*)::'lambda'()>, void ()>::operator()[abi:ue170006]() /usr/lib/llvm-17/bin/../include/c++/v1/__functional/function.h:192:16
    #13 0x5555946bf711 in std::__1::__function::__func<WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*)::'lambda'(), std::__1::allocator<WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*)::'lambda'()>, void ()>::operator()() /usr/lib/llvm-17/bin/../include/c++/v1/__functional/function.h:363:12
    #14 0x55559460459c in std::__1::function<void ()>::operator()() const /usr/lib/llvm-17/bin/../include/c++/v1/__functional/function.h:1168:12
    #15 0x55559460459c in CommonFixture::RunTestOnDevice(std::__1::function<void ()> const&, tt::tt_metal::Device*) /home/ubuntu/dev/tt-metal/build/../tests/tt_metal/tt_metal/unit_tests_common/common/common_fixture.hpp:127:9
    #16 0x5555946bc924 in WatcherFixture::RunTestOnDevice(std::__1::function<void (WatcherFixture*, tt::tt_metal::Device*)> const&, tt::tt_metal::Device*) /home/ubuntu/dev/tt-metal/build/../tests/tt_metal/tt_metal/unit_tests_common/common/watcher_fixture.hpp:82:24
    #17 0x5555946adff5 in WatcherFixture_TestWatcherAssertBrisc_Test::TestBody() /home/ubuntu/dev/tt-metal/build/../tests/tt_metal/tt_metal/unit_tests_common/watcher/test_assert.cpp:185:11
    #18 0x55559498997a in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe8797a) (BuildId: ab735720930ae913)
    #19 0x555594961ed9 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe5fed9) (BuildId: ab735720930ae913)
    #20 0x555594945c82 in testing::Test::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe43c82) (BuildId: ab735720930ae913)
    #21 0x5555949466b1 in testing::TestInfo::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe446b1) (BuildId: ab735720930ae913)
    #22 0x555594946dbc in testing::TestSuite::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe44dbc) (BuildId: ab735720930ae913)
    #23 0x555594954ddc in testing::internal::UnitTestImpl::RunAllTests() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe52ddc) (BuildId: ab735720930ae913)
    #24 0x55559498fc2a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe8dc2a) (BuildId: ab735720930ae913)
    #25 0x555594964249 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe62249) (BuildId: ab735720930ae913)
    #26 0x5555949549a4 in testing::UnitTest::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe529a4) (BuildId: ab735720930ae913)
    #27 0x555594931b60 in RUN_ALL_TESTS() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe2fb60) (BuildId: ab735720930ae913)
    #28 0x555594931b3c in main (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe2fb3c) (BuildId: ab735720930ae913)
    #29 0x7f2cc30aa082 in __libc_start_main /build/glibc-LcI20x/glibc-2.31/csu/../csu/libc-start.c:308:16

previously allocated by thread T255 here:
    #0 0x5555944824fd in operator new(unsigned long) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0x9804fd) (BuildId: ab735720930ae913)
    #1 0x7f2cc367d1bf in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::operator=(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) (/lib/x86_64-linux-gnu/libc++.so.1+0x561bf) (BuildId: 94ad47568b00dbdffade4d2fb445e9eef46bbe0f)
    #2 0x7f2cc47c749b in tt::watcher::dump(_IO_FILE*) /home/ubuntu/dev/tt-metal/build/../tt_metal/impl/debug/watcher_server.cpp:706:21
    #3 0x7f2cc47c531c in tt::watcher::watcher_loop(int) /home/ubuntu/dev/tt-metal/build/../tt_metal/impl/debug/watcher_server.cpp:816:17
    #4 0x7f2cc47f7044 in decltype(std::declval<void (*)(int)>()(std::declval<int>())) std::__1::__invoke[abi:ue170006]<void (*)(int), int>(void (*&&)(int), int&&) /usr/lib/llvm-17/bin/../include/c++/v1/__type_traits/invoke.h:340:25
    #5 0x7f2cc47f7044 in void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(int), int, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(int), int>&, std::__1::__tuple_indices<2ul>) /usr/lib/llvm-17/bin/../include/c++/v1/__thread/thread.h:221:5
    #6 0x7f2cc47f7044 in void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(int), int>>(void*) /usr/lib/llvm-17/bin/../include/c++/v1/__thread/thread.h:232:5
    #7 0x55559444372a in asan_thread_start(void*) crtstuff.c

Thread T255 created by T0 here:
    #0 0x55559442b5cd in pthread_create (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0x9295cd) (BuildId: ab735720930ae913)
    #1 0x7f2cc47db338 in std::__1::__libcpp_thread_create[abi:ue170006](unsigned long*, void* (*)(void*), void*) /usr/lib/llvm-17/bin/../include/c++/v1/__threading_support:371:10
    #2 0x7f2cc47db338 in std::__1::thread::thread<void (*)(int), int&, void>(void (*&&)(int), int&) /usr/lib/llvm-17/bin/../include/c++/v1/__thread/thread.h:248:16
    #3 0x7f2cc47c4470 in tt::watcher_attach(tt::tt_metal::Device*) /home/ubuntu/dev/tt-metal/build/../tt_metal/impl/debug/watcher_server.cpp:1090:38
    #4 0x7f2cc4565bb9 in tt::DevicePool::initialize_device(tt::tt_metal::Device*) const /home/ubuntu/dev/tt-metal/build/../tt_metal/impl/device/device_pool.cpp:115:5
    #5 0x7f2cc4568d93 in tt::DevicePool::init_firmware_on_active_devices() const /home/ubuntu/dev/tt-metal/build/../tt_metal/impl/device/device_pool.cpp:223:15
    #6 0x55559448f58e in tt::DevicePool::initialize(std::__1::vector<int, std::__1::allocator<int>>, unsigned char, unsigned long, std::__1::vector<unsigned int, std::__1::allocator<unsigned int>> const&, bool) /home/ubuntu/dev/tt-metal/build/../tt_metal/impl/device/device_pool.hpp
    #7 0x55559452ec84 in CommonFixture::SetUp() /home/ubuntu/dev/tt-metal/build/../tests/tt_metal/tt_metal/unit_tests_common/common/common_fixture.hpp:90:9
    #8 0x55559498997a in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe8797a) (BuildId: ab735720930ae913)
    #9 0x555594961ed9 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe5fed9) (BuildId: ab735720930ae913)
    #10 0x555594945c2b in testing::Test::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe43c2b) (BuildId: ab735720930ae913)
    #11 0x5555949466b1 in testing::TestInfo::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe446b1) (BuildId: ab735720930ae913)
    #12 0x555594946dbc in testing::TestSuite::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe44dbc) (BuildId: ab735720930ae913)
    #13 0x555594954ddc in testing::internal::UnitTestImpl::RunAllTests() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe52ddc) (BuildId: ab735720930ae913)
    #14 0x55559498fc2a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe8dc2a) (BuildId: ab735720930ae913)
    #15 0x555594964249 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe62249) (BuildId: ab735720930ae913)
    #16 0x5555949549a4 in testing::UnitTest::Run() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe529a4) (BuildId: ab735720930ae913)
    #17 0x555594931b60 in RUN_ALL_TESTS() (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe2fb60) (BuildId: ab735720930ae913)
    #18 0x555594931b3c in main (/home/ubuntu/dev/tt-metal/build/test/tt_metal/unit_tests_fast_dispatch+0xe2fb3c) (BuildId: ab735720930ae913)
    #19 0x7f2cc30aa082 in __libc_start_main /build/glibc-LcI20x/glibc-2.31/csu/../csu/libc-start.c:308:16

SUMMARY: AddressSanitizer: heap-use-after-free /usr/lib/llvm-17/bin/../include/c++/v1/__string/char_traits.h:189:73 in std::__1::char_traits<char>::assign[abi:ue170006](char&, char const&)
Shadow bytes around the buggy address:
  0x5120000eff00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x5120000eff80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x5120000f0000: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x5120000f0080: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x5120000f0100: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
=>0x5120000f0180: fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd fd fd
  0x5120000f0200: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x5120000f0280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
  0x5120000f0300: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x5120000f0380: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x5120000f0400: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==385391==ABORTING

yan-zaretskiy commented 3 weeks ago

For context, #9383 is currently blocked on the CI job segfaulting inside a unit test for watcher's handling of asserts (https://github.com/tenstorrent/tt-metal/actions/runs/9495089835/job/26187495213). That prompted me to run those tests with ASan.

pgkeller commented 3 weeks ago

David, can you take a look?

tt-dma commented 3 weeks ago

Will take a look once I finish #6430. Let me know if this is more important than that.

tt-dma commented 3 weeks ago

@yan-zaretskiy Is there a specific commit or patch that I can use to repro this issue?

yan-zaretskiy commented 3 weeks ago

You can pull my branch: yan-zaretskiy/libc++-build. You'd also need to add these lines to the root CMakeLists.txt; I put them right after setting the C++ standard:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address,undefined -fno-omit-frame-pointer")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fsanitize=address,undefined")

It should also reproduce on the main branch with sanitizers enabled; I just haven't tried it myself. I'm gonna give it a try and report back here.

yan-zaretskiy commented 3 weeks ago

Hmm, that's a bummer, I don't get that particular error with libstdc++ on main. I do see unaligned memory loads from the dprint server, though.

yan-zaretskiy commented 3 weeks ago

Oh, I think I know what's going on: we hit SIOF (the static initialization order fiasco) when trying to check the value of the static string watcher::watcher_exception_message.

Nope, that didn't fix it. Somehow we delete watcher::watcher_exception_message and then try to write to it afterwards.
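
For illustration only, here is a minimal sketch of one generic way to rule out both suspicions at once. The root cause was not confirmed in this thread, and this is not the actual tt-metal code. The ASan report shows the buffer being allocated on the watcher thread (T255) and written on the main thread (T0), so the sketch puts the message behind a mutex, and it uses a function-local static so that nothing depends on static initialization order. The names `sketch::ExceptionMessage`, `instance`, `set`, `get`, and `demo_roundtrip` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>

namespace sketch {

// Hypothetical holder for a message shared between a watcher thread and the
// main thread. Not the actual tt-metal implementation.
class ExceptionMessage {
public:
    // Function-local static: constructed on first use, so it does not depend
    // on the initialization order of statics in other translation units.
    static ExceptionMessage& instance() {
        static ExceptionMessage inst;
        return inst;
    }

    // Every write takes the lock, so two threads never assign concurrently.
    void set(const std::string& msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        message_ = msg;
    }

    // Returns a copy, so callers never hold a reference into the shared buffer.
    std::string get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return message_;
    }

private:
    ExceptionMessage() = default;
    mutable std::mutex mutex_;
    std::string message_;
};

// Stand-in for the sequence in the report: reset the message, let another
// thread store a long (heap-allocated) message, read it, then reset again.
inline std::size_t demo_roundtrip() {
    ExceptionMessage& msg = ExceptionMessage::instance();
    msg.set("");
    std::thread watcher([&msg] { msg.set(std::string(300, 'x')); });
    watcher.join();
    const std::string copy = msg.get();
    msg.set("");
    assert(msg.get().empty());
    return copy.size();
}

}  // namespace sketch
```

Because `get()` returns by value, a test that inspects the message owns its own copy, and destroying that copy cannot free the buffer that a later `set("")` writes to.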