wolfpld / tracy

Frame profiler
https://tracy.nereid.pl/
Other
8.61k stars 592 forks source link

Assertion `m_pendingSingleString.ptr != nullptr' failed #813

Closed moritztng closed 2 weeks ago

moritztng commented 2 weeks ago

I use tracy for profiling a Tenstorrent card (https://github.com/tenstorrent/tt-metal). The profiler correctly shows the infos of cpu profiling, then it takes some time reading the infos from the tenstorrent card and then it crashes with the following error message. It happens with the profiler and the capture tool.

Connecting to 192.168.0.69:8086...
Queue delay: 24 ns
Timer resolution: 5 ns
   2.19 Kbps /109.3% =   0.00 Mbps | Tx: 71.43 KB | 64.06 MB | 32.75 stracy-capture: /home/moritz/tracy/server/TracyWorker.cpp:4365: uint32_t tracy::Worker::GetSingleStringIdx(): Assertion `m_pendingSingleString.ptr != nullptr' failed.
Aborted (core dumped)
wolfpld commented 2 weeks ago

Can you provide a call stack?

moritztng commented 2 weeks ago

From the server or the client with the card?

wolfpld commented 2 weeks ago

From the server.

moritztng commented 2 weeks ago

Commands:

gdb ./tracy-profiler
(gdb) run
(gdb) backtrace //after the failed assertion

Output:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff78e2537 in __GI_abort () at abort.c:79
#2  0x00007ffff78e240f in __assert_fail_base (fmt=0x7ffff7a59688 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5555564be210 "m_pendingSingleString.ptr != nullptr", 
    file=0x55555646e6a8 "/home/moritz/tracy/server/TracyWorker.cpp", line=4365, function=<optimized out>) at assert.c:92
#3  0x00007ffff78f1662 in __GI___assert_fail (assertion=0x5555564be210 "m_pendingSingleString.ptr != nullptr", file=0x55555646e6a8 "/home/moritz/tracy/server/TracyWorker.cpp", line=4365, 
    function=0x5555564be1e0 "uint32_t tracy::Worker::GetSingleStringIdx()") at assert.c:101
#4  0x000055555606c51d in tracy::Worker::GetSingleStringIdx() ()
#5  0x0000555556017eee in tracy::Worker::Exec() ()
#6  0x0000555555fe8501 in tracy::Worker::Worker(char const*, unsigned short, long)::{lambda()#1}::operator()() const ()
#7  0x0000555556102422 in void std::__invoke_impl<void, tracy::Worker::Worker(char const*, unsigned short, long)::{lambda()#1}>(std::__invoke_other, tracy::Worker::Worker(char const*, unsigned short, long)::{lambda()#1}&&) ()
#8  0x00005555561020ee in std::__invoke_result<tracy::Worker::Worker(char const*, unsigned short, long)::{lambda()#1}>::type std::__invoke<tracy::Worker::Worker(char const*, unsigned short, long)::{lambda()#1}>(tracy::Worker::Worker(char const*, unsigned short, long)::{lambda()#1}&&) ()
#9  0x0000555556101d84 in void std::thread::_Invoker<std::tuple<tracy::Worker::Worker(char const*, unsigned short, long)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) ()
#10 0x0000555556101b74 in std::thread::_Invoker<std::tuple<tracy::Worker::Worker(char const*, unsigned short, long)::{lambda()#1}> >::operator()() ()
#11 0x0000555556100c56 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<tracy::Worker::Worker(char const*, unsigned short, long)::{lambda()#1}> > >::_M_run() ()
#12 0x00007ffff7ce4ed0 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007ffff7a9bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007ffff79bba6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Profiler until it fails and closes (click on it to make it sharp): Screenshot 2024-06-13 17 41 23

I ran it with my own tenstorrent kernel but it also happens with one the tenstorrent profiler programming examples (https://github.com/tenstorrent/tt-metal/tree/main/tt_metal/programming_examples/profiler/test_full_buffer). I'm at commit 80a330f of tt-metal. There is the macro DeviceZoneScopedN("TEST-FULL") inserted in the kernel that runs on the device. And on the host it runs tt_metal::detail::DumpDeviceProfileResults(device) (this is the host instruction where the assertion fails).

wolfpld commented 2 weeks ago

You will need to build the profiler with -DCMAKE_BUILD_TYPE=Debug to produce debug information that will show source file names and line numbers in the call stack.

moritztng commented 2 weeks ago

Okay, this is the new call stack:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff78e2537 in __GI_abort () at abort.c:79
#2  0x00007ffff78e240f in __assert_fail_base (fmt=0x7ffff7a59688 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x555556d71520 "m_pendingSingleString.ptr != nullptr", 
    file=0x555556d6a218 "/home/moritz/tracy/server/TracyWorker.cpp", line=4365, function=<optimized out>) at assert.c:92
#3  0x00007ffff78f1662 in __GI___assert_fail (assertion=0x555556d71520 "m_pendingSingleString.ptr != nullptr", file=0x555556d6a218 "/home/moritz/tracy/server/TracyWorker.cpp", line=4365, 
    function=0x555556d714f0 "uint32_t tracy::Worker::GetSingleStringIdx()") at assert.c:101
#4  0x000055555605b90f in tracy::Worker::GetSingleStringIdx (this=0x555558024330) at /home/moritz/tracy/server/TracyWorker.cpp:4365
#5  0x0000555556007b3c in tracy::Worker::ProcessMessageAppInfo (ev=..., this=0x555558024330) at /home/moritz/tracy/server/TracyWorker.cpp:5621
#6  tracy::Worker::Process (ev=..., this=0x555558024330) at /home/moritz/tracy/server/TracyWorker.cpp:4525
#7  tracy::Worker::DispatchProcess (ptr=<optimized out>, ev=..., this=0x555558024330) at /home/moritz/tracy/server/TracyWorker.cpp:3253
#8  tracy::Worker::Exec (this=0x555558024330) at /home/moritz/tracy/server/TracyWorker.cpp:2789
#9  0x0000555555fbe0c5 in operator() (__closure=0x55555801efd8) at /home/moritz/tracy/server/TracyWorker.cpp:303
#10 0x000055555608810c in std::__invoke_impl<void, tracy::Worker::Worker(char const*, uint16_t, int64_t)::<lambda()> >(std::__invoke_other, struct {...} &&) (__f=...) at /usr/include/c++/10/bits/invoke.h:60
#11 0x0000555556087e1e in std::__invoke<tracy::Worker::Worker(char const*, uint16_t, int64_t)::<lambda()> >(struct {...} &&) (__fn=...) at /usr/include/c++/10/bits/invoke.h:95
#12 0x0000555556087ae0 in std::thread::_Invoker<std::tuple<tracy::Worker::Worker(char const*, uint16_t, int64_t)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x55555801efd8) at /usr/include/c++/10/thread:264
#13 0x0000555556087928 in std::thread::_Invoker<std::tuple<tracy::Worker::Worker(char const*, uint16_t, int64_t)::<lambda()> > >::operator()(void) (this=0x55555801efd8) at /usr/include/c++/10/thread:271
#14 0x0000555556087074 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<tracy::Worker::Worker(char const*, uint16_t, int64_t)::<lambda()> > > >::_M_run(void) (this=0x55555801efd0) at /usr/include/c++/10/thread:215
#15 0x00007ffff7ce4ed0 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#16 0x00007ffff7a9bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007ffff79bba6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
wolfpld commented 2 weeks ago

You are using a custom client library that does not properly follow the wire protocol. You must follow what Profiler::Dequeue() does in your code. Note that the protocol is a deliberately undocumented implementation detail, and will change in the future.

moritztng commented 2 weeks ago

Thanks, I'll look into that. And btw, tracy is great and I can understand their decision.