Open dmirys opened 11 months ago
@dmirys Did you find a solution here? I am also running into a similar issue.
My application hangs at exit even with TRACY_NO_SYSTEM_TRACING.
No. I am still using the workaround. Here is the full set of flags that I use: PUBLIC TRACY_NO_CALLSTACK TRACY_NO_SYSTEM_TRACING TRACY_NO_CODE_TRANSFER
Sorry, you will have to debug it yourself if that doesn't help.
More info on this hang:
It is specifically on the join in the Thread destructor call in ~Profiler.
Here is the back trace of the hang:
#0 __pthread_clockjoin_ex (threadid=139643671152384, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>)
at pthread_join_common.c:145
#1 0x00007f015a709348 in tracy::Profiler::~Profiler() () from /home/mmemarian/models/tt-metal/build/lib/libtracy.so.0.10.0
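For readers unfamiliar with this kind of hang, here is a minimal sketch of the general pattern the backtrace points at: a destructor that signals a worker thread and then joins it. WorkerOwner, its members, and timeDestructorMs are hypothetical names used only for illustration; this is not Tracy's implementation. The join returns only because the worker loop re-checks the shutdown flag at short intervals. If the worker were blocked in a wait that never wakes up, the destructor would stay in the join, which matches frame #0 above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for an object that owns a background worker thread.
// This is NOT Tracy's code; it only illustrates the signal-then-join pattern.
class WorkerOwner {
public:
    WorkerOwner() : m_shutdown(false), m_iterations(0), m_thread([this] { run(); }) {}

    ~WorkerOwner() {
        m_shutdown.store(true, std::memory_order_release);  // ask the worker to stop
        assert(m_thread.joinable());
        m_thread.join();  // blocks until run() returns
    }

    int iterations() const { return m_iterations.load(); }

private:
    void run() {
        // The loop re-checks the flag every millisecond. A worker stuck in an
        // unbounded wait would never see the flag, and join() would never return.
        while (!m_shutdown.load(std::memory_order_acquire)) {
            m_iterations.fetch_add(1);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> m_shutdown;
    std::atomic<int> m_iterations;
    std::thread m_thread;  // declared last so the flags exist before the thread starts
};

// Creates an owner, lets the worker run briefly, destroys it, and returns how
// long the destructor (signal plus join) took, in milliseconds.
long long timeDestructorMs() {
    std::chrono::steady_clock::time_point start;
    {
        WorkerOwner owner;
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        start = std::chrono::steady_clock::now();
    }  // ~WorkerOwner runs here
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

Calling timeDestructorMs() should return a small value, since the worker notices the flag within about one sleep interval. When diagnosing a real hang, the useful question is what the worker thread is blocked on at the moment the join is pending.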
Server is also reporting:
Here is the cmake option list:
set_option(TRACY_ON_DEMAND "On-demand profiling" OFF)
set_option(TRACY_CALLSTACK "Enforce callstack collection for tracy regions" OFF)
set_option(TRACY_NO_CALLSTACK "Disable all callstack related functionality" ON)
set_option(TRACY_NO_CALLSTACK_INLINES "Disables the inline functions in callstacks" ON)
set_option(TRACY_ONLY_LOCALHOST "Only listen on the localhost interface" OFF)
set_option(TRACY_NO_BROADCAST "Disable client discovery by broadcast to local network" OFF)
set_option(TRACY_ONLY_IPV4 "Tracy will only accept connections on IPv4 addresses (disable IPv6)" OFF)
set_option(TRACY_NO_CODE_TRANSFER "Disable collection of source code" ON)
set_option(TRACY_NO_CONTEXT_SWITCH "Disable capture of context switches" ON)
set_option(TRACY_NO_EXIT "Client executable does not exit until all profile data is sent to server" OFF)
set_option(TRACY_NO_SAMPLING "Disable call stack sampling" ON)
set_option(TRACY_NO_VERIFY "Disable zone validation for C API" OFF)
set_option(TRACY_NO_VSYNC_CAPTURE "Disable capture of hardware Vsync events" ON)
set_option(TRACY_NO_FRAME_IMAGE "Disable the frame image support and its thread" ON)
set_option(TRACY_NO_SYSTEM_TRACING "Disable systrace sampling" ON)
set_option(TRACY_PATCHABLE_NOPSLEDS "Enable nopsleds for efficient patching by system-level tools (e.g. rr)" OFF)
set_option(TRACY_DELAYED_INIT "Enable delayed initialization of the library (init on first call)" OFF)
set_option(TRACY_MANUAL_LIFETIME "Enable the manual lifetime management of the profile" OFF)
set_option(TRACY_FIBERS "Enable fibers support" OFF)
set_option(TRACY_NO_CRASH_HANDLER "Disable crash handling" OFF)
set_option(TRACY_TIMER_FALLBACK "Use lower resolution timers" OFF)
set_option(TRACY_LIBUNWIND_BACKTRACE "Use libunwind backtracing where supported" OFF)
set_option(TRACY_SYMBOL_OFFLINE_RESOLVE "Instead of full runtime symbol resolution, only resolve the image path and offset to enable offline symbol resolution" OFF)
set_option(TRACY_LIBBACKTRACE_ELF_DYNLOAD_SUPPORT "Enable libbacktrace to support dynamically loaded elfs in symbol resolution resolution after the first symbol resolve operation" OFF)
It is specifically on the join in Thread destructor call in ~Profiler.
What are other threads doing when this join is pending? Specifically, the LaunchWorker thread?
It is stuck on poll
I was wondering if there has been any progress here. Is there any more data I can provide, or any tests I can run?
According to the man page,
If the value of timeout is 0, poll() shall return immediately.
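The man-page statement can be checked on a given machine with a small POSIX-only sketch like the one below. pollEmptyPipe and PollResult are hypothetical helpers written for this illustration and are not part of Tracy. The sketch polls the read end of an empty pipe, so no descriptor ever becomes ready, and measures how long the call takes.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <chrono>

// Result of one poll() call: its return value and how long it took.
struct PollResult {
    int ready;            // descriptors with events, 0 on timeout, -1 on error
    long long elapsedMs;  // wall-clock time spent inside poll()
};

// Polls the read end of a fresh, empty pipe with the given timeout (ms).
// Nothing is ever written to the pipe, so the descriptor never becomes readable.
PollResult pollEmptyPipe(int timeoutMs) {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);
    (void)rc;

    pollfd pfd{};
    pfd.fd = fds[0];
    pfd.events = POLLIN;

    auto start = std::chrono::steady_clock::now();
    int ready = poll(&pfd, 1, timeoutMs);
    auto end = std::chrono::steady_clock::now();

    close(fds[0]);
    close(fds[1]);

    return {ready,
            std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()};
}
```

With a timeout of 0, pollEmptyPipe should report no ready descriptors after a negligible delay, while a positive timeout such as 50 should wait roughly that long before reporting the same. A negative timeout, by contrast, blocks until an event arrives, so a thread seen sitting in poll for a long time is worth checking for the timeout value it was called with.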
The glibc code at frame #0 in your call stack is basically doing a syscall into the kernel, which shouldn't be affected by the program environment.
I don't know what's wrong here, it shouldn't be happening.
I'm trying to reduce the amount of data collected by Tracy. For that purpose I use the following set of flags:
My app hangs at exit in an infinite loop after sending the terminate command to the server. Meanwhile, the server processes the terminate instruction and detects that m_pendingCallstackFrames is non-zero. It looks like it is supposed to get more data from the client in that case. I think there is a logic error here: the client is waiting for data from the server, while the server is waiting for data from the client, even though the client has already said to terminate.
Using a debugger, I found that the client sends QueueType::CallstackSample commands to the server during the Tracy session. Disabling system tracing with TRACY_NO_SYSTEM_TRACING solves the issue. Do I lose anything useful with this option? What is the correct way to solve the problem? I'm ready to test some ideas.