Closed mcourteaux closed 2 months ago
Aha! The debugger indicates: tracy::s_profiler+728
in the first line of the call-stack. This means that we are looking at this instead:
This looks prone to initialization order issues. This seems to be taken without TRACY_DELAYED_INIT
.
Okay, the MWE is even simpler, and doesn't involve making the Lock static:
#define TRACY_ON_DEMAND 1
#include <tracy/Tracy.hpp>
#include <mutex>
int main() {
TracyLockable(std::mutex, mutex);
std::printf("This never prints.\n");
return 0;
}
The source location static initializer now causes the problem. I sort of conclude that this initialization-order issue renders TRACY_ON_DEMAND fully broken?
I'm suspecting I have not respected the compilation rules of Tracy. I precompiled and installed Tracy on my system to reduce compilation times of my project, but the TRACY_ON_DEMAND
flag also influences the static library for TracyClient.a
.
I'll try this and report back.
Yes, this fixes it. I switched back to CMake's add_subdirectory
after setting the options I want. My bad... :shrug:
Reflecting a bit about this, it'd be nice if Tracy can somehow check that the feature flags match across TUs or static libraries at link-time or run time. Perhaps we can figure out something elegant, that would save people in the future the equivalent of the day that I spent on this.
Perhaps something like this in tracy/Tracy.hpp
:
inline static bool validate_feature_flags = [](){
#ifdef TRACY_ON_DEMAND
assert(tracy::g_ON_DEMAND);
#else
assert(!tracy::g_ON_DEMAND);
#endif
// same for the other flags...
return true;
}();
Making use of the static inline
linkage. This way, every TU will check if the command-line-passed flag (-DTRACY_ON_DEMAND
) matches the one in the TracyClient.cpp.o
object (or static library), at program initialization.
Where, of course, in TracyClient.cpp
, we'd have:
#ifdef TRACY_ON_DEMAND
bool g_ON_DEMAND = true;
#else
bool g_ON_DEMAND = false;
#endif
The above will check it at runtime, but perhaps we can even enable this check on link-time, by simply ONLY declaring the feature flags that do exist.
namespace tracy {
extern bool g_ON_DEMAND;
static void validate_feature_flags() {
#ifdef TRACY_ON_DEMAND
assert(tracy::g_ON_DEMAND);
#endif
// same for the other flags...
};
}
The extern
keyword will make the file compile, but make the linker report an undefined symbol when the symbol cannot be found in the other TUs or (precompiled) static library. The static
will not make the symbol visible during link-time and not violate the ODR. For the sake of completeness, TracyClient.cpp
would then contain:
#ifdef TRACY_ON_DEMAND
bool g_ON_DEMAND = true;
#endif
The nice thing is that the runtime-overhead is zero in time, and a few bytes for storing those bool
globals. Note that we don't even need to call validate_feature_flags()
at runtime nor compile time. However, this has the problem that if TracyClient.cpp
was compiled WITH a particular flag, and the user-files are compiled WITHOUT it, this will not be detected... Fixing this could be done by creating opposite flags to mark that they are not present:
#ifdef TRACY_ON_DEMAND
bool g_ON_DEMAND = true;
else
bool g_not_ON_DEMAND = true;
#endif
with
namespace tracy {
extern bool g_ON_DEMAND;
static void validate_feature_flags() {
#ifdef TRACY_ON_DEMAND
assert(tracy::g_ON_DEMAND);
#else
assert(tracy::g_not_ON_DEMAND);
#endif
// same for the other flags...
};
}
(Tagging you explicitly, as I already closed this, so IDK if you will get any notification if I don't. @wolfpld)
Compiler optimizations will remove any dead code and the linker will never see these variables.
Here is a minimal reproducable:
with:
which hangs here:
So, the
m_deferredLock
is the one being deadlocked on. There is two places that it's locked:Profiler::DeferItem()
: https://github.com/wolfpld/tracy/blob/d1300c491ca50962b5ba4d5b7d5a56b34b9e99ad/public/client/TracyProfiler.hpp#L716-L722 Hardly anything can go wrong here.Profiler::Worker()
: https://github.com/wolfpld/tracy/blob/d1300c491ca50962b5ba4d5b7d5a56b34b9e99ad/public/client/TracyProfiler.cpp#L1879-L1915 This also looks fine.After some thinking and following the code paths of the
Profiler::Worker()
, I am unsure of where the problem hides. The second place where the lock is taken (inProfiler::Worker()
) does not even get reached as far as I can tell, because that thread is waiting onListenSocket::Accept()
.So my other option is that this is an initialization-order problem, and the
m_deferredLock
is actually not initialized yet. However, following that route, throughGetProfilerData()
I don't see a way that the mutex is not properly initialized:https://github.com/wolfpld/tracy/blob/d1300c491ca50962b5ba4d5b7d5a56b34b9e99ad/public/client/TracyProfiler.cpp#L1216-L1233
Line 1227 in particular should initialize it, before returning it to to call
DeferItem()
on, here: https://github.com/wolfpld/tracy/blob/d1300c491ca50962b5ba4d5b7d5a56b34b9e99ad/public/client/TracyLock.hpp#L17-L36I'm don't really know how to proceed debugging this. This is related to #632.