nagisa / rust_tracy_client

Tracy client libraries for Rust
Apache License 2.0
178 stars 37 forks source link

Set `TRACY_NO_VERIFY` to reduce instrumentation overhead #107

Closed jasontt closed 3 months ago

jasontt commented 4 months ago

Currently the TRACY_NO_VERIFY flag is being defaulted to false.

As per the Tracy Profiler User Manual

Since all C API instrumentation has to be done by hand, it is possible to miss some code paths where a zone should be started or ended. Tracy will perform additional validation of instrumentation correctness to prevent bad profiling runs. Read section 4.7 for more information. However, the validation comes with a performance cost, which you may not want to pay. Therefore, if you are entirely sure that the instrumentation is not broken in any way, you may use the TRACY_NO_VERIFY macro, which will disable the validation code.

We are handling the scope lifetimes in rust through the Span struct thus I don't think we need this extra protection/overhead and importantly if you check the source code in TracyProfiler.cpp some of this work is done after getting the time from the profiler. e.g.

#ifndef TRACY_NO_VERIFY
    {
        TracyQueuePrepareC( tracy::QueueType::ZoneValidation );
        tracy::MemWrite( &item->zoneValidation.id, id );
        TracyQueueCommitC( zoneValidationThread );
    }
#endif
    {
        TracyQueuePrepareC( tracy::QueueType::ZoneBegin );
        tracy::MemWrite( &item->zoneBegin.time, tracy::Profiler::GetTime() );
        tracy::MemWrite( &item->zoneBegin.srcloc, (uint64_t)srcloc );
        TracyQueueCommitC( zoneBeginThread );
    }
    return ctx;
}

With a one line change to the build.rs this halves the time taken per iteration in the span_callstack/0 benchmark on my machine and subtracts ~20ns from the timed portion of real spans I'm using in production code. "span_callstack/0" before:

span_callstack/0        time:   [61.739 ns 65.917 ns 70.116 ns]
                        change: [-16.371% -11.778% -7.1771%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

"span_callstack/0" after:

span_callstack/0        time:   [37.138 ns 37.721 ns 38.281 ns]
                        change: [-39.220% -36.368% -33.468%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
nagisa commented 3 months ago

A couple of contextual observations:

  1. It is safe to call mem::forget(span) in Rust, so we should check that nothing explodes if that's done;
  2. Although mem::forget(span) is screaming "something is weird here", it is still possible to do something with our current scoped Spans that breaks tracy's instrumentation well-formedness invariants:

    let span1 = span!("hello");
    let span2 = span!("world");
    drop(span1);
    drop(span2);

    We should check that nothing explodes in this case either.

Currently the TRACY_NO_VERIFY flag is being defaulted to false.

That means verification remains enabled by default, correct? How does one go about disabling verification (there's no feature flag to enable/disable?)

jasontt commented 3 months ago

Let me get back to you on your first point(s) - it seems initially hard to guarantee safety here, it raises a couple of questions to me:

  1. What can we consider public/private here? Is there something we can do with the span macro to sufficiently hide/guard against drop being called on our span?
  2. Would it be reasonable to hide the safety concern behind a feature much like tracy is already doing?
  3. What exactly happens if we do forget to close a scope? Garbage output is quite a different outcome from safety issues in our code although encapsulating all of this would be ideal

Your final question is easy to answer, yes verification remains enabled by default (more precisely because the flag is not set). To disable verification you set the TRACY_NO_VERIFY flag which is the change I made in build.rs.

nagisa commented 3 months ago

I don't think we can do much about guaranteeing well-formed use of tracy, and furthermore tracing-tracy currently relies on the currently imperfect API to bridge between the two ecosystems where tracing is more flexible in what it allows than tracy.

Your final question is easy to answer, yes verification remains enabled by default (more precisely because the flag is not set). To disable verification you set the TRACY_NO_VERIFY flag which is the change I made in build.rs.

My question was leading somewhat. Once this change lands we'd compile with -DTRACY_NO_VERIFY unconditionally. There isn't a mechanism for users to adjust this define if they needed to. IMO we should add something like we have for all other defines. Remember -- tracy-client-sys is not necessarily only for tracy-client to use, the raw C APIs are also as public interface for this project as are the wrappers from the higher-level crate.

jasontt commented 3 months ago

Okay so a few thoughts to add here and a minor change to the commit itself:

First off I've put -DTRACY_NO_VERIFY behind a flag, I've left it defaulted to false (running verification) for now, although I think there's a pretty good argument to be made for it to be defaulted true. My thinking here is that this is Tracy you use this profiler to be as low overhead as possible and having a bunch of verification overhead is surprising. That said, I don't like that this isn't strictly sound, but I don't think there's much we can do there without exposing our own C API or converting back to Tracy in some manner. Digging into their source a bit this problem doesn't actually seem to be constrained to Rust, we just handle it better by using their safer C API. It's perfectly possible for a C++ destructor to not get run and this exact same issue will arise, I'm actually mildly surprised they don't do something smart in their capture process here as there has clearly been some thought put into the catching of signals so this mode of failure has likely been considered...

Asynchronously of this change I have been investigating other modes of recording Spans safely in native Rust that can be consumed by Tracy. I do intend on trying to get this merged but I will likely be devoting more of my time to this effort as while this change has brought my measured instrumentation overhead down from 60ns -> 30ns on my machine, experimentally ~14.5ns seems to be achievable (two rtdsc calls + a mov/vectorised mov), the remaining overhead may be the lack of inlining of the C API or just Tracy itself but that's another matter.

jasontt commented 3 months ago

I've inverted no-verify to verify, defaulted it to off and adjusted the docs. In hindsight many of Tracy's NO features are already inverted so this does make more sense.

Ralith commented 3 months ago

Does setting TRACY_NO_VERIFY make tracy-client's API unsound?

nagisa commented 3 months ago

That's a good question. I was assuming that it was only "undefined behaviour" with regards to how the profiles will be interpreted by the profiler, not undefined behaviour proper seeing as its about failure to clean up the resources proper. If its the latter, then we indeed need to be quite a bit more wary of exposing it or enabling it by default.

Ralith commented 3 months ago

"can cause undefined behavior" very clearly communicates "unsound" to me. A resource leak and/or garbage trace data is a much lesser issue than UB.

nagisa commented 3 months ago

Tracy's manual confirms my understanding. And I indeed believe we don't need to have validation turned on at all times. The workflow I imagine here is going to be enabling verification once you do get a faulty profile to investigate where the fault is exactly.

Ralith commented 3 months ago

Tracy's manual confirms my understanding

Then the README changes merged in this PR are incorrect. The README now says "can cause undefined behavior", but your claim is that they can only cause inconsistent profiling data.

nagisa commented 3 months ago

I took care of adjusting the documentation before the release already.