ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Genericize tracing interface #5987

Open sqfink opened 4 years ago

sqfink commented 4 years ago

Ideally, Zig should internally support multiple pluggable tracing frameworks.

The current Zig tracing interface is exclusively targeted to https://github.com/wolfpld/tracy. There are a few other competing tracing frameworks, primarily https://github.com/opentracing, which provide similar functionality with slightly different semantics.

daurnimator commented 4 years ago

Isn't opentracing more focused around tracing remote procedure calls rather than in-process?

andrewrk commented 4 years ago

> current Zig tracing interface is exclusively targeted to https://github.com/wolfpld/tracy.

To be clear, the current self-hosted compiler tracing interface is. There is no tracing interface in the standard library (yet).

Anyway, that said, I agree, this is one of those things that would be nice for the standard library to standardize on. I think we could do it similarly to how std.log works. Idea being, you could leave the trace calls in your code, and they will end up as no-ops unless activated by overriding the default tracing function.
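For reference, the std.log mechanism mentioned above can be sketched as follows. This is only an illustration of the existing override pattern, assuming a Zig 0.14-era standard library where the root file declares `std_options` with a `logFn` field; the name `prefixedLog` and the `.demo` scope are made up for the example, and a tracing hook modeled on this would not necessarily have the same shape.

```zig
const std = @import("std");

// The root source file opts in by declaring `std_options`.
// Without this declaration, std.log falls back to its default log function.
pub const std_options: std.Options = .{ .logFn = prefixedLog };

// Hypothetical replacement log function (the name is an assumption).
// The signature matches what std.Options expects for `logFn`.
fn prefixedLog(
    comptime level: std.log.Level,
    comptime scope: @Type(.enum_literal),
    comptime format: []const u8,
    args: anytype,
) void {
    // Build the prefix at compile time from the scope and level.
    const prefix = "[" ++ @tagName(scope) ++ "] " ++ comptime level.asText() ++ ": ";
    std.debug.print(prefix ++ format ++ "\n", args);
}

// Library code keeps its log calls unconditionally; the root decides what they do.
const log = std.log.scoped(.demo);

pub fn main() void {
    log.info("a trace hook could be overridden the same way", .{});
}
```

The point of the pattern is that the call sites never change: only the root file decides whether the calls do anything.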

ghost commented 4 years ago

Should we also do this for emulation/virtualisation? Special-casing Wine and QEMU seems like a suboptimal solution.

nektro commented 4 years ago

OpenTracing is now https://opentelemetry.io/

Spec: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/overview.md
Go Client: https://pkg.go.dev/go.opentelemetry.io/otel/api@v0.12.0
Example: https://github.com/open-telemetry/opentelemetry-go/blob/master/example/basic/main.go

huntr3ss commented 1 year ago

Hi, I created trace.zig which may be helpful to implement tracing in the standard library: https://gitlab.com/zig_tracing/trace.zig

Perhaps some of its ideas can serve as inspiration, or at least help others avoid the mistakes I made.

leroycep commented 1 week ago

I've been writing a Zig module for the OpenTelemetry tracing specification. Some thoughts:

Questions:

  1. Should the standard library API aim for OpenTelemetry spec compliance?
  2. Should the standard library provide an SDK for that API? For example, should it provide an implementation that sends data to an OpenTelemetry collector and/or an implementation that sends data to Tracy?
  3. Should the standard library have APIs for the other "signals" that OpenTelemetry defines (metrics, baggage, the WIP profiling)?
  4. If we make OpenTelemetry-compatible APIs, should they live under std.otel?

For now I will continue work on making an API targeting spec compliance, and I plan on submitting a pull request at some point.


A comparison of the tracing API vs std.log:

A comparison of the OpenTelemetry tracing API vs zig/src/tracy.zig:

leroycep commented 3 days ago

I reviewed a couple of profiler/tracing APIs.

Here is a proposal for a tracing API for the standard library:

//! std.trace, spans
const trace = @This();

const std = @import("std");
const root = @import("root");
const SourceLocation = std.builtin.SourceLocation;

/// An opaque type that will be passed to `std.trace.end` to indicate that a zone has ended.
///
/// - The default no-op backend returns an empty struct type
/// - Tracy returns a struct containing a u32 id and a c_int active flag
/// - a (theoretical) `palanteer` backend would return src.fn_name
pub const SpanToken = if (@hasDecl(root, "SpanTokenType")) root.SpanTokenType else DefaultSpanToken;

pub const BeginFn = fn(comptime scope: @Type(.enum_literal), comptime src: SourceLocation, args: anytype) SpanToken;
pub const EndFn = fn(SpanToken) void;

pub const begin = if (@hasDecl(root, "trace_begin")) root.trace_begin else defaultBegin;
pub const end = if (@hasDecl(root, "trace_end")) root.trace_end else defaultEnd;

pub fn scoped(comptime scope: @Type(.enum_literal)) type {
    return struct {
        pub fn begin(comptime src: SourceLocation, args: anytype) SpanToken {
            return trace.begin(scope, src, args);
        }
    };
}

// Default no-op implementations

pub const DefaultSpanToken = struct {};

pub fn defaultBegin(comptime scope: @Type(.enum_literal), comptime src: SourceLocation, args: anytype) SpanToken {
    _ = scope;
    _ = src;
    _ = args;
    return .{};
}

pub fn defaultEnd(ctx: SpanToken) void {
    _ = ctx;
}

Note that root.SpanTokenType, root.trace_begin, root.trace_end can't all be merged into std.Options or else we get a cyclic reference error.

General usage would look like this:

const std = @import("std");
const tracer = std.trace.scoped(.a_scope_that_is_hopefully_very_unique);

pub fn main() !void {
    const span = tracer.begin(@src(), .{});
    defer std.trace.end(span);

    childFunction();
}

pub fn childFunction() void {
    const span = tracer.begin(@src(), .{
        .tracy_color = 0x0000FF, // make it a ghastly blue, tracy
    });
    defer std.trace.end(span);
}

This would cover most of the use cases out there, especially in libraries that may not want to add a dependency on a specific profiler.
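To make the application side concrete, here is a minimal single-file sketch of a backend that the root source file could provide. It rests on several assumptions: `std.trace` does not exist, so the local `trace` struct is a stand-in that mirrors the lookup logic of the proposal; the printing backend and the `.example` scope are invented for illustration; and the code targets the Zig 0.14-era syntax used above. It is meant to be run with `zig run`, since under `zig test` the root module is the test runner rather than this file.

```zig
const std = @import("std");
const root = @import("root");
const SourceLocation = std.builtin.SourceLocation;

// --- Backend declarations in the root source file (illustrative only) ---

// Token handed back to `trace_end`; here it just remembers the function name.
pub const SpanTokenType = struct { fn_name: []const u8 };

pub fn trace_begin(
    comptime scope: @Type(.enum_literal),
    comptime src: SourceLocation,
    args: anytype,
) SpanTokenType {
    _ = args; // a real backend could inspect fields such as .tracy_color
    std.debug.print("begin {s}:{s}\n", .{ @tagName(scope), src.fn_name });
    return .{ .fn_name = src.fn_name };
}

pub fn trace_end(token: SpanTokenType) void {
    std.debug.print("end {s}\n", .{token.fn_name});
}

// --- Local stand-in for the proposed std.trace (an assumption, not a real API) ---

const trace = struct {
    pub const SpanToken = if (@hasDecl(root, "SpanTokenType")) root.SpanTokenType else struct {};
    pub const begin = if (@hasDecl(root, "trace_begin")) root.trace_begin else noopBegin;
    pub const end = if (@hasDecl(root, "trace_end")) root.trace_end else noopEnd;

    // No-op fallbacks, used only when the root file declares no backend.
    fn noopBegin(comptime scope: @Type(.enum_literal), comptime src: SourceLocation, args: anytype) SpanToken {
        _ = scope;
        _ = src;
        _ = args;
        return .{};
    }

    fn noopEnd(token: SpanToken) void {
        _ = token;
    }
};

pub fn main() void {
    const span = trace.begin(.example, @src(), .{});
    defer trace.end(span);
}
```

Deleting the three `pub` backend declarations should leave the call sites in `main` compiling unchanged, with the no-op fallbacks selected instead.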

What this doesn't cover:

Some of the stuff mentioned above would be useful to have, but perhaps as a separate proposal.

What I'm thinking of doing after:

I will probably start working on a pull request, but feedback is welcome.

leroycep commented 1 day ago

Spent the weekend learning about SystemTap static probes. I think they're pretty neat! I'd heard of DTrace before, but digging into the details has been kind of mind-blowing to me. You add a nop instruction to your code, and then add an ELF note that looks like this:

  stapsdt              0x0000002e   NT_STAPSDT (SystemTap probe descriptors)
    Provider: main
    Name: loop
    Location: 0x00000000010019e9, Base: 0x0000000001000238, Semaphore: 0x0000000000000000
    Arguments: 8@112(%rsp)

And boom! Now you can break on a specific probe, even in a stripped ELF binary:

(No debugging symbols found in ./zig-out/bin/infinite-loop-example)
(gdb) info probe
Type Provider Name  Where              Semaphore          Object                                                                
stap do_thing BEGIN 0x0000000001001a55 0x0000000001004042 /home/geemili/src/1_projects/probes/zig-out/bin/infinite-loop-example 
stap do_thing END   0x0000000001001bdb                    /home/geemili/src/1_projects/probes/zig-out/bin/infinite-loop-example 
stap main     BEGIN 0x00000000010019b8 0x0000000001004040 /home/geemili/src/1_projects/probes/zig-out/bin/infinite-loop-example 
stap main     loop  0x00000000010019e9                    /home/geemili/src/1_projects/probes/zig-out/bin/infinite-loop-example 
(gdb) break -probe main:BEGIN
Breakpoint 1 at 0x10019b8
(gdb) run
Starting program: /home/geemili/src/1_projects/probes/zig-out/bin/infinite-loop-example 
Downloading separate debug info for system-supplied DSO at 0x7ffff7ffd000

Breakpoint 1, 0x00000000010019b8 in ?? ()
(gdb) 

It also allows passing in up to 12 integer/pointer/float arguments, which I use here to pass a unique id and the fields of std.builtin.SourceLocation:

(gdb) printf "%x %s %s %d %d\n",$_probe_arg0,$_probe_arg1,$_probe_arg2,$_probe_arg3,$_probe_arg4
2a0ead infinite-loop.zig main 2 37

And it's not just gdb that supports them. perf, bcc/bpftrace, and LTTng all support USDT probes.

One downside is that it's Linux-specific. As I understand it, macOS and the BSDs have DTrace, which inspired SystemTap. Windows has "Event Tracing for Windows".

I ported the C preprocessor macros and assembly (and assembly macros) from SystemTap into Zig. I'm wondering if the overhead of USDT probes is small enough to keep them in the Zig compiler by default, and if the Tracy build option could be replaced with a script that places Tracy instrumentation on the appropriate probes. Most of the tools (except for LTTng) do tracing from the kernel with eBPF, which may add some overhead compared to keeping it all in userland.

Anyway, thanks for reading this infodump about probe points.