tokio-rs / tracing

Application level tracing for Rust.
https://tracing.rs
MIT License
5.34k stars 699 forks source link

RFC: C client API for `tracing-core` #948

Open mystor opened 4 years ago

mystor commented 4 years ago

Feature Request

Crates

Motivation

tracing provides a useful and efficient API for performing structured logging across your program, unfortunately it can currently only be used by Rust clients in a codebase, making it unsuitable as the logging interface for large multi-language codebases.

Many languages have support for C FFI, making it a common baseline for defining core implementation libraries. By exposing a low-level C API to the client APIs from tracing-core, it would be possible to use a common tracing subscriber for logging from all languages in a single project.

This RFC does not propose exposing the ability to define subscribers or full power of tracing-core's Dispatch and Value APIs. In many cases this would be inefficient due to back-and-forth FFI calls, and could make it too difficult to evolve the crate backwards-compatibly in the future. Instead, it proposes exposing a minimal set of APIs to allow efficiently creating and dispatching spans and events from non-Rust code.

Unfortunately, making an API C-compatible places a number of constraints on the implementation, especially when trying to make it efficient. Due to the upcoming breaking changes for tracing-core 0.2 (#922), which is likely to include other changes to the Metadata and Callsite APIs, now seems like a good time to consider what changes would be necessary.

Constraints

Proposal

Introduce a new "c-ffi" feature to the tracing-core crate, and include a new tracing-core.h header file in the crate's distribution. When the "c-ffi" feature is enabled, tracing-core would expose a new c_ffi module in the root, containing a series of #[no_mangle] extern "C" methods corresponding to those defined in the tracing-core.h header.

No high-level APIs are exposed. High level interfaces in other languages, such as C++, would be built using this API, and maintained/versioned separately from other tracing components.

Potential tracing-core.h Header

This RFC proposes an API which is relatively simple. It does not attempt to perfectly mimic the existing Rust API, in order to allow it to evolve separately.

This header does not take into account potential changes to the interface coming in 0.2, and may be missing important features, but should outline the general idea.

gist: https://gist.github.com/mystor/3c979b1a903986b2d30c2f74bfe4d6e8

Design Decisions

Impact on tracing-core

Adding this new API will have an impact on the tracing-core APIs which will likely require breaking changes. The following are some of the potential impacts:

Open Questions

EDIT: Noticed an oversight in the initial proposal. As tracing_span_current returns a const tracing_metadata *, the Rust Metadata<'_> struct would be required to contain that type as a prefix. I've noted this in the "Impacts" section, and added an alternative suggestion using an opaque typedef and additional functions.

hawkw commented 4 years ago

cc @str4d, I imagine that this is relevant to your interests!

hawkw commented 4 years ago

Some initial thoughts:

Introduce a new "c-ffi" feature to the tracing-core crate, and include a new tracing-core.h header file in the crate's distribution. When the "c-ffi" feature is enabled, tracing-core would expose a new c_ffi module in the root, containing a series of #[no_mangle] extern "C" methods corresponding to those defined in the tracing-core.h header.

What's the motivation behind defining the extern "C" API surface in tracing-core behind a feature flag, versus simply defining another crate that wraps the tracing-core API and defines C bindings?

It seems to me that defining the C bindings in a separate crate could have advantages for versioning: we can make internal changes that don't break Rust code, but do break the C bindings, in minor/patch versions of tracing-core, if the C FFI crate uses a pinned tracing-core dependency, and then issue breaking releases of just the C FFI crate.

This RFC does not propose exposing the ability to define subscribers or full power of tracing-core's Dispatch and Value APIs. In many cases this would be inefficient due to back-and-forth FFI calls, and could make it too difficult to evolve the crate backwards-compatibly in the future. Instead, it proposes exposing a minimal set of APIs to allow efficiently creating and dispatching spans and events from non-Rust code.

IMO, this is definitely correct. Subscriber implementations should definitely be defined in Rust. The surface area of the API massively increases when including everything necessary to implement a subscriber.

Exposing enums like Level, Kind, and Interest to C code effectively stabilizes the variant's values in the ABI.

  • These values have already been changed once in a minor version for optimization reasons (#853), and this could prevent making that kind of change in the future.

  • This may be avoidable by performing runtime conversions between C and Rust enum representations.

I'm uncomfortable with exposing Interest to C as an enum. Interests are usizes, and we only use the first 2 bits of them currently. This is by design; the intention was to reserve the remaining bits for other uses, like using the rest as a bitset so a single Interest can store separate values from multiple subscribers, or to implement efficient sampling by storing a number representing a percentage of times a callsite should be sampled.

As an alternative, what do you think about having a struct implementing the Callsite trait that's exposed to C as an opaque pointer? We could define Rust functions for accessing the callsite's internal Interest value atomically --- this would also avoid having to expose atomics in the C FFI surface but allow us to update callsites atomically.

Should the MAX_LEVEL static introduced in #853 be exposed to the C API?

  • This could help avoid calling tracing_callsite_register in some cases, however would be both difficult to implement (due to atomics being difficult to expose to C), and may not be worthwhile if hidden behind a function call.

I think even if it's behind a function call, it might still be worth exposing. Since Interests must be updated atomically on the Rust side, the callsite cache is also going to have to be behind a function call, so I don't think that C will ever be able to benefit from skipping a callsite without at least one function call.

  • Should tracing-core support being dynamically linked/loaded?

What changes would this entail? It seems like it could be a good way of ensuring that C and Rust see one consistent version of tracing-core's statics. It might also be important because FFI libraries are likely to themselves be dylibs...

hawkw commented 4 years ago
  • Structs which have the potential to be extended are prefixed with a uint32_t version; field, and have a corresponding TRACING_???_VERSION define.

    • Whenever defining a struct, the client is required to set the version field to the define's value from the header version which the commit was pulled from.
    • When the struct is passed to tracing-core, it can read the version prefix to determine the layout of the struct. This allows new fields to be added without breaking existing clients.

This gives me an idea for a similar defensive programming technique that I imagine we could use to keep C from corrupting Rust's memory. There are several places in tracing-core that rely on something being behind a 'static reference, but there is no way for the C FFI layer to ensure that pointers are are never going to be freed. This isn't great! We store a bunch of those static references internally, and if we're given a stack pointer or a pointer to an allocation that's accidentally freed, we now have a use-after-free.

Something we could consider doing is requiring all the structs that are static on the Rust side, like Metadata, to be prefixed with a magic number of some kind. When we use these static references later, we could check if the magic number is there, and skip it if it's not, rather than reading the rest of the data from behind

hawkw commented 4 years ago

(whoops, didn't mean to close the issue, clicked the wrong button!)

hawkw commented 4 years ago

Also, it would be really great if folks who are interested in working on/using the C FFI API would be willing to make some commitment to maintaining it...I'm happy to help with changes on the Rust side, but I'm not too familiar with maintaining a FFI interface, so I'm not really comfortable with being responsible for it.

mystor commented 4 years ago

What's the motivation behind defining the extern "C" API surface in tracing-core behind a feature flag, versus simply defining another crate that wraps the tracing-core API and defines C bindings?

I considered putting this API outside of tracing-core, but I figured it would make the most sense to put it in tracing-core as exposing an FFI like this requires pretty deep control over the layout of the Metadata and Callsite data structures in order to allow C to statically allocate the data. It wouldn't be possible to implement this API without tracing-core's help.

It seems to me that defining the C bindings in a separate crate could have advantages for versioning: we can make internal changes that don't break Rust code, but do break the C bindings, in minor/patch versions of tracing-core, if the C FFI crate uses a pinned tracing-core dependency, and then issue breaking releases of just the C FFI crate.

I think I like this solution to the semver issue, as it allows tracing to evolve freely without worrying about breaking C FFI. Some changes will still need to be made to tracing-core, but we could keep the C interface in a tracing-ffi crate by exporting the necessary internal APIs from tracing-core behind a feature flag and #[doc(hidden)].

IMO, this is definitely correct. Subscriber implementations should definitely be defined in Rust. The surface area of the API massively increases when including everything necessary to implement a subscriber.

Agreed.

I'm uncomfortable with exposing Interest to C as an enum. Interests are usizes, and we only use the first 2 bits of them currently. This is by design; the intention was to reserve the remaining bits for other uses, like using the rest as a bitset so a single Interest can store separate values from multiple subscribers, or to implement efficient sampling by storing a number representing a percentage of times a callsite should be sampled.

I think this could be dealt with by having the interest type be passed through FFI as an opaque uintptr_t value, and providing inline methods in the C header to decode parts of it to get useful values. This would fix which bits of the usize are being used for Interest, but would allow the other bits to be used in the future for other pieces of information.

inline bool tracing_interest_is_never(uintptr_t interest) {
  return (interest & 0x3) == 0;
}
inline bool tracing_interest_is_always(uintptr_t interest) {
  return (interest & 0x3) == 2;
}

As an alternative, what do you think about having a struct implementing the Callsite trait that's exposed to C as an opaque pointer? We could define Rust functions for accessing the callsite's internal Interest value atomically --- this would also avoid having to expose atomics in the C FFI surface but allow us to update callsites atomically.

We can't have callsites be opaque types unknown to C code, as that would prevent C code from allocating it in static memory. Needing to allocate every callsite on the heap would be quite unfortunate.

We also don't want them to be opaque so that the Interest atomic can be read as efficiently as possible. In C++ the atomic would probably be stored in a std::atomic<uintptr_t>, for example, which would allow for super fast checks similar to the Rust API.

I think a potential alternative approach would be to have the API be like:

typedef struct {
  const tracing_callsite_vtable *vtable;
} tracing_callsite_header;

This makes the pointers passed from C code to Rust be a thin tracing_callsite_header * pointer, allowing it to be wrapped into a Rust dyn Callsite, with the Rust trait implementation forwarding actions to the C vtable header.

Should the MAX_LEVEL static introduced in #853 be exposed to the C API?

I think even if it's behind a function call, it might still be worth exposing. Since Interests must be updated atomically on the Rust side, the callsite cache is also going to have to be behind a function call, so I don't think that C will ever be able to benefit from skipping a callsite without at least one function call.

With the callback API I propose above, the interest would be maintained in something like a std::atomic<uintptr_t> (for C++) or an _Atomic uintptr_t (for pure C11). This would allow FFI clients to benefit from skipping callsites without a function call.

I'm vaguely inclined to put off making a decision here until benchmarks or profiles start showing it to be useful like they did for Rust.

  • Should tracing-core support being dynamically linked/loaded?

What changes would this entail? It seems like it could be a good way of ensuring that C and Rust see one consistent version of tracing-core's statics. It might also be important because FFI libraries are likely to themselves be dylibs...

I don't imagine it would have much of an impact, but I am not super familiar with all of the implications of putting a Rust crate into a cdylib. I imagine it will "just work" with some annotations in the C header.

This gives me an idea for a similar defensive programming technique that I imagine we could use to keep C from corrupting Rust's memory. There are several places in tracing-core that rely on something being behind a 'static reference, but there is no way for the C FFI layer to ensure that pointers are are never going to be freed. This isn't great! We store a bunch of those static references internally, and if we're given a stack pointer or a pointer to an allocation that's accidentally freed, we now have a use-after-free.

Something we could consider doing is requiring all the structs that are static on the Rust side, like Metadata, to be prefixed with a magic number of some kind. When we use these static references later, we could check if the magic number is there, and skip it if it's not, rather than reading the rest of the data from behind

To a certain extent, we need to trust the C code to obey the lifetime rules set out by the API. The current header doesn't make these invariants very clear in their documentation, but a final API would need to have strict requirements. A C++ binding would try its best to use the type system to ensure things are actually allocated statically, but the C API simply doesn't have the ability to express this. I'm not sure if it's worth the effort or performance overhead to use sentinel magic numbers to guess if the value has been freed.

Also, it would be really great if folks who are interested in working on/using the C FFI API would be willing to make some commitment to maintaining it...I'm happy to help with changes on the Rust side, but I'm not too familiar with maintaining a FFI interface, so I'm not really comfortable with being responsible for it.

I'm hoping we can design the API to be minimal enough that it won't require much maintenance, but I can help with maintaining it.

xd009642 commented 3 years ago

Just commenting with my use case to add to the discussion :eyes:. I'd like to use tracing from C++ similar to zcash and ideally make use of some of the existing subscribers I use in rust, so for at least me I'm happy with keeping all the subscriber setup and implementation in rust call off to some ffi init function and then just create spans and traces in C++.

nagisa commented 3 years ago

I was requested to leave some comments here about my experiences integrating a conventional C library (gstreamer) logs into tracing. The very first iteration can be seen here.

The primary pain points to me was the prevalence of the &'static Metadata<'static> requirement. I understand that this is something that tracing itself can easily provide, but it would be nice if there was some point at which the required lifetime became less strict. Right now I'm forced to allocate and maintain a map of leaked metadata structures.

That said, I do believe that the native rust code needs are way more important and if reducing the lifetime will mean something actually gets slower, then I'd rather just do the leaking thing ^^