RFC: C client API for `tracing-core`

Feature Request

Crates

tracing-core

Motivation

tracing provides a useful and efficient API for performing structured logging across your program, unfortunately it can currently only be used by Rust clients in a codebase, making it unsuitable as the logging interface for large multi-language codebases.

Many languages have support for C FFI, making it a common baseline for defining core implementation libraries. By exposing a low-level C API to the client APIs from tracing-core, it would be possible to use a common tracing subscriber for logging from all languages in a single project.

This RFC does not propose exposing the ability to define subscribers or full power of tracing-core's Dispatch and Value APIs. In many cases this would be inefficient due to back-and-forth FFI calls, and could make it too difficult to evolve the crate backwards-compatibly in the future. Instead, it proposes exposing a minimal set of APIs to allow efficiently creating and dispatching spans and events from non-Rust code.

Unfortunately, making an API C-compatible places a number of constraints on the implementation, especially when trying to make it efficient. Due to the upcoming breaking changes for tracing-core 0.2 (#922), which is likely to include other changes to the Metadata and Callsite APIs, now seems like a good time to consider what changes would be necessary.

Constraints

Introduce no additional overhead for Rust code using the tracing APIs when the "c-ffi" feature is disabled, and minimal overhead if the feature is enabled.
In order to support a C++ API around the C API, the API must only use the common C/C++ language subset. Unfortunately, this prevents the use of C11's _Atomic and makes it difficult to include atomic values in the API.
The C API should avoid introducing new limits on the ability to evolve tracing-core. Some additional constraints may be inevitable, but should be avoided.

Proposal

Introduce a new "c-ffi" feature to the tracing-core crate, and include a new tracing-core.h header file in the crate's distribution. When the "c-ffi" feature is enabled, tracing-core would expose a new c_ffi module in the root, containing a series of #[no_mangle] extern "C" methods corresponding to those defined in the tracing-core.h header.

No high-level APIs are exposed. High level interfaces in other languages, such as C++, would be built using this API, and maintained/versioned separately from other tracing components.

Potential `tracing-core.h` Header

This RFC proposes an API which is relatively simple. It does not attempt to perfectly mimic the existing Rust API, in order to allow it to evolve separately.

This header does not take into account potential changes to the interface coming in 0.2, and may be missing important features, but should outline the general idea.

gist: https://gist.github.com/mystor/3c979b1a903986b2d30c2f74bfe4d6e8

Design Decisions

Structs which have the potential to be extended are prefixed with a uint32_t version; field, and have a corresponding TRACING_???_VERSION define.
- Whenever defining a struct, the client is required to set the version field to the define's value from the header version which the commit was pulled from.
- When the struct is passed to tracing-core, it can read the version prefix to determine the layout of the struct. This allows new fields to be added without breaking existing clients.
The Dispatch type is not exposed to C code, and each method implicitly acts on the current dispatcher.
- If a desire for non-default dispatches in non-Rust languages appears in the future, these interfaces can be added backwards-compatibly.
- This avoids the need for multiple non-inlineable FFI calls with a function pointer callback interface, and reduces the exposed API surface.
Methods which would normally take a callsite instead take const tracing_metadata *, as thin pointers are easier to work with in C.
Values are described by a single tracing_str, rather than supporting the full Value trait API, in order to keep cross-language calls down and reduce the impact on tracing-core's Rust API.

Impact on `tracing-core`

Adding this new API will have an impact on the tracing-core APIs which will likely require breaking changes. The following are some of the potential impacts:

Existing &Metadata<'_> references will need to be changed to support the Metadata struct having a variable size. This could be done in a few different ways:
1. Split into a zero-sized type Metadata<'_>, which has getters defined on it for reading from the valid C Metadata struct prefix, and a separate MetadataImpl<'_> type which is created by callsites, may contain additional fields, and is deref-able to &Metadata<'_>.
2. Pass around a special MetadataRef<'_> type, instead of &Metadata<'_> which provides getters, and can hold references to the valid metadata prefix type.
3. Use the C tracing_metadata type directly as the rust Metadata<'_> type.
  - note: This would prevent changing size_of::<Metadata<'_>>() (e.g. by adding additional fields), without breaking changes, as C callers would not provide a large enough allocation after new fields are added
The Rust Metadata<'_> type must contain the C tracing_metadata struct internally, to allow returning it from the tracing_span_current API.
- This could be avoided by having that function return a const tracing_opaque_metadata * instead, and require FFI calls to read properties from it.
Callsites which directly use &dyn Callsite will need to be changed to use a special struct with a custom VTable, as C callsite references cannot be directly converted to a trait object without allocations.
- Many existing callsites should not be impacted, due to already using callsite::Identifier which can act as that wrapper type.
- If this proves to be an issue, it is possible to avoid by using an extra indirect function call if the const tracing_callsite_vtable * is stored as a prefix of the callsite object, instead of alongside the data pointer.
```
typedef struct {
const tracing_callsite_vtable *vtable;
// ...
} tracing_callsite_header;
typedef tracing_callsite_header *tracing_callsite_ref;
```
Exposing enums like Level, Kind, and Interest to C code effectively stabilizes the variant's values in the ABI.
- These values have already been changed once in a minor version for optimization reasons (#853), and this could prevent making that kind of change in the future.
- This may be avoidable by performing runtime conversions between C and Rust enum representations.

Open Questions

Should downstream implementations be required to always build against the latest version of tracing-core.h? (i.e. should a source-compatible, but ABI-breaking, change be considered a "breaking change")
- If an source-compatible ABI change is not considered breaking, this may limit the ability to bind tracing-core into non-C languages, as many require restating the C header file in a language-specific way.
Should the MAX_LEVEL static introduced in #853 be exposed to the C API?
- This could help avoid calling tracing_callsite_register in some cases, however would be both difficult to implement (due to atomics being difficult to expose to C), and may not be worthwhile if hidden behind a function call.
Should tracing-core support being dynamically linked/loaded?

EDIT: Noticed an oversight in the initial proposal. As tracing_span_current returns a const tracing_metadata *, the Rust Metadata<'_> struct would be required to contain that type as a prefix. I've noted this in the "Impacts" section, and added an alternative suggestion using an opaque typedef and additional functions.

cc @str4d, I imagine that this is relevant to your interests!

Some initial thoughts:

Introduce a new "c-ffi" feature to the tracing-core crate, and include a new tracing-core.h header file in the crate's distribution. When the "c-ffi" feature is enabled, tracing-core would expose a new c_ffi module in the root, containing a series of #[no_mangle] extern "C" methods corresponding to those defined in the tracing-core.h header.

What's the motivation behind defining the extern "C" API surface in tracing-core behind a feature flag, versus simply defining another crate that wraps the tracing-core API and defines C bindings?

It seems to me that defining the C bindings in a separate crate could have advantages for versioning: we can make internal changes that don't break Rust code, but do break the C bindings, in minor/patch versions of tracing-core, if the C FFI crate uses a pinned tracing-core dependency, and then issue breaking releases of just the C FFI crate.

This RFC does not propose exposing the ability to define subscribers or full power of tracing-core's Dispatch and Value APIs. In many cases this would be inefficient due to back-and-forth FFI calls, and could make it too difficult to evolve the crate backwards-compatibly in the future. Instead, it proposes exposing a minimal set of APIs to allow efficiently creating and dispatching spans and events from non-Rust code.

IMO, this is definitely correct. Subscriber implementations should definitely be defined in Rust. The surface area of the API massively increases when including everything necessary to implement a subscriber.

Exposing enums like Level, Kind, and Interest to C code effectively stabilizes the variant's values in the ABI.

These values have already been changed once in a minor version for optimization reasons (#853), and this could prevent making that kind of change in the future.

This may be avoidable by performing runtime conversions between C and Rust enum representations.

I'm uncomfortable with exposing Interest to C as an enum. Interests are usizes, and we only use the first 2 bits of them currently. This is by design; the intention was to reserve the remaining bits for other uses, like using the rest as a bitset so a single Interest can store separate values from multiple subscribers, or to implement efficient sampling by storing a number representing a percentage of times a callsite should be sampled.

As an alternative, what do you think about having a struct implementing the Callsite trait that's exposed to C as an opaque pointer? We could define Rust functions for accessing the callsite's internal Interest value atomically --- this would also avoid having to expose atomics in the C FFI surface but allow us to update callsites atomically.

Should the MAX_LEVEL static introduced in #853 be exposed to the C API?

This could help avoid calling tracing_callsite_register in some cases, however would be both difficult to implement (due to atomics being difficult to expose to C), and may not be worthwhile if hidden behind a function call.

I think even if it's behind a function call, it might still be worth exposing. Since Interests must be updated atomically on the Rust side, the callsite cache is also going to have to be behind a function call, so I don't think that C will ever be able to benefit from skipping a callsite without at least one function call.

Should tracing-core support being dynamically linked/loaded?

What changes would this entail? It seems like it could be a good way of ensuring that C and Rust see one consistent version of tracing-core's statics. It might also be important because FFI libraries are likely to themselves be dylibs...

Structs which have the potential to be extended are prefixed with a uint32_t version; field, and have a corresponding TRACING_???_VERSION define.

Whenever defining a struct, the client is required to set the version field to the define's value from the header version which the commit was pulled from.

When the struct is passed to tracing-core, it can read the version prefix to determine the layout of the struct. This allows new fields to be added without breaking existing clients.

This gives me an idea for a similar defensive programming technique that I imagine we could use to keep C from corrupting Rust's memory. There are several places in tracing-core that rely on something being behind a 'static reference, but there is no way for the C FFI layer to ensure that pointers are are never going to be freed. This isn't great! We store a bunch of those static references internally, and if we're given a stack pointer or a pointer to an allocation that's accidentally freed, we now have a use-after-free.

Something we could consider doing is requiring all the structs that are static on the Rust side, like Metadata, to be prefixed with a magic number of some kind. When we use these static references later, we could check if the magic number is there, and skip it if it's not, rather than reading the rest of the data from behind

(whoops, didn't mean to close the issue, clicked the wrong button!)

Also, it would be really great if folks who are interested in working on/using the C FFI API would be willing to make some commitment to maintaining it...I'm happy to help with changes on the Rust side, but I'm not too familiar with maintaining a FFI interface, so I'm not really comfortable with being responsible for it.

What's the motivation behind defining the extern "C" API surface in tracing-core behind a feature flag, versus simply defining another crate that wraps the tracing-core API and defines C bindings?

I considered putting this API outside of tracing-core, but I figured it would make the most sense to put it in tracing-core as exposing an FFI like this requires pretty deep control over the layout of the Metadata and Callsite data structures in order to allow C to statically allocate the data. It wouldn't be possible to implement this API without tracing-core's help.

It seems to me that defining the C bindings in a separate crate could have advantages for versioning: we can make internal changes that don't break Rust code, but do break the C bindings, in minor/patch versions of tracing-core, if the C FFI crate uses a pinned tracing-core dependency, and then issue breaking releases of just the C FFI crate.

I think I like this solution to the semver issue, as it allows tracing to evolve freely without worrying about breaking C FFI. Some changes will still need to be made to tracing-core, but we could keep the C interface in a tracing-ffi crate by exporting the necessary internal APIs from tracing-core behind a feature flag and #[doc(hidden)].

IMO, this is definitely correct. Subscriber implementations should definitely be defined in Rust. The surface area of the API massively increases when including everything necessary to implement a subscriber.

Agreed.

I'm uncomfortable with exposing Interest to C as an enum. Interests are usizes, and we only use the first 2 bits of them currently. This is by design; the intention was to reserve the remaining bits for other uses, like using the rest as a bitset so a single Interest can store separate values from multiple subscribers, or to implement efficient sampling by storing a number representing a percentage of times a callsite should be sampled.

I think this could be dealt with by having the interest type be passed through FFI as an opaque uintptr_t value, and providing inline methods in the C header to decode parts of it to get useful values. This would fix which bits of the usize are being used for Interest, but would allow the other bits to be used in the future for other pieces of information.

inline bool tracing_interest_is_never(uintptr_t interest) {
  return (interest & 0x3) == 0;
}
inline bool tracing_interest_is_always(uintptr_t interest) {
  return (interest & 0x3) == 2;
}

As an alternative, what do you think about having a struct implementing the Callsite trait that's exposed to C as an opaque pointer? We could define Rust functions for accessing the callsite's internal Interest value atomically --- this would also avoid having to expose atomics in the C FFI surface but allow us to update callsites atomically.

We can't have callsites be opaque types unknown to C code, as that would prevent C code from allocating it in static memory. Needing to allocate every callsite on the heap would be quite unfortunate.

We also don't want them to be opaque so that the Interest atomic can be read as efficiently as possible. In C++ the atomic would probably be stored in a std::atomic<uintptr_t>, for example, which would allow for super fast checks similar to the Rust API.

I think a potential alternative approach would be to have the API be like:

typedef struct {
  const tracing_callsite_vtable *vtable;
} tracing_callsite_header;

This makes the pointers passed from C code to Rust be a thin tracing_callsite_header * pointer, allowing it to be wrapped into a Rust dyn Callsite, with the Rust trait implementation forwarding actions to the C vtable header.

Should the MAX_LEVEL static introduced in #853 be exposed to the C API?

I think even if it's behind a function call, it might still be worth exposing. Since Interests must be updated atomically on the Rust side, the callsite cache is also going to have to be behind a function call, so I don't think that C will ever be able to benefit from skipping a callsite without at least one function call.

With the callback API I propose above, the interest would be maintained in something like a std::atomic<uintptr_t> (for C++) or an _Atomic uintptr_t (for pure C11). This would allow FFI clients to benefit from skipping callsites without a function call.

I'm vaguely inclined to put off making a decision here until benchmarks or profiles start showing it to be useful like they did for Rust.

Should tracing-core support being dynamically linked/loaded?

What changes would this entail? It seems like it could be a good way of ensuring that C and Rust see one consistent version of tracing-core's statics. It might also be important because FFI libraries are likely to themselves be dylibs...

I don't imagine it would have much of an impact, but I am not super familiar with all of the implications of putting a Rust crate into a cdylib. I imagine it will "just work" with some annotations in the C header.

This gives me an idea for a similar defensive programming technique that I imagine we could use to keep C from corrupting Rust's memory. There are several places in tracing-core that rely on something being behind a 'static reference, but there is no way for the C FFI layer to ensure that pointers are are never going to be freed. This isn't great! We store a bunch of those static references internally, and if we're given a stack pointer or a pointer to an allocation that's accidentally freed, we now have a use-after-free.

Something we could consider doing is requiring all the structs that are static on the Rust side, like Metadata, to be prefixed with a magic number of some kind. When we use these static references later, we could check if the magic number is there, and skip it if it's not, rather than reading the rest of the data from behind

To a certain extent, we need to trust the C code to obey the lifetime rules set out by the API. The current header doesn't make these invariants very clear in their documentation, but a final API would need to have strict requirements. A C++ binding would try its best to use the type system to ensure things are actually allocated statically, but the C API simply doesn't have the ability to express this. I'm not sure if it's worth the effort or performance overhead to use sentinel magic numbers to guess if the value has been freed.

Also, it would be really great if folks who are interested in working on/using the C FFI API would be willing to make some commitment to maintaining it...I'm happy to help with changes on the Rust side, but I'm not too familiar with maintaining a FFI interface, so I'm not really comfortable with being responsible for it.

I'm hoping we can design the API to be minimal enough that it won't require much maintenance, but I can help with maintaining it.

Just commenting with my use case to add to the discussion :eyes:. I'd like to use tracing from C++ similar to zcash and ideally make use of some of the existing subscribers I use in rust, so for at least me I'm happy with keeping all the subscriber setup and implementation in rust call off to some ffi init function and then just create spans and traces in C++.

I was requested to leave some comments here about my experiences integrating a conventional C library (gstreamer) logs into tracing. The very first iteration can be seen here.

The primary pain points to me was the prevalence of the &'static Metadata<'static> requirement. I understand that this is something that tracing itself can easily provide, but it would be nice if there was some point at which the required lifetime became less strict. Right now I'm forced to allocate and maintain a map of leaked metadata structures.

That said, I do believe that the native rust code needs are way more important and if reducing the lifetime will mean something actually gets slower, then I'd rather just do the leaking thing ^^

tokio-rs / tracing