Log record mutations are visible in next registered processors

pellared commented 4 weeks ago

What are you trying to achieve?

Define if log record mutations done in record processor are visible in next registered processors.

From https://github.com/open-telemetry/opentelemetry-specification/issues/4010#issuecomment-2100878288:

I believe the spec is not clear enough and can be interpreted in different ways leading to very different implementations (e.g. chained processors that can mutate data and see mutations of previous processor or parallel processors that don't see mutations of other processors).

We likely need to fix the spec. It is in a stable doc but is vague enough that I think it is a bug and should be fixed.

Since the spec is vague I am also worried that existing implementations in different language SDKs may have interpreted it differently and have made different choices. IMO, we need to start with auditing existing implementations and see where we stand currently and then decide how exactly we want to fix the spec (one of the options is that both "chained-allow-mutation" and "fanout-copy-to-parallel" should be possible like they are in the Collector pipelines).

Additional context.

The undocumented assumption was the log processor will follow the pattern that mutations will be visible to the next processor to be consistent with span processors. However, the C++ Logs SDK is not designed this way (log records are passed as a copy to subsequent registered processors). Other Logs SDKs follow the pattern or are not stable.

pellared commented 4 weeks ago

I will check if it is possible to follow the pattern in Go even though it's pitfalls I outlined https://github.com/open-telemetry/opentelemetry-specification/issues/4010#issuecomment-2126709559. My main goal is to keep a possibility to have zero allocations when the log records are not modified (as this the the usual use case).

If we succeed to find a satisfying design for Go (and Rust), we could update the specification to make this requirement explicit. However, what are we going to do with the fact that the stable C++ Logs SDK already has taken a different route? Would it have to be a "SHOULD" instead of a "MUST"?

CC @open-telemetry/rust-maintainers @open-telemetry/cpp-maintainers @open-telemetry/technical-committee

jack-berg commented 4 weeks ago

If a log processor's mutations were not visible to the next processor, then there would be no (or very little) point of onEmit receiving a ReadWriteLogRecord. If mutations are only visible to the processor being called, then the SDK might as well call onEmit with a ReadableLogRecord, since that processor can easily mutate with copy-on-write semantics.

Other evidence that the "mutations are not visible to subsequent processors" is the wrong interpretation:

The log SDK was modeled after the trace SDK, and SpanProcessor#onStart modifications are visible to subsequent processors. It would be a very strange design if they weren't. .
"LogRecordProcessors are invoked in the same order they were registered" (see spec). There's no point of this if mutations aren't visible to subsequent processors.
"A LogRecordProcessor may freely modify logRecord for the duration of the OnEmit call. If logRecord is needed after OnEmit returns (i.e. for asynchronous processing) only reads are permitted" (see spec). The intent of this line is clarify that log processors can only mutate log records synchronously, since if they hold onto the record and mutate asynchronously, the mutation may conflict with one from a subsequent processor. See the #2955 for more context, which was resolved by this statement.

However, what are we going to do with the fact that the stable C++ Logs SDK already has taken a different route?

Wondering if you can elaborate on this. I can't seem to find the LogRecordProcessor interface definition in c++ (I don't know c++ so I'm very naive navigating the code). Do log record processors call subsequent processors in c++? Is it possible to write a log processor which extracts and includes baggage attributes on log records in c++?

pellared commented 4 weeks ago

If a log processor's mutations were not visible to the next processor, then there would be no (or very little) point of onEmit receiving a ReadWriteLogRecord.

The modified record can be passed to a decorated processor. This is how mutations can be currently done in C++, Rust, and Go. E.g. mutatingProcessor found here: https://github.com/open-telemetry/opentelemetry-specification/issues/4010#issue-2260773748

Notice that currently there is nothing in the spec that allows creating a new/custom ReadableLogRecord.

Wondering if you can elaborate on this.

Sure. Regarding C++ my information is based on https://github.com/open-telemetry/opentelemetry-specification/issues/4010#issuecomment-2096577303.

jack-berg commented 4 weeks ago

The modified record can be passed to a decorated processor. This is how mutations can be currently done in C++, Rust, and Go.

There's no mention of a decorated processor in the spec, so it sounds like C++, Rust and Go invented new concepts. To be clear, I support an effort to evolve the spec to include accommodations which allow for higher performance implementations. However, I feel strongly that the current language of the spec does imply that log record mutations are visible to the subsequent processors. The design juggles several priorities, including performance and conceptual symmetry with prior art in otel (the trace signal in particular). A design featuring chaining or decorating processors might pave a clear path for a high performance implementation, it does so at the expense of conceptual symmetry.

My rust, c++, and go skills are pretty poor, but there may be creative language-specific ways to keep the door open to zero allocations while also staying within the spirit of the spec. For example, a processor might have a method indicating whether or not it mutates data. If none of the processors mutate data, or if there is only one processor registered, then the SDK could use an alternative implementation of ReadWriteLogRecord which avoids allocations.

pellared commented 4 weeks ago

Actually, it may occur that it won't be a problem for Go. I am in progress of evaluating it. I remember from my previous prototypes for most cases the Go escape analysis was doing good job. Reference https://github.com/open-telemetry/opentelemetry-go/blob/main/log/DESIGN.md#passing-record-as-pointer-to-loggeremit

Rust Logs SDK is also not stable and there was a discussion whether to use the same pattern as for span processors. See https://github.com/open-telemetry/opentelemetry-specification/issues/4010#issuecomment-2096523643

It may be a good compromise to follow span processor pattern to achieve better consistency across languages as well as with tracing.

pellared commented 3 weeks ago

FYI https://github.com/open-telemetry/opentelemetry-go/pull/5469

pellared commented 3 weeks ago

For sake of transparency I want to share some thoughts based on feedback I received from @MrAlias.

The specification does not require log records to be concurrent safe (notice that the spans are required to be concurrent safe by the spec here). What happens when a subsequent processor modifies a log record that was passed to a batch processor? Does the batch processor need to make a deep copy of processed records?

How will we make sure that users will not modify the log record asynchronously? In Go, I thought about adding a comment. However, commenting something like this is not going to be a safe solution. There are going to be many user use cases that will violate this and having a comment is not an adequate design choice. We could make log record concurrent safe, but this would decrease the performance even further.

jack-berg commented 3 weeks ago

A LogRecordProcessor may freely modify logRecord for the duration of the OnEmit call. If logRecord is needed after OnEmit returns (i.e. for asynchronous processing) only reads are permitted

From the spec). A processor which asynchronously modifies the ReadWriteLogRecord passed to LogRecordProcessor#onEmit is violating the spec and the behavior is undefined. In practice, some processors will need to make changes asynchronously. These implementations should make a local copy of the log record. I imagine an implementation could take stronger measures to prevent this by doing something to lock the log record after a log record processor has returned, which may be at the expense of performance (not sure because I haven't tried).

This appears to be one of those areas of the spec where we have a possible foot gun. And personally, I'm fine with it. We should have a well paved path that works out of the box for most of the things that users want to do. But there will be advanced users who want to step off that path, and they should have the ability to do things and take responsibility for those things. Its the "we're all adults here" argument.

pellared commented 3 weeks ago

A processor which asynchronously modifies the ReadWriteLogRecord passed to LogRecordProcessor#onEmit is violating the spec and the behavior is undefined.

A processor is synchronously modifying a record which is retained by another processor which retains is and only makes asynchronous reads. This could happen if e.g. one registers (e.g. by mistake) a "retracting processor" after a batch processor.

I think that the specification should say something like:

If the log record is not concurrent safe then OnEmit implementations:

must not modify the record asynchronously

must not be retain the record

must copy the record if it is needed after OnEmit returns

Currently the spec makes it easy to have implementations with race conditions.

jack-berg commented 3 weeks ago

I'm supportive of clarifying the language. But it wouldn't effectively change any implementation would it? At the end of the day, the user is still responsible for not making asynchronous changes to the record. This is just providing better language explaining "why".

pellared commented 3 weeks ago

the user is still responsible for not making asynchronous changes to the record.

The user can make a synchronous change and another processor makes an asynchronous read. This leads to a race condition.

But it wouldn't effectively change any implementation would it?

E.g. you are not making a clone of record here: https://github.com/open-telemetry/opentelemetry-java/blob/d9cef81e298f93b2379aa87d96edacf834fd7fcc/sdk/logs/src/main/java/io/opentelemetry/sdk/logs/export/BatchLogRecordProcessor.java#L92 and the next registered processor can make log record modifications that would lead to a race condition.

MrAlias commented 3 weeks ago

Keep in mind that a user defined processor is not constrained by this specification. Saying they should or shouldn't do something is a meaningless statement. We can only make prescriptive statements about OTel processor behaviors.

Having designs the syntactically restrict or promote good user use or experience is the way you influence 3rd-party use cases.

jack-berg commented 3 weeks ago

E.g. you are not making a clone of record here:

Nah, we're always going to make that clone in otel java. We have different representations for SDK processors vs. exporters. But it could save a non-stable implementation from needing to do something like that. Although I would argue that an implementation doesn't need to make that distinction today - they can safely at the record to the batch processor queue. Any issues with the record being mutated at the same time as the exporter is serializing it is the fault of the processor which violated the spec.

pellared commented 3 weeks ago

If we want consistency with tracing (over performance) for log processors then we should consider requiring the SDK's log record record to be concurrent safe.

EDIT: For reference https://github.com/open-telemetry/opentelemetry-go/pull/5478

pellared commented 3 weeks ago

Leaving some thoughts why I still think that requiring log record mutations to be visible in next registered processors may be a performance+design footgun and why it may be undesired in some languages.

The design juggles several priorities, including performance and conceptual symmetry with prior art in otel (the trace signal in particular).

We should answer what is more important.

Performance or symmetry with tracing? Symmetry with OTel tracing or language's standard logging library?

A design featuring chaining or decorating processors might pave a clear path for a high performance implementation, it does so at the expense of conceptual symmetry.

In my opinion, there can be application which emits "tons" of logs and performance is more important. Especially that having a similar design to tracing involves overhead that would not be possible to overcome. Some users are already complaining that tracing and metrics involves to many heap allocations (e.g. here) and adding allocations to logs would bring the problem to the next level. Reference: https://github.com/open-telemetry/opentelemetry-go/pull/5478

But there will be advanced users who want to step off that path, and they should have the ability to do things and take responsibility for those things.

More advanced users may want to squeeze as much performance as possible.

From https://opentelemetry.io/community/mission/#we-value-_performance_:

OpenTelemetry users should not have to choose between high-quality telemetry and a performant application. High performance is a requirement for OpenTelemetry, and unexpected interference effects in the host application are unacceptable.

I think that in Go ecosystem having a design which allows zero-heap allocation logging (especially in the hot path) is important. Reference: https://go.dev/blog/slog#performance

A "decorator-based" design would be Go idiomatic as it would follow the design of the standard Go structures logging library: https://pkg.go.dev/log/slog#Handler. Creating pipelines for OTel Go Logs SDK and slog (Go standard library) would be very similar. Notice that it is possible to make a processor which affect others by making a "fan-out" processor.

My rust, c++, and go skills are pretty poor, but there may be creative language-specific ways to keep the door open to zero allocations while also staying within the spirit of the spec.

This is how I believe it can be done in Go: https://github.com/open-telemetry/opentelemetry-go/pull/5470 However, I feel that this design is ugly. It gives the user several options how to propagate changes to other processors (as well how to isolate the changes).

For example, a processor might have a method indicating whether or not it mutates data. If none of the processors mutate data, or if there is only one processor registered, then the SDK could use an alternative implementation of ReadWriteLogRecord which avoids allocations.

In Go, representing a record as interface (to give a possibility to have several implementation) would already decrease the performance (slower calls and more heap allocations; more; I was doing some benchmarks months ago but I hope you will just trust me).

To sum up, I lean towards favoring logging performance and symmetry with structured logging package from the Go standard library over symmetry with OTel tracing design.

pellared commented 3 weeks ago

SIG meeting summary

We have a common agreement that for logs the performance is extremely important as we might expect more load than for e.g. tracing.

It looks that we are leaning towards changing the spec so that SDKs can require chaining processors for making mutations. Processors are responsible for calling the next in the chain which allows the records to stay on the stack. This is also how C++, Rust, Go Logs SDKs are currently designed.

pellared commented 3 weeks ago

changing the spec so that SDKs can require chaining processors for making mutations.

I will be unavailable for about next 3 weeks. I try to make some PR once I am back.

Thanks a lot for your feedback and support.

open-telemetry / opentelemetry-specification

Log record mutations are visible in next registered processors #4065