open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.02k stars 2.33k forks source link

[pkg/ottl] Add LogEvent context #34695

Closed codefromthecrypt closed 2 months ago

codefromthecrypt commented 2 months ago

Component(s)

processor/transform

Is your feature request related to a problem? Please describe.

https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/33737 introduces the Blob Attribute Uploader Connector. One driver of that discussion was LLM (GenAI) semantics, which were initially written to have prompt/completion data recorded as span events. With this tool, a backend which uses span attributes for the same data, could convert accordingly.

A proof of concept to showcase this, included span and span event contexts.

    "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl/contexts/ottlspan"
    "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl/contexts/ottlspanevent"

The use case in question, LLM (GenAI) semantics, is on a path to change again, now from span events to log events, using the Logs Event API.

I asked if the uploader would adjust and was told to make new issues for these pending changes.

I think new, separate issues for "Log Events -> Span Event Connector" and "Span Events -> Logs Connector" would make sense.

Along the way, I noticed that while there is an ottllog context, there is no ottllogevent context. This made me curious whether we intend to heuristically handle log events, or if there should be another context, which for example exposes the attribute event.name and a body where present is AnyValue.

Describe the solution you'd like

I would like to see an ottlogevent context, so that we can continue the work begun in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/33737 towards the pending reality of switching LLM (GenAI) prompt/completion events from span to logs.

Especially as there are concerns such as how to map semantic conventions (possibly weaver) to the body AnyValue, I believe this should be a separate package from ottlog even if there are shared code.

Describe alternatives you've considered

One way is to make a custom transform processor which doesn't use ottl.

Additional context

the ultimate goal is to help implementors who have to deal with mixed mode, where some instrumentation emit span events and others log events, or where the backend, or UI cannot yet support log events.

github-actions[bot] commented 2 months ago

Pinging code owners:

TylerHelmuth commented 2 months ago

@codefromthecrypt I am very out of the loop here.

Contexts defined in pkg/ottl/contexts all relate to a structure in pdata which is our representation of OTLP protos. I am not familiar with log event as a structure in the proto.

Are you referring to the Logs Event API? I thought the plan was for events to be represented as LogRecords. Is there going to be a new structure in pdata for us to create a context?

TylerHelmuth commented 2 months ago

I'll add that OTTL currently has no way to do cross-signal transformations. It is possible a context could be crafted that has references to different data types, but there are likely some problems that would need to be solved.

codefromthecrypt commented 2 months ago

@TylerHelmuth so I think many of your notes are hitting nails. Yes, this is the Logs Event API I corrected my PR desc.

It sounds first that cross signals is indeed a prereq to the use case I mention (span event -> log event). So, this may rule out ottl as a solution for this until/unless there's scope for that.

The Logs Event API is not often put to practice, yet. Indeed, the spec says it manifests as LogRecords, where the body is AnyValue and attributes must also include "event.name".

So, this is why I mentioned perhaps since this is an opinionated layer, perhaps a context that narrows to these could be helpful.

The other point is at least in the pending change to LLM, there are structured definitions, and leaves have types to them. For example, some fields in the event body (type AnyValue) are string where others are AnyValue or something else. To do something like access an ID described in the body, requires some knowledge of the types which are in the semantic conventions. So, I wonder if such a binding would make sense, or too far out for this change.

TylerHelmuth commented 2 months ago

So, this is why I mentioned perhaps since this is an opinionated layer, perhaps a context that narrows to these could be helpful.

This is probably possible, but I wonder what that would look like. If the pdata LogRecord does not contain an attribute named event.name and is passed into ottllogevent.NewTransformContext should it error? Or would it be better to use the ottllog TransformContext and build your own wrapper that enforces the event API requirements?

At the moment, a ottllogevent path parser would behave exactly the same as the ottllog parser, because an otel event is just an otel log in the eyes of pdata.

To do something like access an ID described in the body, requires some knowledge of the types which are in the semantic conventions

I could see how this is the differentiator. If there are defined, enforceable body structures, then an ottllogevent path parser could enforce "you can only index the body if the event.name attribute is in some list of approved event names".

However, OTTL syntax doesn't expose that such a statement is allowed. A user, why writing a statement, would still need to understand their data to know if it is safe to write such a statement.

codefromthecrypt commented 2 months ago

interesting points. I think maybe we should use ottllog and then revisit this to see if there is really any value to be gained by an event facade.

For the genai use case of log events -> span events, we can't use ottl anyway due to lack of cross signal. So, I won't likely be able to help contribute motivating ottl in the near term (directly or indirectly).

Should I close and re-open this with more info, or leave it open?

TylerHelmuth commented 2 months ago

I think since we don't currently have any plans to move forward with this idea, closing is best.