open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
251 stars 163 forks source link

Resolve correlation impact for the switch from span events to (log) events in GenAI #1294

Closed codefromthecrypt closed 2 months ago

codefromthecrypt commented 2 months ago

Area(s)

area:gen-ai

Is your change request related to a problem? Please describe.

https://github.com/open-telemetry/semantic-conventions/pull/980 prefers the (log) events api for prompt/completion data in cases including chat and tool operations in GenAI.

Formerly, this was done in span events, which have a direct causality link (of trace/spanID) regardless of whether a correlation genai scoped attribute was added to each event. While there are good reasons to use the (log) events api, there is no explicit link anymore, so it leaves this up to instrumentation to choose to add a correlating attribute or a link.

We don't yet have direct advice on how to handle this transition, even though in practice it won't manifest for months or longer. For example, python is the dominant language in GenAI and it lacks the Event/Events api for the time being. This means many will use the compatibility mode (span events) and so not even see this issue until this status quo changes.

Describe the solution you'd like

I would like for us to offer guidance either in the events document for genai, or link to a practices doc defined at a higher level. This can help avoid a situation where users need to use temporal correlation or employ backend tricks to stitch-together the (log) events caused by a span. It would avoid this, as implementors can adopt a strategy before the experience miss manifests in practice.

Describe alternatives you've considered

Leaving this out of scope is ok for experts who also have context in how this is designed. They can choose to use practices such as span links or find matching correlation IDs. However, this doesn't help the backends achieve something consistently.

For example, such special casing may work for a vendor or couple based on tacit practice sharing. The goal of the semantics, as I understand it, it to allow both sides of the spec, instrumentors and those collecting the data, to have coherency and consistency. Some stronger advice, particularly when spans are in use, would be helpful, even if in the future we may see (log) event only or (log) event + metrics only implementations.

Additional context

link discussing this directly https://github.com/open-telemetry/semantic-conventions/pull/980#discussion_r1619541018

karthikscale3 commented 2 months ago

Thanks for bringing this up. This definitely needs some guidance. Adding it to log events will break the link with the span. This is crucial because today, LLM observability focused vendors such as Langtrace, Traceloop etc. offer a critical feature as part of the SDK for recording user feedback on model responses as scores which is attached as an attributed to a new span and linked to the parent span of the trace where the span corresponding to the LLM api call was generated. This is further shown in the UI to understand the score given by the user for a particular LLM generation.

Without a reliable way to link spans with logs, this feature is at risk of being impacted which means migrating to logs will be a blocker for vendor instrumentations.

lmolkova commented 2 months ago

/cc @open-telemetry/semconv-event-approvers

trask commented 2 months ago

Events (and logs) do have an optional reference to a parent span (in the proto, spec, and API). Is this asking for something different from what exists already? Thanks

lmolkova commented 2 months ago

If I understand the concern:

Is the question here is whether it's still reasonable to have incoherence during the experimentation phase?

codefromthecrypt commented 2 months ago

Events (and logs) do have an optional reference to a parent span (in the proto, spec, and API). Is this asking for something different from what exists already? Thanks

So, one ask I had to resolve this issue is "link to a practices doc defined at a higher level" I may be looking at the wrong docs. I was looking at the following:

So, I am looking at the wrong docs, if it is already mentioned about the transition from span events to (log) event API. specifically the correlation concern with spans. It didn't seem obvious to me this is already the case. Can you point me to what I missed?

codefromthecrypt commented 2 months ago

Is the question here is whether it's still reasonable to have incoherence during the experimentation phase?

My goal wasn't about the abstract acceptance of incoherence, rather the specific case of transition from span events to log events, and to help folks implement that properly. The PR #980 can go in anyway, hence me forking this as a separate concern to follow-up on vs suggest that PR should not go in.

To paraphrase the description, I was asking for either practical concerns in the genAI text, or in a top-level text about the event/s API wrt span correlation. I'm told this already exists for event/s API, but I can't find it yet.

Once we find/create this content, I can make a follow-up PR to enhance the spec with either a link to that advice or inline if there isn't one.

Meanwhile, most instrumentation won't hit this anyway as python has no event/s api yet. That said, if folks implement this in in javascript/typescript they would have access to the Event/s api. https://github.com/open-telemetry/opentelemetry-js/tree/main/experimental/packages/api-events

Both for other folks like @karthikscale3 and myself, I really just want concrete guidance. We are near, but not done yet. This is independent of community acceptance of things that break as we attempt to move things forward in an experimental SIG.

Hope this helps

codefromthecrypt commented 2 months ago

ps as I think it is easier to do vs suggest, I'll raise a PR to the spec doc as I feel folks aren't getting my intent. It isn't if we mention there is a context field or not in docs. Rather, the impact of moving from span events, and that if you do that and don't use that field, you will not achieve the same correlation. I think adding this remark directly is easier than what I've attempted so far 🤞

codefromthecrypt commented 2 months ago

here's what I was looking for, somewhere. I hope it clarifies my intent. Once we have something like this, we can link to it in GenAI and any other new spec that migrates from span events to (log) events without people having to think through the nuance of this topic. I'm happy to expand, etc. this is small just to get things started.

https://github.com/open-telemetry/opentelemetry-specification/pull/4167

codefromthecrypt commented 2 months ago

I'm going to close this out and re-open it when the topic leads to a concrete problem. I think that's a better use of our time vs trying to prevent one with a doc change. Thanks for the feedback @karthikscale3 @trask @lmolkova