🐛 Bug Report: incompatibilities with LLM semantics

codefromthecrypt commented 3 days ago

Which component is this bug for?

LLM Semantic Conventions

📜 Description

As a first timer, I tried the ollama instrumentation, and sent a trace to a local collector. Then I compared the output with llm semantics defined by otel. I noticed as many compatibilities as incompatibilities, and it made me concerned that other instrumentation may have other large glitches.

👟 Reproduction steps

use olllama-python with the instrumentation here. It doesn't matter if you use the traceloop-sdk or normal otel to initialize the instrumentation ( I checked both just in case)

👍 Expected behavior

otel specs should be a subset of openllmetry semantics, so no incompatible attributes.

👎 Actual Behavior with Screenshots

compatible:

kind=client
name=ollama.chat
attributes['gen_ai.system']='Ollama'
attributes['gen_ai.response.model']='codegemma:2b-code'
attributes['gen_ai.usage.completion_tokens']=11

Incompatible:

attributes['gen_ai.prompt.0.content']='prompt_text' otel semantics declare this as a non-indexed attribute 'gen_ai.prompt'
attributes['gen_ai.completion.0.role']='assistant' otel semantics declare this as a non-indexed attribute 'gen_ai.request.model.role'

not yet defined in the standard:

attributes['llm.request.type']='chat'
attributes['llm.is_streaming']=false
attributes['llm.usage.total_tokens']=11

🤖 Python Version

3.12

📃 Provide any additional context for the Bug.

partially addressed by @gyliu513 in https://github.com/traceloop/openllmetry/pull/884

👀 Have you spent some time to check if this bug has been raised before?

[X] I checked and didn't find similar issue

Are you willing to submit PR?

None

codefromthecrypt commented 3 days ago

what seems similar to the spec is if you splat out json decoded events into attributes, the gen_ai.prompt is correlated with the attributes set here, just if that's the case, basically I would expect some explanation in the spec or in the README here about special casing. The main goal is to be able to analyze the data coherently, so these points are important to understand even if the spec is just missing some info.

It is also possible that there are some implicit understandings of how to interpret the spec that I'm lacking, so feel free to correct me.

nirga commented 3 days ago

Thanks for this @codefromthecrypt. OpenLLMetry was released on Oct 2023 and pre-dates the semantic conventions defined by otel. The semantic convention work is actually the result of this project, OpenLLMetry, and is very much still work in progress.

When we started the OSS, there were no semantic conventions for LLMs and so we decided to add attributes to things that we think would be important to users. These became the basis for the discussions we've done in the otel working group where some attributes were officially adopted, some were changed slightly (for example, we've decided to change the prefix llm to gen_ai) and some are still in discussions (for example how to log prompts). OpenLLMetry keeps up to date with all the agreements we're making in the otel working group while keeping the older conventions in place (like prompts) so that our users can still get the visibility they need.

The incompatibilities you mentioned are just things we haven't gotten the chance to formalize in the otel working group but will be adopted soon.

codefromthecrypt commented 3 days ago

can you please link to issues upstream about "The incompatibilities you mentioned are just things we haven't gotten the chance to formalize in the otel working group but will be adopted soon."? because that's easier to track

nirga commented 3 days ago

Sure: https://github.com/open-telemetry/semantic-conventions/issues/834 https://github.com/open-telemetry/semantic-conventions/issues/930 https://github.com/open-telemetry/semantic-conventions/issues/1170

traceloop / openllmetry