open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

GenAI: do we need to support multiple finish reasons? #1277

Open · lmolkova opened this issue 2 months ago

lmolkova commented 2 months ago

See https://github.com/open-telemetry/semantic-conventions/pull/980#discussion_r1586695157.

Context:

Having an array attribute is problematic: it is harder to query and not really usable on metrics or on per-choice events.
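For illustration, a minimal sketch (Python, opentelemetry-api) of the array-valued shape discussed in #980; the span name and attribute value are just placeholders:

```python
from opentelemetry import trace

tracer = trace.get_tracer("genai-instrumentation-example")

with tracer.start_as_current_span("chat gpt-4o") as span:
    # An array-valued attribute is easy to record on a span...
    span.set_attribute("gen_ai.response.finish_reasons", ["stop", "length"])
    # ...but backends generally can't group or filter metric time series by an
    # array value, and a per-choice event only ever needs a single element of it.
```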

Multiple choices are supported by only a limited set of models. Even when multiple choices are supported, some SDKs (e.g. openai-dotnet) choose not to expose them at the convenience-API level in order to simplify the design and provide a much friendlier experience. Most examples and documentation assume there is just one choice.

Given this, it seems that in most cases there will be just one choice and just one finish reason on each span.

The proposal is to

lmolkova commented 2 months ago

More context on the comma-separated-list option:

lmolkova commented 2 months ago

Based on offline discussions, we need to figure out the batching story too, and it might be related.

People may use n > 1 to save on costs (input tokens are charged once) - https://community.openai.com/t/how-does-n-parameter-work-in-chat-completions/288725.
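For concreteness, a small example of the n > 1 scenario (Python, openai client library, assuming an API key is configured), where a single request yields several choices, each with its own finish_reason:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Suggest three names for a tracing library."}],
    n=3,  # three completions; the prompt (input tokens) is charged only once
)

# Each choice carries its own finish_reason (e.g. "stop", "length", "content_filter"),
# so one request/span can legitimately end up with several finish reasons.
for choice in response.choices:
    print(choice.index, choice.finish_reason)
```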

Assuming this is one of the popular scenarios, the alternatives to squashing the finish reasons into one value could be:

  1. maybe populate finish_reason on the span if and only if there is one choice. There could be other things we can do here:
    • maybe populate error.type if the worst of the finish reasons indicates an error?
  2. have a metric that measures the number of choices (in addition to the number of requests) and report finish_reason as an attribute there
  3. populate finish_reason as an attribute on the relevant per-choice event, so it's easier to query.

Points 2 and 3 seem mostly non-controversial and don't necessarily depend on point 1; a rough sketch of both is below.
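A sketch of what points 2 and 3 could look like (Python, opentelemetry-api; the counter name, the gen_ai.choice event name, and the singular gen_ai.response.finish_reason attribute are hypothetical here, not part of the conventions):

```python
from opentelemetry import metrics, trace

meter = metrics.get_meter("genai-instrumentation-example")
tracer = trace.get_tracer("genai-instrumentation-example")

# Point 2: a counter for choices (hypothetical instrument name), with the singular
# finish reason as a metric attribute so backends can group/filter on it directly.
choice_counter = meter.create_counter(
    "gen_ai.client.response.choices",
    description="Number of choices returned by the model",
)

finish_reasons = ["stop", "stop", "length"]  # e.g. one request with n=3

with tracer.start_as_current_span("chat gpt-4o-mini") as span:
    for index, reason in enumerate(finish_reasons):
        choice_counter.add(1, {"gen_ai.response.finish_reason": reason})
        # Point 3: each per-choice event carries its own scalar finish reason,
        # which is much easier to query than an array on the parent span.
        span.add_event(
            "gen_ai.choice",
            attributes={"gen_ai.response.finish_reason": reason, "index": index},
        )
```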