open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Do we need to distinguish client side and server side llm call? #1079

Closed gyliu513 closed 1 month ago

gyliu513 commented 4 months ago

Area(s)

area:gen-ai

What happened?

Description

There is a PR that enables vLLM to emit metrics, and it also adopts this semantic convention: https://github.com/vllm-project/vllm/pull/4687. The question is that vLLM is server side, but as far as I can see the current LLM semantic conventions are mainly used to instrument client-side code, e.g. https://github.com/traceloop/openllmetry/tree/main/packages. Do we need to distinguish client-side and server-side semantic conventions? Thanks!

@nirga @lmolkova ^^

Semantic convention version

NA

Additional context

No response

lmolkova commented 4 months ago

Adding @SergeyKanzhelev who might be interested in GCP server LLM metrics.

The assumption is that client and server would have different information available.

E.g. gen_ai.client.operation.duration is different from gen_ai.server.operation.duration - depending on the client network, the difference can be significant. The same request that times out on the client could be successful on the server side, resulting in different duration, error rate, usage, and other metrics.

Client and server metrics might have different attributes. E.g. the server might have information about the pricing tier, region, or availability zone that is not available on the client but is very useful to know.

Therefore we either need:

  1. different metric names.
  2. extra attribute(s) that distinguish client from server.

Other OTel semantic conventions use option 1, and this is a good reason for the gen_ai semconv to also define different metrics for client and server. But nothing stops us from adding gen_ai server semantic conventions and reusing anything we can between client and server.
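
To make option 1 concrete, here is a minimal sketch using the OpenTelemetry Python metrics API. The attribute values are hypothetical, and `gen_ai.server.operation.duration` is only the proposed server-side name discussed in this thread, not a published convention; the attributes used are existing semconv attribute names.

```python
# Sketch of option 1: distinct metric names for client- and server-observed duration.
from opentelemetry import metrics

meter = metrics.get_meter("gen_ai.demo")

# Client-side instrumentation (e.g. an LLM client library) records into this instrument.
client_duration = meter.create_histogram(
    name="gen_ai.client.operation.duration",
    unit="s",
    description="Client-observed duration of a GenAI operation",
)

# Server-side instrumentation (e.g. the vLLM serving engine) records into a separate
# instrument, with attributes only the server knows about.
server_duration = meter.create_histogram(
    name="gen_ai.server.operation.duration",  # proposed name, not yet in semconv
    unit="s",
    description="Server-observed duration of a GenAI operation",
)

# Hypothetical attribute values for illustration.
common_attrs = {"gen_ai.operation.name": "chat", "gen_ai.request.model": "llama-3-8b"}

# The same logical request can yield different measurements on each side,
# e.g. a client-side timeout while the server eventually completes the request.
client_duration.record(30.0, {**common_attrs, "error.type": "timeout"})
server_duration.record(
    42.7,
    {
        **common_attrs,
        "cloud.region": "us-east1",               # server-only attribute
        "cloud.availability_zone": "us-east1-b",  # server-only attribute
    },
)
```

With option 2, both sides would instead record into a single instrument and rely on an extra attribute (e.g. something like a client/server role attribute) to separate the time series at query time.
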

gyliu513 commented 4 months ago

@lmolkova @nirga there is an issue tracking the vLLM metrics https://github.com/vllm-project/vllm/issues/5041 and also a metrics proposal at https://docs.google.com/document/d/1SpSp1E6moa4HSrJnS4x3NpLuj88sMXr2tbofKlzTZpk/edit?resourcekey=0-ob5dR-AJxLQ5SvPlA4rdsg#heading=h.qmzyorj64um1

lmolkova commented 1 month ago

I believe this can be resolved:

@gyliu513 please comment if you believe there is something else we need to do on this issue (and feel free to reopen it)