huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
8.48k stars · 968 forks

Traceparent header not propagated as expected when calling TGI API. #880

Closed: ToolKami closed this issue 3 months ago

ToolKami commented 11 months ago

System Info

    model=meta-llama/Llama-2-7b-chat-hf

    docker run -d --gpus all \
      --shm-size 1g -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 \
      -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest \
      --model-id $model --trust-remote-code \
      --otlp-endpoint http://otel-collector:4317

Information

Tasks

Reproduction

  1. Spin up any inference engine with OTLP enabled
  2. Call the API as below:
    curl --location 'http://text-generation-inference-service:8080/generate' \
      --header 'traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01' \
      --header 'Content-Type: application/json' \
      --data '{
        "inputs": "What is Deep Learning?",
        "parameters": {
          "max_new_tokens": 20
        }
      }'
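For reference, the traceparent header in the curl call above follows the W3C Trace Context format, version-trace_id-parent_id-trace_flags. A small stdlib-only sketch of parsing it (this helper is illustrative, not part of TGI) shows the trace ID that TGI's spans would be expected to join:

```python
# Hypothetical helper (not TGI code): split a W3C `traceparent`
# header into its four hex fields per the Trace Context layout.
import re

TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header: str) -> dict:
    """Return the header's fields, or raise ValueError if malformed."""
    m = TRACEPARENT_RE.match(header.strip())
    if m is None:
        raise ValueError(f"invalid traceparent: {header!r}")
    fields = m.groupdict()
    # Bit 0 of trace-flags is the "sampled" flag.
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields

header = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
parsed = parse_traceparent(header)
print(parsed["trace_id"])  # the trace TGI's spans should be parented under
```

With the header from the reproduction, the parsed trace ID is `0af7651916cd43dd8448eb211c80319c` and the sampled flag is set, so the request explicitly asks to be traced.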

Expected behavior

The traceparent header should be honored so that TGI's spans join the caller's trace. Instead, the root span is not received and the parent trace ID is not propagated.

Narsil commented 11 months ago

Could you link to where this behavior is documented?

I can't seem to find relevant docs. I think the OTLP setup is mostly the default; there might be a flag we're missing for this.
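For context, "propagating" the header means a service keeps the incoming trace ID and flags but mints a fresh parent ID for the span it starts, so its spans land in the caller's trace. A minimal stdlib-only sketch of that continuation step (the real TGI router would do this in Rust via its OpenTelemetry layer; all names here are illustrative):

```python
# Illustrative sketch (hypothetical names, not TGI's actual code):
# continuing a W3C trace keeps trace-id and flags but replaces the
# parent-id with the 16-hex-char ID of the span this service starts.
import secrets

def continue_trace(traceparent: str) -> str:
    """Build the traceparent this service would send downstream."""
    version, trace_id, _old_parent, flags = traceparent.split("-")
    new_parent_id = secrets.token_hex(8)  # 8 bytes -> 16 hex chars
    return f"{version}-{trace_id}-{new_parent_id}-{flags}"

incoming = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
outgoing = continue_trace(incoming)
# Same trace ID, new parent ID: both spans belong to one trace.
```

The symptom reported above is the trace-id part not being carried over, i.e. TGI starting a brand-new trace instead of continuing the caller's.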

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.