huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
8.48k stars · 968 forks

Traceparent header not propagated as expected when calling TGI API. #880

Closed: ToolKami closed this issue 3 months ago

ToolKami commented 11 months ago

System Info

    model=meta-llama/Llama-2-7b-chat-hf

    docker run -d --gpus all \
      --shm-size 1g -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 \
      -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest \
      --model-id $model --trust-remote-code \
      --otlp-endpoint http://otel-collector:4317

Information

Tasks

Reproduction

  1. Spin up any inference engine with OTLP enabled
  2. Call the API as below:
    curl --location 'http://text-generation-inference-service:8080/generate' \
      --header 'traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01' \
      --header 'Content-Type: application/json' \
      --data '{
        "inputs": "What is Deep Learning?",
        "parameters": {
          "max_new_tokens": 20
        }
      }'
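For reference, the traceparent header in the curl call above follows the W3C Trace Context format, version-trace_id-parent_id-trace_flags. A small stdlib-only sketch of parsing it (this helper is illustrative, not part of TGI) shows the trace ID that TGI's spans would be expected to join:

```python
# Hypothetical helper (not TGI code): split a W3C `traceparent`
# header into its four hex fields per the Trace Context layout.
import re

TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header: str) -> dict:
    """Return the header's fields, or raise ValueError if malformed."""
    m = TRACEPARENT_RE.match(header.strip())
    if m is None:
        raise ValueError(f"invalid traceparent: {header!r}")
    fields = m.groupdict()
    # Bit 0 of trace-flags is the "sampled" flag.
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields

header = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
parsed = parse_traceparent(header)
print(parsed["trace_id"])  # the trace TGI's spans should be parented under
```

With the header from the reproduction, the parsed trace ID is `0af7651916cd43dd8448eb211c80319c` and the sampled flag is set, so the request explicitly asks to be traced.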

Expected behavior

The traceparent header should be honored so that TGI's spans join the caller's trace. Instead, the root span is not received and the parent trace ID is not propagated.

Narsil commented 11 months ago

Could you link to where this behavior is documented?

I can't seem to find relevant docs. I think the OTLP setup is mostly the default; there might be a flag we're missing for this.
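For context, "propagating" the header means a service keeps the incoming trace ID and flags but mints a fresh parent ID for the span it starts, so its spans land in the caller's trace. A minimal stdlib-only sketch of that continuation step (the real TGI router would do this in Rust via its OpenTelemetry layer; all names here are illustrative):

```python
# Illustrative sketch (hypothetical names, not TGI's actual code):
# continuing a W3C trace keeps trace-id and flags but replaces the
# parent-id with the 16-hex-char ID of the span this service starts.
import secrets

def continue_trace(traceparent: str) -> str:
    """Build the traceparent this service would send downstream."""
    version, trace_id, _old_parent, flags = traceparent.split("-")
    new_parent_id = secrets.token_hex(8)  # 8 bytes -> 16 hex chars
    return f"{version}-{trace_id}-{new_parent_id}-{flags}"

incoming = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
outgoing = continue_trace(incoming)
# Same trace ID, new parent ID: both spans belong to one trace.
```

The symptom reported above is the trace-id part not being carried over, i.e. TGI starting a brand-new trace instead of continuing the caller's.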

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.