Closed: npuichigo closed this issue 5 months ago.
Update: if `--trace-config level=TIMESTAMPS` is provided, it works fine. With the default `--trace-config level=OFF`, the request just hangs. Then, after Ctrl-C and trying again, the server crashes.
Please make sure to start OpenTelemetry tracing with `--trace-config level=TIMESTAMPS`, since by default it is `OFF`. The SegFault issue will be fixed in Triton starting with 24.03, but if you don't specify `level`, spans will not be generated and sent from the Triton side.
This issue should be fixed in 24.03
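For reference, a minimal launch sketch with tracing enabled and the level set explicitly, assuming a placeholder model repository path and a local OTLP/HTTP collector endpoint:

```bash
# Sketch only: enable OpenTelemetry trace mode and set the trace level explicitly,
# since the default level=OFF generates no spans (and hit the SegFault before 24.03).
tritonserver \
  --model-repository=/path/to/model_repo \
  --trace-config mode=opentelemetry \
  --trace-config opentelemetry,url=http://localhost:4318/v1/traces \
  --trace-config rate=1 \
  --trace-config level=TIMESTAMPS
```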
System Info
GPU: H100
Image: nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3
TensorRT-LLM version: 0.8.0
Who can help?
@kaiyux @byshiue @schetlur-nv
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
The TensorRT-LLM model I use is Baichuan, and I follow the official guidance to do the model conversion. Then I follow the official guidance at https://github.com/triton-inference-server/tensorrtllm_backend/tree/v0.8.0/all_models/inflight_batcher_llm to build an ensemble for my model.
Also, since Baichuan needs the `sentencepiece` Python package, I follow https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#creating-custom-execution-environments to build a Python execution environment with `sentencepiece` installed. After that, the final model repository looks like the following.
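A rough, illustrative sketch of the resulting layout, in the spirit of the inflight_batcher_llm templates (directory names, engine files, and the placement of the packaged `sentencepiece` environment are assumptions, not the exact tree from this issue):

```
model_repo/
├── ensemble/
│   ├── 1/
│   └── config.pbtxt
├── preprocessing/
│   ├── 1/
│   │   └── model.py
│   ├── config.pbtxt
│   └── sentencepiece_env.tar.gz   # custom env, referenced via EXECUTION_ENV_PATH
├── postprocessing/
│   ├── 1/
│   │   └── model.py
│   ├── config.pbtxt
│   └── sentencepiece_env.tar.gz
└── tensorrt_llm/
    ├── 1/
    │   └── <Baichuan TensorRT engine files>
    └── config.pbtxt
```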
Now I launch tritonserver with OpenTelemetry tracing configured like:
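A minimal sketch of such a launch, assuming a placeholder model repository path and collector URL; note that `level` is left unset here, so it stays at the default `OFF`:

```bash
# Sketch only: OpenTelemetry trace mode enabled, but no --trace-config level=...,
# which is the configuration that led to the hang/crash described in this issue.
tritonserver \
  --model-repository=/path/to/model_repo \
  --trace-config mode=opentelemetry \
  --trace-config opentelemetry,url=http://otel-collector:4318/v1/traces
```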
and call it with a W3C trace context header attached to trigger a trace:
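A minimal example of such a call, assuming the v0.8.0 ensemble's `generate` endpoint on the default HTTP port; the `traceparent` value and prompt are made up:

```bash
curl -s -X POST "http://localhost:8000/v2/models/ensemble/generate" \
  -H "Content-Type: application/json" \
  -H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" \
  -d '{"text_input": "What is machine learning?", "max_tokens": 32, "bad_words": "", "stop_words": ""}'
```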
Now, the server crashes with a segmentation fault.
Expected behavior

Since it's an ensemble model, I tested part of it, like `preprocessing`, to validate that tracing works for at least the Python backend.

Actual behavior
Additional notes
The OpenTelemetry endpoint is an opentelemetry-collector instance set up to accept the traces.
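One way to run such a collector locally is a plain Docker container; the image tag and the reliance on the image's default OTLP receiver configuration are assumptions:

```bash
# Listens for OTLP traces over gRPC (4317) and HTTP (4318) using the image's default config.
docker run --rm -p 4317:4317 -p 4318:4318 otel/opentelemetry-collector:latest
```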