Nested submodels can not be traced? (Issue #6175)
Closed: @monsterlyg closed this issue 1 year ago
Hi @monsterlyg, I believe @oandreeva-nv has some references for recent work on this feature.
Hi @monsterlyg, could you please provide more information: what version of Triton you are using, what tracing mode you are using, what the structure of your model is, and what you expect to see versus what you are currently seeing.
We've added nested spans support in the 23.07 release for tracing with OpenTelemetry mode: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/trace.md#opentelemetry-trace-support. If this is something you haven't tried, I encourage you to check it out and see if it solves your issue.
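For reference, a minimal launch sketch based on the trace.md document linked above (the model repository path, endpoint, and sampling values are placeholders, and the flags assume a 23.07+ container):

```shell
# Minimal sketch, assuming 23.07+ and a local OTLP/HTTP collector on the
# default endpoint; paths and ports are placeholders.
tritonserver --model-repository=/path/to/models \
    --trace-config mode=opentelemetry \
    --trace-config opentelemetry,url=http://localhost:4318/v1/traces \
    --trace-config opentelemetry,resource=service.name=tritonserver \
    --trace-config rate=1 \
    --trace-config level=TIMESTAMPS
```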
I'll give it a try.
22.12, with the TIMESTAMPS level. The structure is like: ensemble
I am not sure whether I used OpenTelemetry successfully, but I got errors like:
[Error] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/exporters/otlp/src/otlp_http_client.cc [OTLP HTTP Client] Export failed, Status: 400, Header: Content-Length: 0, Content-Type: text/plain, Body:
[Error] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/exporters/otlp/src/otlp_http_exporter.cc:107 [OTLP HTTP Client] ERROR: Export 1 trace span(s) error: 1
Triton was launched with:
tritonserver --model-repository=./ \
--http-port=5104 \
--metrics-port=5105 \
--grpc-port=5106 \
--trace-config mode=opentelemetry \
--trace-config opentelemetry,resource=service.name=tritonserver \
--trace-config opentelemetry,url=http://localhost:5104/v1/traces \
--trace-config opentelemetry,file=/ssd2/lyg/files/trace_files/ottrace.json \
--trace-config opentelemetry,log-frequency=50 \
--trace-config rate=100 \
--trace-config level=TIMESTAMPS \
--trace-config count=100
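As a side note on the sampling flags above: per the tracing docs, `rate=R` samples roughly every R-th request and `count=C` caps the total number of traces collected. A rough shell illustration of how those two interact (purely illustrative, not Triton code; the numbers are made up):

```shell
# Illustrative only: with rate=100 and count=2, out of 300 requests
# only requests 100 and 200 get traced; the count cap then stops tracing.
R=100; C=2; TOTAL=300
traced=0
for i in $(seq 1 "$TOTAL"); do
  if [ $((i % R)) -eq 0 ] && [ "$traced" -lt "$C" ]; then
    traced=$((traced + 1))
    echo "traced request $i"
  fi
done
echo "total traced: $traced"
```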
It seems like Triton is trying to export collected traces, but it can't. May I ask if you've set up a collector on localhost:5104/v1/traces? For debugging purposes, you can try running the Jaeger docker container, as shown here.
The 22.12 container is a little bit old; I recommend trying both tracing APIs with newer versions.
--trace-config opentelemetry,log-frequency=50 is not going to work; log-frequency is only valid for the Triton Trace API.
I also recommend setting up a different port for the OpenTelemetry exporter. I believe you currently have Triton's HTTP port matching the port set up for OpenTelemetry's exporter: --trace-config opentelemetry,url=http://localhost:5104/v1/traces
In fact, I switched to the r23.07 container in this experiment. I have no prior experience with OpenTelemetry, and Triton usually runs on a remote server, so it may be difficult to visualize the tracing info in a browser through Jaeger. I also have no idea how to "set up a collector". What should it look like when everything runs correctly with OpenTelemetry mode? I mean the traced request info: will it be saved to a file, or only displayed on a web page? Can you share an example?
@monsterlyg May I ask you to try the Triton Trace API in the 23.07 version and see if your issue persists?
For OpenTelemetry, you can run the Jaeger docker container, which will act as your collector and visualization tool. Simply run:
docker run -d --name jaeger -e COLLECTOR_OTLP_ENABLED=true -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest
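To sanity-check that the container is actually reachable before pointing Triton at it, something like the following should work (a sketch; the ports match the docker run mapping above, and the exact status codes will depend on your setup):

```shell
# Probe the assumed Jaeger ports: 16686 (UI) and 4318 (OTLP/HTTP receiver).
# A status of 000 means the connection could not be made at all.
ui_status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:16686/ || true)
otlp_status=$(curl -s -o /dev/null -w "%{http_code}" \
    -X POST -H "Content-Type: application/json" -d '{}' \
    http://localhost:4318/v1/traces || true)
echo "UI HTTP status:   ${ui_status}"
echo "OTLP HTTP status: ${otlp_status}"
```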
More on Jaeger's Docker image can be found here. Access the UI at http://localhost:16686/. Note that the above command expects traces to be exported to localhost:4318/v1/traces, which is the default url for --trace-config opentelemetry,url.
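If Triton and the collector run on different machines (as in the remote-server setup mentioned earlier in the thread), the exporter endpoint can presumably be pointed at the collector explicitly; the host below is a placeholder:

```shell
# Hypothetical host; keep the /v1/traces path and the collector's OTLP/HTTP port.
--trace-config opentelemetry,url=http://my-collector-host:4318/v1/traces
```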
Now you can start your Triton server instance with:
tritonserver --model-repository=./ \
--http-port=5104 \
--metrics-port=5105 \
--grpc-port=5106 \
--trace-config mode=opentelemetry \
--trace-config opentelemetry,resource=service.name=tritonserver \
--trace-config rate=100 \
--trace-config level=TIMESTAMPS \
--trace-config count=100
and send requests. Jaeger's UI, located at http://localhost:16686/, is quite straightforward; it will look like this. Pick tritonserver under the Service drop-down menu and you should be able to see traces. Note that the service name was specified by the --trace-config opentelemetry,resource=service.name=tritonserver option.
@oandreeva-nv It runs well. Thanks.
@monsterlyg Happy to hear. I will close this issue then. Feel free to reach out here if you have any other questions; we can always re-open a closed issue.
I am tracing a model repository which contains multiple models, but I found that nested sub-ensemble models that are not scheduled by the top parent ensemble model were not traced. Is this a supported feature?