triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Triton server with python backend slow for YOLO inferencing #4959

Closed rahul1728jha closed 2 years ago

rahul1728jha commented 2 years ago

Objective:

Running YOLOv5 with Triton server to perform inference. The input source is a real-time video stream via an RTSP URL.

Setup: Followed the template below to run my own custom code: https://github.com/triton-inference-server/python_backend/tree/main/examples/add_sub

Server Image: nvcr.io/nvidia/tritonserver:22.08-pyt-python-py3

Code Change:

Changes to examples/custom_yolo/model.py

Running the model.py:

```
cd python_backend
mkdir -p models/custom_yolo/1/
cp examples/custom_yolo/model.py models/custom_yolo/1/model.py
cp examples/custom_yolo/config.pbtxt models/custom_yolo/config.pbtxt
tritonserver --model-repository `pwd`/models
```
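For reference, a minimal sketch of what models/custom_yolo/1/model.py could look like, following the add_sub template structure. The tensor names ("INPUT"/"OUTPUT"), the torch.hub load of YOLOv5, and the preprocessing layout are illustrative assumptions, not the actual code from this issue:

```python
import numpy as np
import torch
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Assumption: load YOLOv5s via torch.hub once, at model load time.
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")
        self.model.to("cuda").eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            # Assumed input: preprocessed frames as FP32 NCHW.
            frames = pb_utils.get_input_tensor_by_name(request, "INPUT").as_numpy()
            with torch.no_grad():
                preds = self.model(torch.from_numpy(frames).to("cuda"))
            # YOLOv5 returns raw predictions for tensor inputs; send the
            # detection tensor back as the model output.
            det = preds[0] if isinstance(preds, (list, tuple)) else preds
            out = pb_utils.Tensor("OUTPUT", det.cpu().numpy().astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        self.model = None
```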

Client Image: nvcr.io/nvidia/tritonserver:22.08-py3-sdk (started with /bin/bash)

Code Change:

Changes to client.py -- the input source is an RTSP stream

Running client: python3 triton_inference_server_python_backend/examples/custom_yolo/client.py
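For reference, a minimal sketch of what examples/custom_yolo/client.py might look like with an RTSP source, using the gRPC client from the SDK container. The RTSP URL, model/tensor names, and preprocessing are assumptions; the original client code is not shown in this issue:

```python
import cv2
import numpy as np
import tritonclient.grpc as grpcclient

RTSP_URL = "rtsp://example.com/stream"  # hypothetical stream URL

client = grpcclient.InferenceServerClient(url="localhost:8001")
cap = cv2.VideoCapture(RTSP_URL)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Assumed preprocessing: resize to 640x640, BGR->RGB, NCHW, FP32 in [0,1].
    img = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)
    img = img.transpose(2, 0, 1)[None].astype(np.float32) / 255.0

    inp = grpcclient.InferInput("INPUT", list(img.shape), "FP32")
    inp.set_data_from_numpy(img)
    result = client.infer(model_name="custom_yolo", inputs=[inp])
    detections = result.as_numpy("OUTPUT")
    # ... postprocess/draw detections here ...

cap.release()
```

Note that a synchronous infer() call per frame, as sketched here, adds network and serialization overhead on top of the model itself, which is one reason end-to-end FPS can drop relative to running YOLO in-process.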

Results: Ran the Triton server on an Azure GPU VM (Standard NC6s v3: 6 vCPUs, 112 GiB memory).

The Triton server performs at about 15 FPS, which is very slow. Without Triton, running YOLO alone achieves >= 30 FPS.

Question

My final inferencing pipeline is:

krishung5 commented 2 years ago

Hi @rahul1728jha, there are several factors that can affect performance, such as the model configuration and HTTP/gRPC latency. I would suggest using Perf Analyzer and Model Analyzer to better understand the possible bottlenecks and find a better configuration for the custom model. For video streaming workloads, we would recommend Nvidia DeepStream, which has a Triton plugin that you might be interested in.
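For example, a typical Perf Analyzer run from the SDK container could look like the following; the model name and endpoint are assumptions and have to match the deployed model, and --shape is only needed if the model's input has variable dimensions:

```
perf_analyzer -m custom_yolo -u localhost:8001 -i grpc --concurrency-range 1:4
```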

rahul1728jha commented 2 years ago

@krishung5 Thanks for your reply.

rahul1728jha commented 2 years ago

@krishung5 Thanks a lot for your reply. I have one final question about the use case below.

My final inferencing pipeline is:

Is DeepStream enough for the entire architecture, or is Triton needed?

My objective is:

Any help would be appreciated.

krishung5 commented 2 years ago

@tanmayv25 Are you familiar enough with Nvidia DeepStream to provide more context here?

rahul1728jha commented 2 years ago

@krishung5 I do not have much familiarity with Nvidia DeepStream. I have read that DeepStream is a suitable framework for implementing my use case, so I was wondering if DeepStream is the correct choice for the use case mentioned above.

tanmayv25 commented 2 years ago

Unfortunately, I too don't have any hands-on experience with Nvidia DeepStream. From their documentation, it definitely looks like they support most of the use cases. For the Triton plugin within DeepStream, they say TensorFlow and PyTorch backends are supported, so I am not sure whether custom Triton backends would be supported. The Python backend suffers from extra data copies, which will have an adverse effect on performance. You can write your custom logic as a C++ backend to get more performance. See example backends here: https://github.com/triton-inference-server/backend/tree/main/examples

It looks like there are lots of webinars and technical blog posts here: https://developer.nvidia.com/deepstream-getting-started#introduction

rahul1728jha commented 2 years ago

@tanmayv25 Thanks a lot for your help. Will go through that.
