isarsoft / yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
http://www.isarsoft.com

Triton server through KFserving #25

Closed ontheway16 closed 3 years ago

ontheway16 commented 3 years ago

Hi, I am more of a user than a developer, but I have managed to set up a fully local Kubernetes cluster (currently a single node with 12 CPUs and 16 GB RAM; more physical GPU nodes are waiting to be added) with KFServing, Knative, and Istio installed. I am now looking for a way to run yolov4-triton under KFServing, in the hope of using its autoscaling features. The attached screenshots give a general view of my cluster.

I am trying to find out what else is missing before I can send inference requests. Naturally, a Triton pod and service will be needed, but I am not sure how to configure them to receive requests through the KFServing components. Unfortunately, the Triton sample provided in the KFServing GitHub repository is for the BERT model, and it looks more complicated than object detection (several Python files are involved, etc.).

If anyone has experimented with this, or wants to cooperate, I would be happy to hear from you.

And I wish all of you the best in 2021!

[Attached screenshots: Screenshot01, Screenshot02, kiali-ss]

philipp-schmidt commented 3 years ago

Hi, please consider adding the images as file attachments to the issue. The website you are using is unresponsive and full of ads.

Happy new year!

ontheway16 commented 3 years ago

Apologies, the images are now uploaded the correct way.

Meanwhile, the main difficulty I am focused on is the lack of Triton samples for setting up the Istio ingress gateway and virtual services. I am still looking for candidate .yaml files that route requests through the gateway, which would (hopefully) let Knative autoscaling function as expected. I also forgot to mention that I am using MetalLB as my local load-balancing solution.

philipp-schmidt commented 3 years ago

Hi, what tool is the last screenshot of? What's it good for?

ontheway16 commented 3 years ago

> Hi, what tool is the last screenshot of? What's it good for?

Hi, it's the Istio dashboard named 'Kiali'. It lets you see all the networking connections, ClusterIPs, DNS names, etc., and it shows an animated flow of requests, either as requests per second or as percentages. It also lets you view and edit the .yaml files of services. A very capable tool.

Edit: It only shows the request flow if sidecar proxies are enabled.

philipp-schmidt commented 3 years ago

I've enabled the new GitHub "Discussions" feature. Feel free to start a discussion over there about deploying this with KFServing; it will probably help many people who want to use this repo in production.

I will probably close this issue in the near future to keep the issue tracker clean.

If you are successful in deploying it we might also write a wiki entry.

philipp-schmidt commented 3 years ago

Also, feel free to write instructions on how to spin up a Kubernetes cluster with this if you have the time. This will most likely be helpful for many people as well.

ontheway16 commented 3 years ago

Yes, but first I have to get it running, and I am not there yet. Another issue is that I am trying to set up a fully local install, while the majority of people use cloud load-balancing solutions.

ontheway16 commented 3 years ago

I made some progress on making it work under KFServing, but I am running into a problem with the current client.py. Since it is gRPC only, I wanted to ask whether there is an HTTP client script somewhere?

philipp-schmidt commented 3 years ago

Currently not, but the difference between the Triton gRPC client interface and the HTTP interface is really marginal. If you check the Triton client examples, you will find that it probably takes just a few changed lines in the current client.
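
For anyone who wants to try HTTP without touching client.py first: Triton's HTTP endpoint accepts the KFServing v2 inference protocol, so a request can be sketched with the Python standard library alone. This is a minimal sketch and not the repo's client. The model name `yolov4`, the tensor name `input`, the tiny shape, and the hostnames are placeholders I am assuming for illustration; take the real values from your own model configuration and cluster. The optional `Host` header is there because requests sent through an Istio ingress gateway are usually routed by hostname.

```python
import json
import urllib.request


def build_infer_request(base_url, model_name, input_name, shape, data,
                        datatype="FP32", host_header=None):
    """Build a POST request for {base_url}/v2/models/{model_name}/infer.

    All names passed in are caller-supplied placeholders; nothing here is
    specific to this repository's model.
    """
    # The flat data list must contain exactly one value per tensor element.
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(f"expected {expected} values, got {len(data)}")

    body = {
        "inputs": [{
            "name": input_name,
            "shape": list(shape),
            "datatype": datatype,
            "data": list(data),
        }]
    }
    headers = {"Content-Type": "application/json"}
    if host_header:
        # Assumption: the ingress gateway routes on this hostname.
        headers["Host"] = host_header

    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def infer(request, timeout=10.0):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) a request with a tiny dummy tensor.
req = build_infer_request(
    "http://localhost:8000", "yolov4", "input", [1, 3, 2, 2], [0.0] * 12
)
print(req.get_method(), req.full_url)
```

Calling `infer(req)` sends the request, which of course needs a reachable server. Plain JSON is bulky for full-size image tensors, so this is mainly useful for checking that routing through the gateway works before porting the real client.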