triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Container images for Jetson devices #4781

Open rgov opened 2 years ago

rgov commented 2 years ago

Currently Triton is released as a container, but the supported method of installing on Jetson devices is a tarball of binaries that must be unpacked to /opt/tritonserver, following the additional instructions in docs/jetson.md.

Would it be possible to get official container images of Triton for Jetson devices that simply include the binaries and other dependencies? That would make it much easier to get spun up with Triton on an embedded device, and would let developers build their own images that layer in their models and API clients.

NVIDIA also evangelizes containers on Jetson, so official images would match the expectations developers already have for the Jetson ecosystem.

I took a stab at a Dockerfile and, although it's not polished, it appears to work on my Jetson Nano after brief testing.
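
For concreteness, here is a rough sketch of what such a Dockerfile might look like for a JetPack 4.6.x device running the 22.02 Jetson release. The base image tag, tarball filename, and package list are assumptions that have to be matched to the actual release in use; docs/jetson.md remains the authoritative source for the dependency list.

```dockerfile
# Sketch only: package the JetPack 4.x Triton release tarball as a container.
# The base tag and tarball name below assume JetPack 4.6.1 / Triton 22.02.
FROM nvcr.io/nvidia/l4t-base:r32.7.1

# Runtime dependencies -- an abbreviated, approximate list based on the
# 22.02-era docs/jetson.md; treat that file as the source of truth.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libb64-0d \
        libre2-dev \
        libssl1.1 \
        libopenblas-dev \
        libarchive-dev \
        rapidjson-dev \
        zlib1g \
        python3 \
        python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Unpack the Jetson release tarball (downloaded from the GitHub release page)
# into /opt/tritonserver, as the docs assume. Adjust the path if the archive
# wraps its contents in a top-level directory.
COPY tritonserver2.19.0-jetpack4.6.1.tgz /tmp/
RUN mkdir -p /opt/tritonserver && \
    tar -xzf /tmp/tritonserver2.19.0-jetpack4.6.1.tgz -C /opt/tritonserver && \
    rm /tmp/tritonserver2.19.0-jetpack4.6.1.tgz

ENV PATH=/opt/tritonserver/bin:${PATH}
ENV LD_LIBRARY_PATH=/opt/tritonserver/lib:${LD_LIBRARY_PATH}

# Default HTTP, GRPC, and metrics ports; model repository mounted at /models.
EXPOSE 8000 8001 8002
ENTRYPOINT ["/opt/tritonserver/bin/tritonserver"]
CMD ["--model-repository=/models"]
```

On the device itself, an image like this would typically be run with the NVIDIA container runtime (e.g. `docker run --runtime nvidia ...`) so the L4T GPU libraries are made available inside the container.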

dyastremsky commented 2 years ago

Hi Ryan! Thank you for sharing those valid points and providing a draft of a Dockerfile. That's quite helpful.

We have a Jetson container on our roadmap. I'll pass your request along so it can be prioritized, and having a draft Dockerfile ready helps as well. As for your second question: the jetson.md instructions you linked differentiate between build-time and runtime dependencies, though it looks like you included both in your Dockerfile. Could you try installing only the runtime dependencies?
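
To make that distinction concrete, a rough split along the lines of the 22.02-era docs/jetson.md looks like the following. The package names are illustrative, not exhaustive, and the lists in docs/jetson.md for the matching release are authoritative.

```sh
# Build-time only: needed to compile Triton from source, not to run the
# released binaries (roughly: cmake, build-essential, git, libb64-dev,
# libre2-dev, libssl-dev, zlib1g-dev, patchelf, ...).

# Runtime: what the prebuilt tritonserver binary and its backends need.
# (libre2-dev is used here only to pull in the versioned libre2 runtime
# package, whose name differs across Ubuntu releases.)
apt-get update && apt-get install -y --no-install-recommends \
    libb64-0d libre2-dev libssl1.1 rapidjson-dev \
    libopenblas-dev libarchive-dev zlib1g \
    python3 python3-dev python3-pip
```

Dropping the build-time packages should shrink the image considerably without affecting the prebuilt binaries.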

rgov commented 2 years ago

Thanks @dyastremsky. I see that the documentation was updated in #3974 to split the dependencies, but I was looking at the file from release 22.02, which is the last release for JetPack 4.x. (The Nano is among the devices not supported in JetPack 5.0; please keep that in mind when you decide your path forward with providing your own containers, especially in light of supply chain constraints.)

I took that information, backported it to my L4T release, and updated the Dockerfile in the linked gist.

(There is probably still room for improvement, though. Does PyTorch really need ninja, gcc, and clang at runtime? Probably not. I took most of them out and switched to getting libomp5 from the apt repositories.)
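
For reference, the libomp5 swap mentioned above could look something like this in the Dockerfile (a hypothetical fragment; where the bundled copy lives depends on how the PyTorch wheel was packaged):

```dockerfile
# Take OpenMP from the distro instead of shipping the copy bundled with the
# PyTorch wheel (hypothetical fragment; adjust to your wheel's layout).
RUN apt-get update && \
    apt-get install -y --no-install-recommends libomp5 && \
    rm -rf /var/lib/apt/lists/*
```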

I should also note that I have only tested this with the ONNX Runtime backend, which I assume executes using TensorRT, but I haven't actually verified that.
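
One way to take the guesswork out of that is to request the TensorRT execution provider explicitly in the model's config.pbtxt; Triton's ONNX Runtime backend otherwise generally runs on the CUDA or CPU execution providers. A sketch, with placeholder model name and precision:

```protobuf
# config.pbtxt (sketch): ask the ONNX Runtime backend to use the TensorRT
# execution provider. "my_onnx_model" and FP16 are placeholders.
name: "my_onnx_model"
platform: "onnxruntime_onnx"
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [ {
      name : "tensorrt"
      parameters { key: "precision_mode" value: "FP16" }
    } ]
  }
}
```

If TensorRT is actually in use, the first request after loading the model typically takes noticeably longer while the engine is built.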

dyastremsky commented 2 years ago

Any time. I appreciate you looking into all of this and providing extra context. I will say that we don't typically make changes to older versions after release (i.e., if we released a container for 22.09, we wouldn't also release one for 22.02). That said, I've linked this issue in our internal tracking so that all of your comments can be considered when we work on this. If nothing else, we'd be reviewing the dependencies when creating the Dockerfile, which could help inform how you write yours for 22.02.