huggingface / optimum-nvidia

Triton Inference Server #69

Open · TheCodeWrangler opened this issue 8 months ago

TheCodeWrangler commented 8 months ago

I would like to use this as a Python backend within triton-inference-server, so that my production parameters stay better aligned with training / validation.
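Roughly what I have in mind for the backend's `model.py`, combining the pipeline from the optimum-nvidia README with the standard Triton Python-backend interface. This is an untested sketch; the model id, tensor names, and generation parameters are placeholders of mine, not anything this repo ships:

```python
import numpy as np
import triton_python_backend_utils as pb_utils

from optimum.nvidia.pipelines import pipeline


class TritonPythonModel:
    """Wraps an optimum-nvidia text-generation pipeline behind Triton's Python backend."""

    def initialize(self, args):
        # Build the pipeline once per model instance; the model id is a placeholder.
        self.pipe = pipeline("text-generation", "meta-llama/Llama-2-7b-chat-hf")

    def execute(self, requests):
        responses = []
        for request in requests:
            # "text_input" / "text_output" must match the names in config.pbtxt.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "text_input")
            prompts = [p.decode("utf-8") for p in in_tensor.as_numpy().flatten()]

            # Assumes the transformers-style output shape:
            # one list of candidate generations per prompt.
            outputs = self.pipe(prompts, max_new_tokens=128)
            texts = [out[0]["generated_text"] for out in outputs]

            out_tensor = pb_utils.Tensor("text_output", np.array(texts, dtype=object))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        self.pipe = None
```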

Are there plans to publish a supported Triton Inference Server image tag for optimum-nvidia? https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags

TheCodeWrangler commented 8 months ago

I see that the example uses the "tensorrt_llm" backend.

https://github.com/huggingface/optimum-nvidia/blob/main/templates/inference-endpoints/text-generation/config.pbtxt

Does this inference example actually use optimum-nvidia?
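For contrast, a Python-backend deployment wrapping optimum-nvidia would need a `config.pbtxt` along these lines instead of `backend: "tensorrt_llm"`. Just a sketch: the model name, tensor names, and batch size are placeholders, not taken from the linked template:

```
name: "optimum_nvidia_text_generation"
backend: "python"
max_batch_size: 8

input [
  {
    name: "text_input"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

output [
  {
    name: "text_output"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

instance_group [
  { kind: KIND_GPU }
]
```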

Elsayed91 commented 7 months ago

Yeah, the first thing I thought was that my Transformers Python backend could now become faster, but this currently only exists as a Docker image. Gonna have to wait for that pip package, I guess.
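For context, the swap I was hoping to make inside my existing Python backend, going by the optimum-nvidia README (not possible today, since the package is only reachable inside the project's Docker image):

```python
# What my backend imports today:
# from transformers import AutoModelForCausalLM

# The advertised drop-in replacement from the optimum-nvidia README;
# unusable in a stock Triton image until a pip release lands:
from optimum.nvidia import AutoModelForCausalLM

# Same call signature as transformers; the model id is the README's example.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
```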