Open TheCodeWrangler opened 10 months ago
I see that the example uses the "tensorrt_llm" backend.
Does this inference example utilize optimum-nvidia?
Yeah, my first thought was that my Transformers Python backend could now become faster, but this only exists as a Docker image. Gonna have to wait for that pip package, I guess.
I would like to use this as a Python backend within triton-inference-server, to bring my production parameters into better alignment with training/validation. Are there plans to publish a supported Triton server tag for optimum-nvidia? https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags
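For context, here is a minimal sketch of what that Python-backend route could look like. This is not confirmed to work in any current tritonserver image (which is the point of the question); the model id and the `prompt`/`completion` tensor names are assumptions for illustration, and it presumes `optimum.nvidia.AutoModelForCausalLM` exposes a `generate()` interface similar to transformers.

```python
# model.py — a sketch of a Triton Python-backend wrapper around optimum-nvidia.
# Assumptions (not from this thread): optimum-nvidia is importable inside the
# container, and config.pbtxt declares a TYPE_STRING input "prompt" and a
# TYPE_STRING output "completion".
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer
from optimum.nvidia import AutoModelForCausalLM  # assumed installable in the image


class TritonPythonModel:
    def initialize(self, args):
        # Hypothetical model id, for illustration only.
        model_id = "meta-llama/Llama-2-7b-chat-hf"
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        self.model = AutoModelForCausalLM.from_pretrained(model_id)

    def execute(self, requests):
        responses = []
        for request in requests:
            # TYPE_STRING inputs arrive as numpy arrays of bytes.
            prompts = [
                p.decode("utf-8")
                for p in pb_utils.get_input_tensor_by_name(request, "prompt")
                .as_numpy()
                .reshape(-1)
            ]
            inputs = self.tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")

            # Assuming generate() mirrors the transformers interface; some
            # optimum-nvidia versions return a (token_ids, lengths) tuple.
            output = self.model.generate(**inputs, max_new_tokens=128)
            token_ids = output[0] if isinstance(output, tuple) else output
            texts = self.tokenizer.batch_decode(token_ids, skip_special_tokens=True)

            out = np.array([t.encode("utf-8") for t in texts], dtype=object)
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("completion", out)]
                )
            )
        return responses
```

The catch, as noted above, is that optimum-nvidia currently ships only as its own Docker image, so a model.py like this would only be usable once the library (or a pip package for it) can be installed into one of the tritonserver images from that NGC tag list.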