triton-inference-server / pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
https://triton-inference-server.github.io/pytriton/
Apache License 2.0

Model repo #91

Open · tylerweitzman opened this issue 6 days ago

tylerweitzman commented 6 days ago

Is it possible to use pytriton to load a full model repository that would otherwise require the full Triton server Docker container? One of the things I love about pytriton is how easy it is to install on new machines without needing a container. It could be a great go-between.

I imagine projects using it like this as they mature:

1. Start with pytriton and no models folder (a rough sketch of this step is below).
2. Add a models folder and keep using pytriton.
3. Deploy production with the full Triton container, but keep developing with pytriton when containers are not desired.
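Something like this is what I have in mind for step 1; just a rough, untested sketch, and the model and tensor names ("example", "input", "output") are placeholders I made up:

```python
# pip install nvidia-pytriton  (no Triton container needed)
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(input):
    # Placeholder model: double the input batch.
    return {"output": input * 2.0}


with Triton() as triton:
    triton.bind(
        model_name="example",
        infer_func=infer_fn,
        inputs=[Tensor(name="input", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()  # blocks, serving HTTP/gRPC like a regular Triton endpoint
```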

piotrm-nvidia commented 5 days ago

Thank you for your question.

The PyTriton library works well for simple use cases where a model is bound directly to a server for deployment, but its feature support is limited and it does not integrate with external model stores. For scenarios that require more complex operations, such as dynamic loading and unloading of models, we recommend using the Triton Inference Server instead. Its Python backend lets you serve models from Python scripts placed in a model repository.

For further optimization, you might also explore the Triton Model Navigator, a utility that converts models from frameworks like PyTorch to TensorRT to boost performance. For more detail, see the Python backend documentation and the Triton Model Navigator GitHub repository.
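For concreteness, a minimal Python backend model might look like the sketch below. This is illustrative only, not anything from this thread: the model name and tensor names are placeholders, and the `config.pbtxt` mentioned in the comments must declare matching names and types.

```python
# model.py, placed at <model_repository>/example/1/model.py, next to a
# config.pbtxt that sets backend: "python" and declares the "input" and
# "output" tensors. All names here are illustrative placeholders.
import numpy as np
import triton_python_backend_utils as pb_utils  # provided inside the Triton server


class TritonPythonModel:
    def execute(self, requests):
        # Triton hands the backend a batch of requests; answer each in order.
        responses = []
        for request in requests:
            data = pb_utils.get_input_tensor_by_name(request, "input").as_numpy()
            out = pb_utils.Tensor("output", (data * 2.0).astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```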

Is there anything else you'd like to know or any specific details you need assistance with?