Open · jaystary opened 2 years ago
I think the biggest reservation people have with Seldon is the steep onboarding curve, which this would alleviate. While there are plenty of other inference-server offerings out there, Seldon is the only one I've seen that offers multi-armed bandits out of the box: https://docs.seldon.io/projects/seldon-core/en/latest/analytics/routers.html
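For context on what that buys you: once a SeldonDeployment with one of these routers is running, the bandit is driven through the ordinary prediction endpoint plus a feedback endpoint that carries the reward. Below is a minimal sketch using Seldon's documented v1 REST paths; the host, the deployment name `mab`, the namespace `seldon`, and the reward value are all illustrative assumptions, not something from this issue:

```python
# Sketch only: assumes a SeldonDeployment named "mab" with an epsilon-greedy
# router already running in namespace "seldon", reachable via a local ingress.
import requests

HOST = "http://localhost:8003"  # hypothetical ingress address
BASE = f"{HOST}/seldon/seldon/mab/api/v1.0"

# 1. The router picks one of its child models and returns that prediction.
pred = requests.post(
    f"{BASE}/predictions",
    json={"data": {"ndarray": [[1.0, 2.0, 3.0, 4.0]]}},
).json()
print(pred)

# 2. Sending a reward back lets the multi-armed bandit update which branch
#    it favours -- this feedback loop is what sets Seldon's routers apart.
requests.post(
    f"{BASE}/feedback",
    json={"response": pred, "reward": 1.0},
)
```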
Use Case
Onboarding Seldon Core as a possible inference server makes sense alongside MLflow, Kubeflow, and Grafana/Prometheus. It can either run standalone or in combination with MLflow / Kubeflow, and it can utilize all common serving backends (Triton probably being the most interesting).
It should be coupled with Prometheus / Grafana, and could potentially even scale serving based on those metrics; a rough sketch of that idea follows below.
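To make the metrics-based scaling idea concrete, here is a sketch that polls Prometheus for a deployment's request rate and derives a replica count. The Prometheus address, the Seldon executor metric name, and the per-replica capacity are all assumptions that may differ per version; a production setup would more likely wire this into a HorizontalPodAutoscaler via a custom-metrics adapter rather than a script:

```python
# Sketch of metrics-driven scaling: query Prometheus for a Seldon
# deployment's request rate and decide how many replicas to run.
# Metric name and addresses are illustrative, not verified against this setup.
import requests

PROM_URL = "http://localhost:9090"  # hypothetical Prometheus address
QUERY = (
    'sum(rate(seldon_api_executor_server_requests_seconds_count'
    '{deployment_name="mab"}[1m]))'
)

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY}).json()
results = resp["data"]["result"]
rps = float(results[0]["value"][1]) if results else 0.0

TARGET_RPS_PER_REPLICA = 50.0  # assumed capacity of a single replica
replicas = max(1, round(rps / TARGET_RPS_PER_REPLICA))
print(f"current load: {rps:.1f} req/s -> scale to {replicas} replica(s)")
```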
Ideas of Implementation
Configured correctly, this combination would allow best-in-class inference performance for almost any use case out there (see https://towardsdatascience.com/hugging-face-transformer-inference-under-1-millisecond-latency-e1be0057a51c or https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/).
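For reference, talking to a Triton backend behind Seldon would go over the Open Inference (KServe V2) protocol that both projects support. A minimal sketch, assuming a model called `my_model` that takes a single FP32 tensor; the endpoint URL, tensor name, and shape are placeholders:

```python
# Sketch of a V2-protocol inference call against a Triton-served model.
# Model name, input name, and shape are assumptions for illustration.
import requests

URL = "http://localhost:8000/v2/models/my_model/infer"  # hypothetical endpoint

payload = {
    "inputs": [
        {
            "name": "input__0",      # illustrative input tensor name
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [[1.0, 2.0, 3.0, 4.0]],
        }
    ]
}

resp = requests.post(URL, json=payload)
resp.raise_for_status()
print(resp.json()["outputs"])
```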
Additional Info
Message from the maintainers:
Excited about this feature? Give it a :thumbsup:. We factor engagement into prioritization.