pluralsh / plural-artifacts

Artifacts for applications deployable by plural
Apache License 2.0

Onboard Seldon Core #180

Open jaystary opened 2 years ago

jaystary commented 2 years ago

Use Case

Onboarding Seldon Core as an inference server makes sense alongside MLflow, Kubeflow, and Grafana/Prometheus. It can run standalone or in combination with MLflow / Kubeflow, and it can use all common serving backends (Triton is probably the most interesting).

It should be coupled with Prometheus / Grafana and could potentially include the ability to scale serving based on metrics.

Ideas of Implementation

Configured correctly, this combination would allow best-in-class inference performance for almost any use case. See https://towardsdatascience.com/hugging-face-transformer-inference-under-1-millisecond-latency-e1be0057a51c and https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/ for examples.
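
One possible shape for the metric-based scaling mentioned under Use Case is the replica rule used by the Kubernetes Horizontal Pod Autoscaler. The issue does not say which autoscaler would be used, so treat this as an assumption. The function name `desired_replicas` and the example metric values are hypothetical and only illustrate the arithmetic:

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Sketch of the HPA scaling rule: ceil(current_replicas * current / target).

    Assumption: the metric is something like requests per second per pod,
    scraped from Prometheus. The 0.1 tolerance mirrors the HPA default.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Never scale below one replica in this sketch.
    return max(1, math.ceil(current_replicas * ratio))


if __name__ == "__main__":
    # 2 pods, each seeing 150 req/s against a 100 req/s target.
    print(desired_replicas(2, 150.0, 100.0))
```

In a real deployment the metric would have to be exposed to the autoscaler (for example through a Prometheus metrics adapter), which is outside the scope of this sketch.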

Additional Info


Message from the maintainers:

Excited about this feature? Give it a :thumbsup:. We factor engagement into prioritization.

troyyyang commented 2 years ago

I think the biggest reservation people have with Seldon is the steep onboarding, which this would alleviate. While there are a ton of other inference server offerings out there, Seldon is the only one I've seen that offers multi-armed bandits out of the box: https://docs.seldon.io/projects/seldon-core/en/latest/analytics/routers.html
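
For readers unfamiliar with the idea, here is a minimal sketch of an epsilon-greedy multi-armed bandit router that picks between model branches and learns from reward feedback. It is not Seldon's implementation: the class name `EpsilonGreedyRouter` and its `route` / `send_feedback` methods are hypothetical and chosen only for illustration.

```python
import random


class EpsilonGreedyRouter:
    """Illustrative epsilon-greedy router over n model branches (hypothetical API)."""

    def __init__(self, n_branches, epsilon=0.1, seed=None):
        if n_branches < 1:
            raise ValueError("n_branches must be at least 1")
        self.epsilon = epsilon
        self.counts = [0] * n_branches    # feedback events seen per branch
        self.values = [0.0] * n_branches  # running mean reward per branch
        self._rng = random.Random(seed)

    def route(self):
        """Return the index of the branch that should serve the next request."""
        # Explore: with probability epsilon, pick a branch uniformly at random.
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(len(self.counts))
        # Exploit: otherwise pick the branch with the best mean reward so far.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def send_feedback(self, branch, reward):
        """Update the running mean reward of a branch with one observation."""
        self.counts[branch] += 1
        n = self.counts[branch]
        self.values[branch] += (reward - self.values[branch]) / n


if __name__ == "__main__":
    router = EpsilonGreedyRouter(n_branches=2, epsilon=0.1, seed=0)
    router.send_feedback(1, 1.0)  # branch 1 produced a good prediction
    print(router.route())
```

The reward here could be anything that signals a good prediction, such as a click or a correct label arriving later. How that feedback is delivered to the router depends on the serving setup.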