stikkireddy / mlflow-extensions

Deploy models quickly to Databricks via MLflow-based serving infrastructure.
https://stikkireddy.github.io/mlflow-extensions/
Apache License 2.0
19 stars, 11 forks

[FEATURE] Support ray serve engine #35

Open stikkireddy opened 2 months ago

stikkireddy commented 2 months ago

Ray Serve is a phenomenal serving engine that abstracts model serving and provides throughput optimizations such as request batching, async execution, and pipelining. It supports PyTorch and other popular frameworks. It could be used to serve the following kinds of models:

  1. custom embedding models with post-processors
  2. standard embedding models
  3. encoder-decoder models like Whisper
  4. diffusion models
  5. multi-model serving
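
To make the batching point concrete, here is a minimal stdlib-only sketch of the dynamic request batching that a serving engine like Ray Serve handles for you. It is not Ray Serve's API and not part of mlflow-extensions. `MicroBatcher`, `fake_embed`, and `demo` are hypothetical names, and the "model" is a stand-in that returns the text length as a one-element vector.

```python
import asyncio
import contextlib
from typing import Callable, List


def fake_embed(texts: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for a real embedding model: one call per batch.
    return [[float(len(t))] for t in texts]


class MicroBatcher:
    """Groups concurrent single requests into one batched model call."""

    def __init__(self, batch_fn: Callable, max_batch_size: int = 4, wait_s: float = 0.05):
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.wait_s = wait_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded only for inspection

    async def submit(self, item):
        # Each caller gets a future that resolves once its batch has run.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives, then fill the batch
            # until it is full or the wait budget is spent.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            outputs = self.batch_fn([item for item, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


async def demo():
    # Build the batcher inside the running loop so the queue binds to it.
    batcher = MicroBatcher(fake_embed, max_batch_size=4)
    worker = asyncio.create_task(batcher.run())
    texts = [f"doc {i}" for i in range(8)]
    results = await asyncio.gather(*(batcher.submit(t) for t in texts))
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return results, batcher.batch_sizes


if __name__ == "__main__":
    results, sizes = asyncio.run(demo())
    print(results)
    print(sizes)
```

With Ray Serve the queueing loop above would not need to be hand-written: its `@serve.batch` decorator takes `max_batch_size` and `batch_wait_timeout_s` and applies the same idea to a deployment method.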

stikkireddy commented 2 months ago

Some common embedding models: