stikkireddy / mlflow-extensions

Deploy models quickly to Databricks via MLflow-based serving infra.
Apache License 2.0

[RFC] Batch inference using ez_deploy_config #19

Open stikkireddy opened 1 week ago

stikkireddy commented 1 week ago

Initially we can stick to this ideal interface:

def perform_batch(table, ez_deploy_config, batch_config) -> bool

The batch_config should expose only a few knobs: execution target (endpoint or local), engine (Ray vs. Spark vs. httpx), parallelism, and checkpointing strategy. For Spark we can use streaming, but partitions need to be tuned so that we transact to the Delta table roughly every 1000 rows. GPUs MUST be saturated with requests (this may be easier to achieve with Ray actors).
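As a rough sketch, batch_config could be a small dataclass; all names, fields, and defaults below are hypothetical and only illustrate the knobs above:

from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class BatchConfig:
    # where inference runs: a deployed serving endpoint or the local cluster
    target: Literal["endpoint", "local"] = "endpoint"
    # execution engine used to fan out requests
    engine: Literal["ray", "spark", "httpx"] = "spark"
    # concurrent in-flight requests per worker/actor
    parallelism: int = 8
    # commit results to the Delta table roughly every N rows
    commit_every_n_rows: int = 1000
    # optional location for engine checkpoint state (e.g. Spark streaming)
    checkpoint_location: Optional[str] = None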

The output must always be a Delta table.
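To make the Spark path concrete, here is a sketch of micro-batch scoring with Structured Streaming, where each micro-batch becomes one Delta transaction. The endpoint URL, table and column names, and checkpoint path are placeholders, and the row-at-a-time UDF is illustrative only; a real implementation would batch and parallelize requests:

import httpx
from pyspark.sql import SparkSession, DataFrame, functions as F

spark = SparkSession.builder.getOrCreate()

@F.udf("string")
def score(text: str) -> str:
    # hypothetical: POST a single row to the serving endpoint
    resp = httpx.post(
        "https://<workspace-url>/serving-endpoints/<endpoint>/invocations",
        json={"inputs": [text]},
    )
    return resp.json()["predictions"][0]

def score_and_append(micro_batch: DataFrame, batch_id: int) -> None:
    # each micro-batch commits as one append transaction to Delta
    scored = micro_batch.withColumn("prediction", score("input"))
    scored.write.format("delta").mode("append").saveAsTable("scored_output")

(spark.readStream
    .option("maxFilesPerTrigger", 1)  # tune so a micro-batch is ~1000 rows
    .table("source_table")
    .writeStream
    .foreachBatch(score_and_append)
    .option("checkpointLocation", "/tmp/_ckpt/scored_output")
    .trigger(availableNow=True)
    .start()
    .awaitTermination())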

If you need Ray, refer to this; it is very outdated but may be helpful: https://github.com/stikkireddy/llm-batch-inference/blob/main/01_batch_scoring_single_node.py
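For the Ray path, a minimal sketch of GPU-pinned actors; it assumes vLLM as the local engine and at least one GPU in the cluster, and the model name, chunk size, and workload are stand-ins:

import ray

ray.init(ignore_reinit_error=True)

@ray.remote(num_gpus=1)
class ScoringActor:
    def __init__(self, model_name: str):
        # assumption: vLLM is installed; any local engine would work here
        from vllm import LLM
        self.llm = LLM(model=model_name)

    def score(self, prompts: list) -> list:
        from vllm import SamplingParams
        # large chunks per call keep the GPU saturated
        outs = self.llm.generate(prompts, SamplingParams(max_tokens=256))
        return [o.outputs[0].text for o in outs]

prompts = ["What is MLflow?"] * 4000  # stand-in workload
num_gpus = int(ray.cluster_resources().get("GPU", 0))  # assumes >= 1 GPU
actors = [ScoringActor.remote("meta-llama/Llama-3.1-8B-Instruct")
          for _ in range(num_gpus)]

# round-robin ~1000-row chunks across the actor pool
chunks = [prompts[i:i + 1000] for i in range(0, len(prompts), 1000)]
futures = [actors[i % len(actors)].score.remote(c) for i, c in enumerate(chunks)]
results = [row for chunk in ray.get(futures) for row in chunk]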

stikkireddy commented 1 week ago

Using Ray + GPU VMs works phenomenally.

stikkireddy commented 1 week ago

Batch inference with SGLang requires this: https://github.com/sgl-project/sglang/pull/1127