stikkireddy / mlflow-extensions

Deploy models quickly to databricks via mlflow based serving infra.
https://stikkireddy.github.io/mlflow-extensions/
Apache License 2.0

[FEATURE] GPU Config or Cloud Specific flags/arguments #37

Open stikkireddy opened 2 months ago

stikkireddy commented 2 months ago

There are discrepancies in how features like CUDA graphs and FlashAttention behave across hardware, so being able to set GPU- or cloud-specific configs in EzDeployConfig is going to be important.

COHERE_FOR_AYA_23_35B = EzDeployConfig(
    name="cohere_aya_35b",
    engine_proc=_ENGINE,
    engine_config=_ENGINE_CONFIG(
        model="CohereForAI/aya-23-35B",
        guided_decoding_backend="outlines",
        vllm_command_flags={
            "--max-num-seqs": 128,
            "--gpu-memory-utilization": 0.98,
            "--distributed-executor-backend": "ray",
        },
    ),
    gpu_specific_config={...},
    cloud_specific_config={...},
    serving_config=ServingConfig(
        # rough estimate for the engines this includes model weights + kv cache + overhead + intermediate states
        minimum_memory_in_gb=76,
    ),
)
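One way this could work is a per-GPU override map that is merged into the base vLLM command flags at deploy time, with GPU-specific values taking precedence. The sketch below is illustrative only: `GpuSpecificConfig`, `merge_flags`, and the GPU type keys are assumed names, not part of the mlflow-extensions API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a GPU-specific override config; these names are
# assumptions for illustration, not the library's actual API.
@dataclass
class GpuSpecificConfig:
    # Maps a GPU type (e.g. "A10", "A100", "H100") to vLLM flag overrides.
    overrides: dict = field(default_factory=dict)

    def merge_flags(self, base_flags: dict, gpu_type: str) -> dict:
        # GPU-specific values win over the base flags; an unrecognized
        # GPU type falls back to the base flags unchanged.
        merged = dict(base_flags)
        merged.update(self.overrides.get(gpu_type, {}))
        return merged

base = {"--gpu-memory-utilization": 0.98, "--max-num-seqs": 128}
cfg = GpuSpecificConfig(overrides={
    # Example: force eager mode (no CUDA graphs) and a lower memory
    # utilization on A10s, where behavior can differ from A100/H100.
    "A10": {"--enforce-eager": True, "--gpu-memory-utilization": 0.95},
})
print(cfg.merge_flags(base, "A10"))
```

A `cloud_specific_config` could follow the same merge pattern keyed on the cloud provider instead of the GPU type.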