There are discrepancies in how features like CUDA graphs and FlashAttention behave across hardware, so supporting GPU- and cloud-specific configuration in EzDeployConfig is going to be important.
COHERE_FOR_AYA_23_35B = EzDeployConfig(
    name="cohere_aya_35b",
    engine_proc=_ENGINE,
    engine_config=_ENGINE_CONFIG(
        model="CohereForAI/aya-23-35B",
        guided_decoding_backend="outlines",
        vllm_command_flags={
            "--max-num-seqs": 128,
            "--gpu-memory-utilization": 0.98,
            "--distributed-executor-backend": "ray",
        },
    ),
    gpu_specific_config={...},
    cloud_specific_config={...},
    serving_config=ServingConfig(
        # Rough estimate for the engine: model weights + KV cache
        # + overhead + intermediate activations.
        minimum_memory_in_gb=76,
    ),
)
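One way this could work (a hypothetical sketch — `gpu_specific_config`'s real schema, the `resolve_flags` helper, and the per-GPU override values here are all assumptions, not the actual EzDeployConfig API): keep the base vLLM flags, then merge in overrides keyed by GPU type, e.g. forcing eager mode where CUDA graph capture is unreliable or lowering memory utilization on smaller cards. The flags shown (`--enforce-eager`, `--enable-chunked-prefill`) are real vLLM CLI flags.

```python
# Hypothetical sketch of merging gpu_specific_config overrides into
# the base vLLM flags; names and values are illustrative assumptions.

BASE_FLAGS = {
    "--max-num-seqs": 128,
    "--gpu-memory-utilization": 0.98,
    "--distributed-executor-backend": "ray",
}

# Per-GPU overrides (illustrative): disable CUDA graphs via eager mode
# on one card, enable chunked prefill on another.
GPU_OVERRIDES = {
    "A10G": {"--enforce-eager": True, "--gpu-memory-utilization": 0.90},
    "H100": {"--enable-chunked-prefill": True},
}

def resolve_flags(gpu_name: str) -> dict:
    """Return base flags with any GPU-specific overrides applied."""
    flags = dict(BASE_FLAGS)
    flags.update(GPU_OVERRIDES.get(gpu_name, {}))
    return flags
```

Cloud-specific overrides could layer on the same way, with the cloud's entry merged after the GPU's so the most specific setting wins.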