redhat-ai-services / ai-accelerator

The AI Accelerator is a template project for setting up Red Hat OpenShift AI using GitOps

Error deploying vLLM with Cluster Autoscaling #22

Closed strangiato closed 2 months ago

strangiato commented 6 months ago

Currently the vLLM example fails to deploy because Knative Serving times out before a GPU node can be scaled up.

A bug report has been filed to track the issue:

https://issues.redhat.com/browse/RHOAIRFE-193

As a workaround, once the GPU node has scaled up, the Deployment object created by the Knative service can be deleted to reset the timeout.
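A rough sketch of that workaround, assuming a KServe-style InferenceService whose Deployment follows the usual `<name>-predictor-...` naming (the exact resource names and namespace are assumptions; list the Deployments first to find the real one):

```shell
# Wait for the autoscaled GPU node to become Ready, then find the
# Deployment that the Knative service created for the model server.
oc get nodes -l nvidia.com/gpu.present=true
oc get deployments -n <namespace>

# Deleting the Deployment causes Knative to recreate it, which resets
# the progress deadline now that the GPU node is available.
oc delete deployment <inference-service>-predictor-00001-deployment -n <namespace>
```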

Alternatively, if you create and scale a GPU node before the inference service is deployed, the model server will be able to deploy before the timeout is reached.
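For example, pre-scaling could look something like the following (the MachineSet name is an assumption; pick it from the list of MachineSets in your cluster):

```shell
# Scale the GPU MachineSet up ahead of time so the node is already
# Ready when the InferenceService is applied.
oc get machinesets -n openshift-machine-api
oc scale machineset <gpu-machineset> -n openshift-machine-api --replicas=1

# Apply the inference service only after the new node reports Ready.
oc get nodes -w
```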

As a temporary workaround, we may be able to increase the timeout per the Knative deployment configuration instructions here:

https://knative.dev/docs/serving/configuration/deployment/
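Per those docs, the relevant knob is `progress-deadline` in Knative Serving's `config-deployment` ConfigMap. A minimal sketch, assuming a stock `knative-serving` namespace and a deadline long enough to cover GPU node scale-up (the `1800s` value is an assumption, not a tested recommendation):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  # Default is 600s; raise it so the revision is not marked failed
  # while the cluster autoscaler provisions a GPU node.
  progress-deadline: "1800s"
```

Note that on OpenShift Serverless this ConfigMap is managed by the operator, so the setting would likely need to go through the `KnativeServing` custom resource rather than being patched directly.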