Currently the vllm example fails to deploy before Knative Serving times out due to the time it takes to scale up a GPU node.
A bug report has been filed here to track the issue:
https://issues.redhat.com/browse/RHOAIRFE-193
As a workaround, once the GPU node has scaled, the Deployment object created by the Knative Service can be deleted to reset the timeout.
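For example, assuming the InferenceService runs in a project called `vllm-demo` and Knative created a predictor Deployment with a revision-style name (both names below are illustrative; check your own with the first command):

```sh
# List the Deployments that the Knative Service created for the model server.
oc get deployments -n vllm-demo

# Delete the predictor Deployment; the Knative controller recreates it
# immediately, which restarts the progress-deadline timer now that the
# GPU node is available.
oc delete deployment vllm-predictor-00001-deployment -n vllm-demo
```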
Alternatively, if you create and scale a GPU node before deploying the InferenceService, the model server can be deployed before the timeout is reached.
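On a cluster that provisions nodes through the Machine API, pre-scaling a GPU node might look something like this (the MachineSet name is illustrative; list yours with the first command):

```sh
# List the MachineSets to find the GPU-enabled one.
oc get machinesets -n openshift-machine-api

# Scale the GPU MachineSet up ahead of time so a node is already
# available when the InferenceService is created.
oc scale machineset gpu-machineset-a10g -n openshift-machine-api --replicas=1
```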
As a temporary workaround, it may also be possible to configure the timeout per the instructions here:
https://knative.dev/docs/serving/configuration/deployment/
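Based on those docs, raising the `progress-deadline` setting in the `config-deployment` ConfigMap should give the Deployment more time to become ready. A minimal sketch, assuming Knative is installed in the default `knative-serving` namespace (on OpenShift Serverless this value is typically managed through the KnativeServing custom resource instead, so a direct patch may be reverted by the operator):

```sh
# Raise the progress deadline from the default 600s to 30 minutes so
# Knative waits long enough for a GPU node to scale up.
oc patch configmap/config-deployment -n knative-serving \
  --type merge \
  -p '{"data":{"progress-deadline":"1800s"}}'
```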