Currently the vllm example fails to deploy before Knative Serving times out due to the time it takes to scale up a GPU node.
A bug report has been filed here to track the issue:
https://issues.redhat.com/browse/RHOAIRFE-193
As a workaround, once the GPU node has scaled, the Deployment object created by the Knative Service can be deleted to reset the timeout.
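For example, assuming the InferenceService runs in a project called `vllm-demo` and Knative created a predictor Deployment with a revision-style name (both names below are illustrative; check your own with the first command):

```sh
# List the Deployments that the Knative Service created for the model server.
oc get deployments -n vllm-demo

# Delete the predictor Deployment; the Knative controller recreates it
# immediately, which restarts the progress-deadline timer now that the
# GPU node is available.
oc delete deployment vllm-predictor-00001-deployment -n vllm-demo
```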
Alternatively, if you create and scale a GPU node before deploying the InferenceService, the model server can be deployed before the timeout is reached.
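On a cluster that provisions nodes through the Machine API, pre-scaling a GPU node might look something like this (the MachineSet name is illustrative; list yours with the first command):

```sh
# List the MachineSets to find the GPU-enabled one.
oc get machinesets -n openshift-machine-api

# Scale the GPU MachineSet up ahead of time so a node is already
# available when the InferenceService is created.
oc scale machineset gpu-machineset-a10g -n openshift-machine-api --replicas=1
```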
As a temporary workaround, it may also be possible to configure the timeout per the instructions here:
https://knative.dev/docs/serving/configuration/deployment/
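Based on those docs, raising the `progress-deadline` setting in the `config-deployment` ConfigMap should give the Deployment more time to become ready. A minimal sketch, assuming Knative is installed in the default `knative-serving` namespace (on OpenShift Serverless this value is typically managed through the KnativeServing custom resource instead, so a direct patch may be reverted by the operator):

```sh
# Raise the progress deadline from the default 600s to 30 minutes so
# Knative waits long enough for a GPU node to scale up.
oc patch configmap/config-deployment -n knative-serving \
  --type merge \
  -p '{"data":{"progress-deadline":"1800s"}}'
```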