gavrissh opened this issue 1 month ago.
@gavrishp Thank you for opening this issue.
Regarding the scenario of reducing the number of pods for the old revision before increasing the number of pods for the new revision: we have implemented a strategy called resourceUtil for this. I have not updated the docs yet, since this rollout strategy is still under test and has some known bugs, as in https://github.com/knative-extensions/serving-progressive-rollout/issues/190 and https://github.com/knative-extensions/serving-progressive-rollout/issues/191.
One way to use this strategy is to add the annotation rollout.knative.dev/progressive-rollout-strategy: "resourceUtil" to your ksvc/isvc, for example:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "test"
  annotations:
    # This is the annotation used to configure whether to enable the progressive rollout.
    rollout.knative.dev/progressive-rollout-enabled: "true"
    # This is the annotation used to configure the rollout strategy.
    rollout.knative.dev/progressive-rollout-strategy: "resourceUtil"
With this strategy, if your new revision fails to launch, you can even end up with no revision serving your isvc/ksvc, so it is still somewhat risky.
Regarding your issue, I am not sure whether the new revision of your service has fully launched. You can check the route resource to see whether all traffic has been shifted to the new revision. Once all traffic is shifted to the new revision, the old revision will eventually scale down.
Based on the info you provided, I see that one pod has launched, but the other is still on its way. If it is unable to launch, there can be many reasons, e.g. hitting a quota limit, being scheduled on a bad node, etc.
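A minimal sketch of that route check in Python, assuming the Route has been exported as JSON (for example with kubectl get route gemma-predictor -o json). It reads only the standard Knative Route status fields (status.traffic[].revisionName, percent, and status.conditions[].type/status); the helper names traffic_percent and condition_status are hypothetical, and the sample data is a trimmed copy of the route status posted later in this thread.

```python
import json

def traffic_percent(route: dict, revision: str) -> int:
    """Sum the traffic percent that the Route status assigns to one revision."""
    return sum(
        t.get("percent", 0)
        for t in route.get("status", {}).get("traffic", [])
        if t.get("revisionName") == revision
    )

def condition_status(route: dict, cond_type: str) -> str:
    """Return a Route condition's status ("True"/"False"/"Unknown"), or "" if absent."""
    for c in route.get("status", {}).get("conditions", []):
        if c.get("type") == cond_type:
            return c.get("status", "")
    return ""

# Trimmed-down sample mirroring the Route status shown in this thread.
route = json.loads("""
{
  "status": {
    "conditions": [{"type": "Ready", "status": "False", "reason": "RevisionMissing"}],
    "traffic": [
      {"revisionName": "gemma-predictor-00001", "latestRevision": false, "percent": 0},
      {"revisionName": "gemma-predictor-00002", "latestRevision": true, "percent": 100}
    ]
  }
}
""")

new_rev = "gemma-predictor-00002"
shifted = traffic_percent(route, new_rev) == 100
ready = condition_status(route, "Ready") == "True"
print(f"all traffic on {new_rev}: {shifted}, route ready: {ready}")
```

With this sample, the new revision holds 100% of the traffic in status while the Ready condition is still False, so it is worth looking at both fields rather than the traffic split alone.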
@houshengbo Thanks for your response.
In the case above, the second pod of the new revision cannot be scheduled due to resource constraints: there are only 2 GPUs available, and the old revision still has one pod using 1 GPU.
Is the suggestion to try out these annotations?
annotations:
  # This is the annotation used to configure whether to enable the progressive rollout.
  rollout.knative.dev/progressive-rollout-enabled: "true"
  # This is the annotation used to configure the rollout strategy.
  rollout.knative.dev/progressive-rollout-strategy: "resourceUtil"
Yes. You can configure them at the service level, or globally with the configmap: https://github.com/knative-extensions/serving-progressive-rollout/blob/main/config/core/configmaps/config-rolloutorchestrator.yaml#L47
@houshengbo The problem occurs when scaling up from 1 replica to 2 replicas on a cluster that only has resources for 2 replicas.
Final state:
Old revision: 1 running replica
New revision: 1 running, 1 pending
I am trying to understand why we can't still ensure availability during the availability rollout without extra resources, for example by doing the following:
Stage 1 (UP): the new revision scales up to 1 while the old revision stays at 1.
Action: make the new revision active (because 1 replica of the new revision is active), then scale the old revision down to 0 and the new revision up to 2.
The main problem is that the current logic needs a minimum of new_replica_count + 1 resources in the cluster instead of new_replica_count. How can we avoid the need for extra resources?
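The resource question comes down to the peak of old + new replicas across the rollout stages. A minimal sketch, assuming 1 GPU per replica and a 2-GPU cluster as in this scenario; peak_usage and fits are hypothetical helpers (not part of Knative or KServe), and both plans are written out by hand rather than taken from the orchestrator's real step logic.

```python
from typing import List, Tuple

# One rollout stage: (old_replicas, new_replicas) alive at the same time.
Stage = Tuple[int, int]

def peak_usage(plan: List[Stage], gpus_per_replica: int = 1) -> int:
    """Largest number of GPUs held by old + new replicas at any stage."""
    return max((old + new) * gpus_per_replica for old, new in plan)

def fits(plan: List[Stage], capacity: int, gpus_per_replica: int = 1) -> bool:
    """True if no stage needs more GPUs than the cluster has."""
    return peak_usage(plan, gpus_per_replica) <= capacity

# Scale the new revision to its full size before removing the old one.
scale_up_first = [(1, 0), (1, 2), (0, 2)]
# The staged plan proposed above: add one new replica, then swap the old one out.
# Assumes the old replica releases its GPU before the second new replica is scheduled.
proposed = [(1, 0), (1, 1), (0, 2)]

for name, plan in (("scale-up-first", scale_up_first), ("proposed", proposed)):
    print(f"{name}: peak={peak_usage(plan)} GPUs, fits in 2: {fits(plan, capacity=2)}")
```

Under these assumptions the first plan peaks at new_replica_count + 1 = 3 GPUs and does not fit, while the proposed plan peaks at 2. The step from (1, 1) to (0, 2) only fits because the old replica is removed before the second new one starts, so that step is scale-down-first, and a single new replica carries all traffic while it runs.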
Adding the route details:
Name:         gemma-predictor
Labels:       component=predictor
              model=hf-model1
              serving.knative.dev/service=gemma-predictor
              serving.kserve.io/inferenceservice=gemma
Annotations:  serving.knative.dev/creator: system:serviceaccount:kserve:kserve-controller-manager
              serving.knative.dev/lastModifier: system:serviceaccount:kserve:kserve-controller-manager
API Version:  serving.knative.dev/v1
Kind:         Route
Metadata:
  Creation Timestamp:  2024-05-31T15:40:07Z
  Finalizers:
    routes.serving.knative.dev
  Generation:  2
  Owner References:
    API Version:           serving.knative.dev/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Service
    Name:                  gemma-predictor
    UID:                   2aaab451-a04d-4c58-a72f-0bbeba831f74
  Resource Version:  1786648
  UID:               e57c30c7-9e9f-474e-9f81-5cfb8dc243e7
Spec:
  Traffic:
    Latest Revision:     false
    Percent:             0
    Revision Name:       gemma-predictor-00001
    Configuration Name:  gemma-predictor
    Latest Revision:     true
    Percent:             100
Status:
  Address:
    URL:  http://gemma-predictor.svc.cluster.local/
  Conditions:
    Last Transition Time:  2024-05-31T15:52:45Z
    Message:               Revision "gemma-predictor-00002" failed to become ready.
    Reason:                RevisionMissing
    Status:                False
    Type:                  AllTrafficAssigned
    Last Transition Time:  2024-05-31T15:41:09Z
    Message:               external-domain-tls is not enabled
    Reason:                TLSNotEnabled
    Status:                True
    Type:                  CertificateProvisioned
    Last Transition Time:  2024-05-31T15:42:44Z
    Status:                True
    Type:                  IngressReady
    Last Transition Time:  2024-05-31T15:52:45Z
    Message:               Revision "gemma-predictor-00002" failed to become ready.
    Reason:                RevisionMissing
    Status:                False
    Type:                  Ready
  Observed Generation:  2
  Traffic:
    Latest Revision:  false
    Percent:          0
    Revision Name:    gemma-predictor-00001
    Latest Revision:  true
    Percent:          100
    Revision Name:    gemma-predictor-00002
  URL:  http://gemma-predictor.svc.cluster.local/
Events:
  Type    Reason           Age  From              Message
  ----    ------           ---- ----              -------
  Normal  FinalizerUpdate  13m  route-controller  Updated "gemma-predictor" finalizers
  Normal  Created          12m  route-controller  Created placeholder service "gemma-predictor"
  Normal  Created          12m  route-controller  Created Ingress "gemma-predictor"
@johnugeorge The availability strategy works by scaling up the new revision first and then scaling down the old revision, so that the service stays available. This means it needs somewhat more resources than the final state. For example, if your new revision runs 2 pods and your old revision runs 1 pod, the rollout can reach 3 replicas in total across both revisions. In this case, your quota needs to allow at least 3 replicas for the upgrade to succeed.
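The stalemate reported in this thread can be reproduced with a simplified model of this scale-up-first ordering. This is a hypothetical sketch, not the rollout orchestrator's code: it assumes 1 GPU per replica, that nothing else frees capacity during the rollout, and that the old revision is scaled down only once every new replica is running; the actual orchestrator logic is more involved.

```python
from dataclasses import dataclass

@dataclass
class Revision:
    name: str
    running: int = 0  # replicas holding a GPU
    pending: int = 0  # replicas that could not be scheduled

def schedule(rev: Revision, target: int, free_gpus: int) -> int:
    """Add replicas to rev until it has `target`; each needs 1 GPU.
    Replicas that do not fit stay pending. Returns the GPUs left free."""
    while rev.running + rev.pending < target:
        if free_gpus > 0:
            rev.running += 1
            free_gpus -= 1
        else:
            rev.pending += 1
    return free_gpus

def scale_up_first(capacity: int, old_replicas: int, new_replicas: int):
    """Simplified availability-style rollout: scale the new revision up first and
    scale the old revision down only once every new replica is running."""
    old = Revision("old", running=old_replicas)
    new = Revision("new")
    schedule(new, new_replicas, capacity - old.running)
    if new.pending == 0:
        old.running = 0
    return old, new

old, new = scale_up_first(capacity=2, old_replicas=1, new_replicas=2)
print(old)
print(new)
```

With capacity=2 the old revision keeps 1 running replica and the new revision ends up with 1 running and 1 pending, the same final state reported earlier in the thread; with capacity=3 the new revision becomes fully ready and the old one is scaled down.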
There is another strategy called resourceUtil, which works the opposite way to the availability strategy: it scales down the old revision first and then scales up the new revision, so no additional resources are consumed.
However, the resourceUtil strategy is still risky; as you can see, there are quite a few open issues for it. We are still trying to fix or rethink these issues.
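For contrast, here is a hypothetical model approximating the scale-down-first ordering described for resourceUtil. It assumes 1 GPU per replica, that new replicas are first started in any free capacity, and that an old replica is removed without waiting for the new revision to become ready; the function name scale_down_first and its step order are illustrative and not taken from the serving-progressive-rollout code.

```python
from typing import List, Tuple

def scale_down_first(old: int, new_target: int, capacity: int,
                     new_becomes_ready: bool) -> List[Tuple[int, int]]:
    """Hypothetical scale-down-first rollout, 1 GPU per replica.

    Each round either starts new replicas in free capacity, or removes one
    old replica without waiting for the new revision. Returns the
    (old_ready, new_ready) counts after every action.
    """
    new = 0
    states = [(old, new)]
    while old > 0 or (new_becomes_ready and new < new_target):
        if new_becomes_ready and new < new_target and old + new < capacity:
            new = min(new_target, capacity - old)  # fill free capacity
        elif old > 0:
            old -= 1                               # free a GPU for the new revision
        else:
            break                                  # no capacity left and nothing to remove
        states.append((old, new))
    return states

for ok in (True, False):
    states = scale_down_first(old=1, new_target=2, capacity=2, new_becomes_ready=ok)
    print(ok, states, "min ready:", min(o + n for o, n in states))
```

If the new revision becomes ready, usage never exceeds the 2 GPUs and at least one replica is always ready. If it never becomes ready, the last state is (0, 0): no replica serves the service, which is the risk mentioned earlier for this strategy.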
We noticed a particular scenario with the Knative progressive serving rollout, which we need input on.
We have a cluster with 2 GPUs. Initially, we apply the isvc with replicas set to 1.
Later, we update the isvc to have 2 replicas. With the progressive rollout, we noticed that it got stuck at this point, with the traffic being moved to the new revision but the rollout in this stalemate.
The Knative progressive serving rollout design doc mentions this:
"To keep the total number of replicas for both the old and new revisions remain the same, we can reduce the number for the old revisions before increasing the number for the new revisions. However, this is against the principle of the Knative service to serve the workload with demand."
Is there any approach where the above scenario can be achieved?