knative-extensions / serving-progressive-rollout

Knative Serving extension to roll out the revision progressively
Apache License 2.0

Can there be an approach to terminate the last pod of the previous revision when most pods of the new revision are up #200

Open gavrissh opened 1 month ago

gavrissh commented 1 month ago

We noticed a particular scenario with the Knative progressive serving rollout, which we need input on.

We have a cluster with 2 GPUs. Initially, we apply the isvc with replicas set to 1:

NAME                         READY STATUS  RESTARTS AGE
predictor-00001-xyz           2/2  Running 0

Later, we update the isvc to have replicas set to 2. With the progressive rollout, we noticed it got stuck at this juncture: traffic was moved to the new revision, but the pods reached this stalemate:

NAME                         READY STATUS  RESTARTS AGE
predictor-00001-xyz           2/2  Running 0     
predictor-00002-abc           0/2  Pending 0     
predictor-00002-xdqq          2/2  Running 0    

The Knative progressive serving rollout design docs mention this - "To keep the total number of replicas for both the old and new revisions remain the same, we can reduce the number for the old revisions before increasing the number for the new revisions. However, this is against the principle of the Knative service to serve the workload with demand."

Is there an approach by which the above scenario can be achieved?

houshengbo commented 1 month ago

@gavrissh Thank you for opening this issue. For the scenario of reducing the number of pods for the old revision before increasing the number of pods for the new revision, we have implemented a strategy called resourceUtil. I have not updated the docs yet, since this rollout strategy is still under test and has some known bugs, tracked in https://github.com/knative-extensions/serving-progressive-rollout/issues/190 and https://github.com/knative-extensions/serving-progressive-rollout/issues/191.

One way to enable this strategy is to add the annotation rollout.knative.dev/progressive-rollout-strategy: "resourceUtil" to your ksvc/isvc, like:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "test"
  annotations:
    # This is the annotation used to configure whether to enable the progressive rollout.
    rollout.knative.dev/progressive-rollout-enabled: "true"
    # This is the annotation used to configure the rollout strategy.
    rollout.knative.dev/progressive-rollout-strategy: "resourceUtil"

With this strategy, if your new revision fails to launch, you can even end up with no revision serving your isvc/ksvc, so it is still a bit risky.

houshengbo commented 1 month ago

Regarding the issue you hit, I am not sure whether the new revision for your service has fully launched. You can check the Route resource to see whether all traffic has been shifted to the new revision. If all traffic has shifted to the new revision, the old revision will ultimately scale down. Based on the info you provided, one pod has launched, but the other is still on its way. If it is unable to launch, there can be many reasons, e.g. hitting a quota limit, being scheduled on a bad node, etc.
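
For example, to inspect the traffic split (substitute your own Route name and namespace):

# Show the Route's traffic split and readiness conditions
kubectl describe route <route-name> -n <namespace>

# Or pull just the traffic assignment from the status
kubectl get route <route-name> -n <namespace> -o jsonpath='{.status.traffic}'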

gavrissh commented 1 month ago

@houshengbo Thanks for your response

In the case above, the second pod of the new revision cannot be created due to resource constraints. There are only 2 GPUs available for utilization, and the old revision still has one pod using 1 GPU.
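
For context, each pod requests a full GPU, along these lines (illustrative snippet, not our exact spec):

# Illustrative predictor resources: with each pod requesting one GPU,
# the 2-GPU cluster can run at most two such pods across both revisions.
resources:
  limits:
    nvidia.com/gpu: "1"
  requests:
    nvidia.com/gpu: "1"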

Is the suggestion to try out the annotations?

annotations:
  # This is the annotation used to configure whether to enable the progressive rollout.
  rollout.knative.dev/progressive-rollout-enabled: "true"
  # This is the annotation used to configure the rollout strategy.
  rollout.knative.dev/progressive-rollout-strategy: "resourceUtil"

houshengbo commented 1 month ago

Yes, you can configure them at the service level, or globally with a ConfigMap as well: https://github.com/knative-extensions/serving-progressive-rollout/blob/main/config/core/configmaps/config-rolloutorchestrator.yaml#L47
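
A minimal sketch of the global configuration; the key names here are assumptions based on the example in the linked file, so verify them against your installed version before applying:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-rolloutorchestrator
  namespace: knative-serving
data:
  # Assumed key names; confirm against the linked ConfigMap before
  # relying on them cluster-wide.
  progressive-rollout-enabled: "true"
  progressive-rollout-strategy: "resourceUtil"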

johnugeorge commented 1 month ago

@houshengbo The problem is when scaling up from 1 replica to 2 replicas, where the cluster has resources for only 2 replicas.

Final state:
Old revision: 1 running replica
New revision: 1 running, 1 pending

I am trying to understand why we can't ensure availability as well during the availability rollout. We could do the following, e.g.:

Stage 1 (up): the new revision scales up to 1 while the old revision stays at 1.

Action: make the new revision active (because 1 replica of the new revision is active); the old revision scales down to 0, and the new revision scales up to 2.

The main problem is that the current logic needs a minimum of new_replica_count + 1 replicas' worth of resources in the cluster, instead of new_replica_count. How can we avoid the need for extra resources?

gavrissh commented 1 month ago

Adding the Route details:

Name:         gemma-predictor
Labels:       component=predictor
              model=hf-model1
              serving.knative.dev/service=gemma-predictor
              serving.kserve.io/inferenceservice=gemma
Annotations:  serving.knative.dev/creator: system:serviceaccount:kserve:kserve-controller-manager
              serving.knative.dev/lastModifier: system:serviceaccount:kserve:kserve-controller-manager
API Version:  serving.knative.dev/v1
Kind:         Route
Metadata:
  Creation Timestamp:  2024-05-31T15:40:07Z
  Finalizers:
    routes.serving.knative.dev
  Generation:  2
  Owner References:
    API Version:           serving.knative.dev/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Service
    Name:                  gemma-predictor
    UID:                   2aaab451-a04d-4c58-a72f-0bbeba831f74
  Resource Version:        1786648
  UID:                     e57c30c7-9e9f-474e-9f81-5cfb8dc243e7
Spec:
  Traffic:
    Latest Revision:     false
    Percent:             0
    Revision Name:       gemma-predictor-00001
    Configuration Name:  gemma-predictor
    Latest Revision:     true
    Percent:             100
Status:
  Address:
    URL:  http://gemma-predictor.svc.cluster.local/
  Conditions:
    Last Transition Time:  2024-05-31T15:52:45Z
    Message:               Revision "gemma-predictor-00002" failed to become ready.
    Reason:                RevisionMissing
    Status:                False
    Type:                  AllTrafficAssigned
    Last Transition Time:  2024-05-31T15:41:09Z
    Message:               external-domain-tls is not enabled
    Reason:                TLSNotEnabled
    Status:                True
    Type:                  CertificateProvisioned
    Last Transition Time:  2024-05-31T15:42:44Z
    Status:                True
    Type:                  IngressReady
    Last Transition Time:  2024-05-31T15:52:45Z
    Message:               Revision "gemma-predictor-00002" failed to become ready.
    Reason:                RevisionMissing
    Status:                False
    Type:                  Ready
  Observed Generation:     2
  Traffic:
    Latest Revision:  false
    Percent:          0
    Revision Name:    gemma-predictor-00001
    Latest Revision:  true
    Percent:          100
    Revision Name:    gemma-predictor-00002
  URL:                http://gemma-predictor.svc.cluster.local/
Events:
  Type    Reason           Age   From              Message
  ----    ------           ----  ----              -------
  Normal  FinalizerUpdate  13m   route-controller  Updated "gemma-predictor" finalizers
  Normal  Created          12m   route-controller  Created placeholder service "gemma-predictor"
  Normal  Created          12m   route-controller  Created Ingress "gemma-predictor"

houshengbo commented 22 hours ago

@johnugeorge The working principle of the availability strategy is to scale up the new revision first and then scale down the old revision, to ensure service availability. This temporarily requires somewhat more than the target resources. For example, if your new revision will run with 2 pods and the old revision is running with 1 pod, the total across both revisions can reach 3 replicas during the rollout. In this case, your quota needs to cover at least 3 replicas for the upgrade to succeed.
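
A rough sketch of the ordering for this example (the actual number of pods moved per stage depends on the stage computation):

# availability strategy: scale up the new revision first, then scale down the old
# old=1 pod, new revision target=2 pods
step 1: new +1  ->  old=1, new=1   (total 2)
step 2: new +1  ->  old=1, new=2   (total 3, the peak; quota must cover 3)
step 3: old -1  ->  old=0, new=2   (total 2, rollout complete)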

houshengbo commented 22 hours ago

There is another strategy called resourceUtil, which works the opposite of the availability strategy: it scales down the old revisions first and then scales up the new revisions, making sure no additional resources are consumed.
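
Roughly the mirror image of the availability ordering sketched above:

# resourceUtil strategy: scale down the old revision first, then scale up the new
# old=1 pod, new revision target=2 pods
step 1: old -1  ->  old=0, new=0   (resources freed; nothing serving if the next step stalls)
step 2: new +2  ->  old=0, new=2   (total never exceeds the target of 2)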

However, the resourceUtil strategy is still risky, as you can see from the open issues against it. We are still trying to fix or rethink this strategy.