knative-extensions / serving-progressive-rollout

Knative Serving extension to roll out the revision progressively
Apache License 2.0

With the resourceUtil strategy, graceful traffic transfer does not occur during consecutive update requests. #202

Open AyushSawant18588 opened 1 month ago

AyushSawant18588 commented 1 month ago

In a K8s cluster with 2 GPUs, we initially apply an isvc with replicas set to 1:

NAME                                                READY   STATUS    RESTARTS   AGE
test1-predictor-00001-deployment-775867dd96-44g92   2/2     Running   0          5m14s

Initial state: replicas = 1

1st update: replicas 1 -> 2
2nd update (made immediately after the 1st): replicas 2 -> 1

With the 2 consecutive updates, we notice 3 revisions:
Initial state: Revision 1
After 1st update: Revision 2
After 2nd update: Revision 3

Transitions noticed:

NAME                                                READY   STATUS    RESTARTS   AGE
test1-predictor-00002-deployment-79cd96bd78-j44gp   1/2     Running   0          5s
test1-predictor-00002-deployment-79cd96bd78-ttxrj   1/2     Running   0          4s
test1-predictor-00003-deployment-7dd4575f4-ppsv8    0/2     Pending   0          1s

Traffic info:
Traffic:
        Latest Revision:  true
        Percent:          100
        Revision Name:    test1-predictor-00001
Traffic:
        Latest Revision:  true
        Percent:          100
        Revision Name:    test1-predictor-00001

NAME                                                READY   STATUS    RESTARTS      AGE
test1-predictor-00001-deployment-6c7d788877-phff5   0/2     Pending   0             2m16s
test1-predictor-00002-deployment-6f7f4f68c5-2chrb   2/2     Running   0             3m4s
test1-predictor-00002-deployment-6f7f4f68c5-8hvl5   2/2     Running   4 (82s ago)   3m4s
test1-predictor-00003-deployment-76f5959878-td4rl   0/2     Pending   0             3m
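
One possible reading of the listing above is plain GPU exhaustion. The sketch below is only an illustration under stated assumptions: the cluster has 2 GPUs (from this report), and each predictor pod requests exactly one GPU (an assumption, the report does not say). `Pod` and `schedule` are hypothetical names, not part of Knative or Kubernetes.

```python
from dataclasses import dataclass

GPU_CAPACITY = 2  # from the report: the cluster has 2 GPUs


@dataclass
class Pod:
    revision: str
    gpus: int = 1  # assumption: each predictor pod requests one GPU


def schedule(running, wanted, capacity=GPU_CAPACITY):
    """Split `wanted` pods into (scheduled, pending) given the GPUs held by `running`."""
    free = capacity - sum(p.gpus for p in running)
    scheduled, pending = [], []
    for pod in wanted:
        if pod.gpus <= free:
            scheduled.append(pod)
            free -= pod.gpus
        else:
            pending.append(pod)
    return scheduled, pending


# State from the listing: both revision 2 pods are Running,
# while one pod each of revision 1 and revision 3 waits for a GPU.
running = [Pod("test1-predictor-00002"), Pod("test1-predictor-00002")]
wanted = [Pod("test1-predictor-00001"), Pod("test1-predictor-00003")]
scheduled, pending = schedule(running, wanted)
print([p.revision for p in pending])
```

Under these assumptions neither the revision 1 pod nor the revision 3 pod can be placed until a revision 2 pod releases its GPU, which would match the two Pending pods shown.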

Expectation

On consecutive updates, traffic should be transferred gracefully between revisions. The final pod of the older revision should be terminated only after the 1 pod of the new revision is up and the traffic route has been shifted to the new revision.

The rollout should not get stuck; it should reach the final state in which only the pods of revision 3 are running:

NAME                                                READY   STATUS    RESTARTS   AGE
test1-predictor-00003-deployment-76f5959878-td4rl   2/2     Running   0          5m14s
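
The expected ordering described above can be written down as a small check. This is a minimal sketch of the condition requested in this issue, not the extension's actual logic; `RevisionState` and `may_terminate_last_old_pod` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class RevisionState:
    name: str
    ready_pods: int
    traffic_percent: int


def may_terminate_last_old_pod(old, new, desired_new_pods=1):
    """Return True only when the new revision is ready and carries all traffic."""
    new_is_ready = new.ready_pods >= desired_new_pods
    traffic_shifted = new.traffic_percent == 100 and old.traffic_percent == 0
    return new_is_ready and traffic_shifted


# Observed stuck state: the revision 3 pod is still Pending and has no traffic.
stuck = may_terminate_last_old_pod(
    old=RevisionState("test1-predictor-00001", ready_pods=0, traffic_percent=100),
    new=RevisionState("test1-predictor-00003", ready_pods=0, traffic_percent=0),
)

# Expected final state: the revision 3 pod is Running and serves all traffic.
done = may_terminate_last_old_pod(
    old=RevisionState("test1-predictor-00001", ready_pods=1, traffic_percent=0),
    new=RevisionState("test1-predictor-00003", ready_pods=1, traffic_percent=100),
)
print(stuck, done)
```

In the reported scenario the check would stay false for as long as the revision 3 pod is Pending, so the older revision's last pod would be kept until traffic has actually moved.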