mangglesh commented 1 month ago

get_clean_dict function in generate file is generating wrong steps config for rollout object, and making steps in strategy for argo rollout crd invalid.

self.spec.strategy before clean function for rollout object {'canary': {'maxSurge': '25%', 'maxUnavailable': 0, 'steps': [{'setWeight': 10}, {'pause': {'duration': '1h'}}, {'setWeight': 20}, {'pause': {}}]}}

after clean function get_clean_dict(self)['spec']['strategy'] {'canary': {'maxSurge': '25%', 'maxUnavailable': 0, 'steps': [{'setWeight': 10}, {'pause': {'duration': '1h'}}, {'setWeight': 20}, {}]}}

{'pause': {}} has been changed to {} making rollout object invalid.

after update rollout is marked degraded with message message: >- InvalidSpec: The Rollout "example-rollout-vartical" is invalid: spec.strategy.steps[3]: Invalid value: "step.Experiment: true step.Pause: true step.SetWeight: true step.Analysis: true step.SetCanaryScale: true step.SetHeaderRoute: true step.SetMirrorRoutes: true": Step must have one of the following set: experiment, setWeight, setCanaryScale or pause

steps to reproduce

step 1

create argo rollout with following config.

apiVersion: argoproj.io/v1alpha1 kind: Rollout metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: > {"apiVersion":"argoproj.io/v1alpha1","kind":"Rollout","metadata":{"annotations":{},"name":"example-rollout-vartical","namespace":"default"},"spec":{"minReadySeconds":30,"replicas":10,"revisionHistoryLimit":3,"selector":{"matchLabels":{"app":"nginx"}},"strategy":{"canary":{"maxSurge":"25%","maxUnavailable":0,"steps":[{"setWeight":10},{"pause":{"duration":"1h"}},{"setWeight":20},{"pause":{}}]}},"template":{"metadata":{"labels":{"app":"nginx"}},"spec":{"containers":[{"image":"nginx:1.15.4","name":"nginx","ports":null,"resources":{"limits":{"cpu":"500m","memory":"128Mi"},"requests":{"cpu":"250m","memory":"64Mi"}}}]}}}} rollout.argoproj.io/revision: '6' creationTimestamp: '2024-07-10T10:28:18Z' generation: 18 labels: k8slens-edit-resource-version: v1alpha1 managedFields:

apiVersion: argoproj.io/v1alpha1 fieldsType: FieldsV1 fieldsV1: f:spec: {} manager: kubectl-client-side-apply operation: Update time: '2024-07-10T10:28:18Z'
apiVersion: argoproj.io/v1alpha1 fieldsType: FieldsV1 fieldsV1: f:spec: {} manager: rollouts-controller operation: Update time: '2024-07-12T17:20:16Z'
apiVersion: argoproj.io/v1alpha1 fieldsType: FieldsV1 fieldsV1: f:spec: f:template: {} manager: OpenAPI-Generator operation: Update time: '2024-07-12T18:08:07Z'
apiVersion: argoproj.io/v1alpha1 fieldsType: FieldsV1 fieldsV1: f:spec: f:strategy: f:canary: .: {} f:steps: {} manager: node-fetch operation: Update time: '2024-07-12T18:10:57Z'
apiVersion: argoproj.io/v1alpha1 fieldsType: FieldsV1 fieldsV1: f:status: .: {} f:conditions: {} f:observedGeneration: {} f:phase: {} manager: rollouts-controller operation: Update subresource: status time: '2024-07-12T18:10:57Z' name: example-rollout-vartical namespace: default resourceVersion: '249246189' uid: 37f55c60-42bf-4163-b084-ecb01f453d89 selfLink: >- /apis/argoproj.io/v1alpha1/namespaces/default/rollouts/example-rollout-vartical status: HPAReplicas: 2 availableReplicas: 2 blueGreen: {} canary: {} conditions:
lastTransitionTime: '2024-07-12T17:20:48Z' lastUpdateTime: '2024-07-12T17:20:48Z' message: Rollout has minimum availability reason: AvailableReason status: 'True' type: Available
lastTransitionTime: '2024-07-12T17:21:43Z' lastUpdateTime: '2024-07-12T17:21:43Z' message: Rollout is paused reason: RolloutPaused status: 'False' type: Paused
lastTransitionTime: '2024-07-12T17:22:14Z' lastUpdateTime: '2024-07-12T17:22:14Z' message: RolloutCompleted reason: RolloutCompleted status: 'True' type: Completed
lastTransitionTime: '2024-07-12T18:10:57Z' lastUpdateTime: '2024-07-12T18:10:57Z' message: Rollout is healthy reason: RolloutHealthy status: 'True' type: Healthy
lastTransitionTime: '2024-07-12T17:21:43Z' lastUpdateTime: '2024-07-12T18:10:57Z' message: >- ReplicaSet "example-rollout-vartical-db494c4f7" has successfully progressed. reason: NewReplicaSetAvailable status: 'True' type: Progressing currentPodHash: db494c4f7 currentStepHash: 66b4c88d5 currentStepIndex: 4 observedGeneration: '18' phase: Healthy readyReplicas: 2 replicas: 2 restartedAt: '2024-07-12T17:15:20Z' selector: app=nginx stableRS: db494c4f7 updatedReplicas: 2 spec: minReadySeconds: 30 replicas: 2 restartAt: '2024-07-12T17:15:20Z' revisionHistoryLimit: 3 selector: matchLabels: app: nginx strategy: canary: maxSurge: 25% maxUnavailable: 0 steps:
- setWeight: 10
- pause: duration: 1h
- setWeight: 20
- pause: {} template: metadata: labels: app: nginx spec: containers:
- image: nginx:1.15.4 name: nginx resources: limits: cpu: '0.55' memory: 144179Ki

step 2

update the resource using update function in huraku using

resource.update()

mangglesh commented 1 month ago

haxsaw commented 1 month ago

HI there- would you be able to provide a bit more info please? Could you let me know if you're using either the hikaru metapackage or the separate hikaru-core and hikaru-models, and regardless of which could you tell me the versions involved?

Edited: After taking a bit more time I see that you have shown what you expect to happen; sorry for not having time for that earlier. I have an idea about what's going on and I'll investigate in that direction, but if what I'm thinking is the case then it isn't clear what can be done about it.

The gist of my suspicion is this: the clean dict function is geared towards making minimal acceptable dicts for K8s, and to do that there are rules that are followed that K8s seems to want that prune empty entries from a model and not represent them in dicts. For example, we don't send empty dicts around for a value that would take a dict, and this seems to agree with what K8s expects (it doesn't seem to like empty lists/dicts). This clearly isn't a hard-fast rule-- as you point out we send an empty dict as a list element. It kind of depends on the kind of thing we're marshalling into a dict.

If this is indeed the problem, then we may have a bit of an issue on our hands because loosening this up to benefit CRDs may actually break talking to regular K8s resources. I could look into developing multiple rules for handling empty entries based on whether we're marshalling a CRD or a regular resource, but doing that could break other CRDs that observe the practices that K8s uses.

It could be a little tricky and may require input from users. I'll have a look and a bit of a think and follow up in a bit.

haxsaw commented 1 month ago

So I had a peek into the code and first thing I read was the doc string, which states:

`

Turns an instance of a HikaruBase into a dict without values of None

This function returns a Python dict object that represents the hierarchy
of objects starting at ``obj`` and recusing into any nested objects.
The returned dict **does not** include any key/value pairs where the value
of the key is None or empty.

`

So yeah, there's our problem, and it's documented behavior, behavior that's meant to satisfy K8s. Since the last element in the list is a dict whose only key has an 'empty' value, we don't marshall the key out. Returnig an empty dict for the final list element is probably a gray area according to these rules, but I certainly see where/why it's happening (it actually is ignoring the empty dict and returns the default value which is an empty dict, which is what you're seeing).

Since this is documented behavior, this isn't so much a request for a bug fix but rather a feature request. It certainly can't (and probably shouldn't) be done automatically for CRDs as it might break the code of a lot of people who are playing by the existing rules here. I could introduce something like an optional argument, allows_empty, that causes get_clean_dict() to ignore these rules, but that may wind up biting people later if a CRD ever contains standard K8s objects-- they would no longer marshall out correctly. I'm open to ideas as to a solution, but it certainly won't be simply changing Hikaru to now allow None/empty values-- that'll break too many other things. Got any ideas?

haxsaw commented 2 days ago

Since no one has responded with a better idea and this is working as documented I'm going to close this out.

haxsaw / hikaru

argo rollout getting degraded after using update. #41

steps to reproduce

step 1

step 2