numaproj / numaplane

Control Plane for Numaproj
Apache License 2.0
8 stars 5 forks source link

USDE - if excludedPath is not already present/set in original spec, upgrade decision result will be incorrect #330

Open dpadhiar opened 1 week ago

dpadhiar commented 1 week ago

Describe the bug

If an excludedPath or path to a field does not currently exist in the original spec of a Rollout object, the USDE method ResourceNeedsUpdating will not return the strategy DirectApply like is expected and instead will return PauseAndDrain.

Note that excluding the parent field of a path does currently work.

To Reproduce Steps to reproduce the behavior:

  1. Create default NumaflowController, ISBService and PipelineRollouts
  2. Set an excludedPath for one of the objects ie:
    isbServiceSpecExcludedPaths: |
    - "jetstream.containerTemplate.resources.limits"
  3. Watch the Pipelines.
  4. Update the Rollout with a new value in the excludedPath. In this case, I updated the limits of the ISBServiceRollout.
  5. Note that the Pipeline pauses even though we are updating an excludedPath. Confirm by checking controller logs for upgrade decision result log statement.

Expected behavior

We expect the Pipeline to not pause and the resulting upgrade strategy to be DirectApply.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

juliev0 commented 1 week ago

Sorry, @afugazzotto - I know you probably had this working perfectly before. Wondering if you can look.

I'm guessing this probably got broken when I removed the calls to: removeNullValuesFromMap(onlyPaths) removeNullValuesFromMap(withoutPaths) in this PR.

We need to figure out the best way to handle this.

Background: The reason I removed those calls is that we cannot remove all empty maps from a given spec because some of them are meaningful, e.g. if we take this Kafka field (which is a Pointer), then:

sink:
   kafka: {}
   a: b

is meaningfully different from:

sink:
   a: b

and Derek updated numaflow such that it no longer writes zero values.

(more information about the analysis that caused this change here)

juliev0 commented 1 week ago

Actually, I reassigned it to myself to look at first. I have a feeling it may be related to how we’re testing vs an actual bug - need to check that

juliev0 commented 1 week ago

Okay, I was able to reproduce this just now by doing what Dillen was originally doing to cause this I believe. I took his PR and updated the original isbsvc Rollout spec to comment out these lines:

isbServiceSpec = numaflowv1.InterStepBufferServiceSpec{
        Redis: nil,
        JetStream: &numaflowv1.JetStreamBufferService{
            Version: "2.9.6",
            Persistence: &numaflowv1.PersistenceStrategy{
                VolumeSize: &volSize,
            },
            /*ContainerTemplate: &numaflowv1.ContainerTemplate{
                Resources: v1.ResourceRequirements{
                    Limits: v1.ResourceList{v1.ResourceMemory: volSize},
                },
            },*/
        },
    }

The changed version is as it is in the PR:

ISBServiceSpecExcludedField = numaflowv1.InterStepBufferServiceSpec{
        Redis: nil,
        JetStream: &numaflowv1.JetStreamBufferService{
            Version: "2.9.8",
            Persistence: &numaflowv1.PersistenceStrategy{
                VolumeSize: &volSize,
            },
            ContainerTemplate: &numaflowv1.ContainerTemplate{
                Resources: v1.ResourceRequirements{
                    Limits: v1.ResourceList{v1.ResourceMemory: memLimit},
                },
            },
        },
    }

The excluded field is: "jetstream.containerTemplate.resources.limits"

The excerpt from log is this:

...."newSpecWithoutApplyPaths":{"jetstream":{"containerTemplate":{"resources":{}},"persistence":{"volumeSize":"10Mi"},"version":"2.9.8"}}
....."existingSpecWithoutApplyPaths":{"jetstream":{"persistence":{"volumeSize":"10Mi"},"version":"2.9.6"}}

{"level":"debug","isbservicerollout":{"Namespace":"numaplane-system","Name":"test-isbservice-rollout"},"userUpgradeStrategy":"pause-and-drain","newSpecWithoutApplyPaths":{"jetstream":{"containerTemplate":{"resources":{}},"persistence":{"volumeSize":"10Mi"},"version":"2.9.8"}},"existingSpecWithoutApplyPaths":{"jetstream":{"persistence":{"volumeSize":"10Mi"},"version":"2.9.6"}},"logger":"numaplane.controller-manager.isbservicerollout-reconciler","caller":"usde/usde.go:70","ts":"2024-10-08T21:04:08.714486525Z","msg":"the specs without the 'apply' paths are different"}

So, the issue is that the parent fields of our excluded field are still being included in the comparison.

juliev0 commented 1 week ago

Here are my thoughts on this: https://docs.google.com/document/d/1nbJ9qjK49JgOsiOxKqS4Foi_ph3JQCmbFi3FZTHDqeQ/edit

juliev0 commented 2 days ago

Hey @afugazzotto - we don't necessarily need to do this in Q1 but maybe you can take a look at this in early Q2 - I wrote up a document above. Let me know if it makes sense (fine to look at it later).

afugazzotto commented 2 days ago

@juliev0 I'll take a look and get back to you.

afugazzotto commented 2 days ago

@juliev0 after running some tests, I see that the comparison will always result in a difference in case a path is removed or not previously present since now we do not "cleanup" the object by removing null values (objects, arrays, etc.) previously done by removeNullValuesFromMap func. I added a few comments on your doc and I think option 2 may be the only possible one at least for the time being. I'll try to think of other possible options. We should further discuss and brainstorm.