numaproj / numaplane

Control Plane for Numaproj
Apache License 2.0
4 stars 3 forks source link

fix: fixed logic to run certain operations even when the spec is unchanged #56

Closed afugazzotto closed 1 week ago

afugazzotto commented 1 week ago

Modifications

Operations such as pipeline pausing and status update based on pipeline and isbservice status need to be performed even if the specs are unchanged based on hash comparison.

Verification

Unit tests and manually checking via ArgoCD and simulating a pipeline pause.

NOTE: we cannot test by checking that the ChildResourcesHealthy condition is True because it is based on the pipeline status which will never be healthy in the unit tests since there is no numaflow-controller running. This could be checked in e2e tests.

afugazzotto commented 1 week ago

@juliev0 @xdevxy I made some changes to the spec hashing and annotation setting code based on your feedback to try and resolve all of the concerns. Please, take a look again and let me know what you think. Thanks

juliev0 commented 1 week ago

Thank you for the adjustments @afugazzotto!

Unfortunately, and I am so sorry for this - I think I didn’t think this whole annotation thing through before. I am realizing that given that we want to do auto-healing, we want to overwrite the pipeline/ISBservice if somebody hand modifies it (like through kubectl apply). This means that we ultimately want to compare a. The spec we intend to apply (including “pause” designation) to b. the spec currently on the cluster. (We should vet that.) Therefore, there may be no value to storing the hash after all. :(

cc @xdevxy @dpadhiar @a49ibrahim

xdevxy commented 1 week ago

Thank you for the adjustments @afugazzotto!

Unfortunately, and I am so sorry for this - I think I didn’t think this whole annotation thing through before. I am realizing that given that we want to do auto-healing, we want to overwrite the pipeline/ISBservice if somebody hand modifies it (like through kubectl apply). This means that we ultimately want to compare a. The spec we intend to apply (including “pause” designation) to b. the spec currently on the cluster. (We should vet that.) Therefore, there may be no value to storing the hash after all. :(

cc @xdevxy @dpadhiar @a49ibrahim

We can discuss this when we implement pipeline draining with concurrent concerns addressed. One way is to ignore the desired state in the spec when comparing hash and have dedicated logic to control the desired lifecycle of pipeline.

juliev0 commented 1 week ago

One way is to ignore the desired state in the spec when comparing hash and have dedicated logic to control the desired lifecycle of pipeline.

Can you clarify what you mean by "ignore the desired state in the spec"? You mean use the hash only for the purpose of auto-healing? (to compare the last applied state to what's running on the cluster?)

It seems to me that having the hash at all may be inefficient if we need to ultimately compare the desired state to what's running, but I could be wrong. Please do document the design you have in mind when you can.

xdevxy commented 1 week ago

One way is to ignore the desired state in the spec when comparing hash and have dedicated logic to control the desired lifecycle of pipeline.

Can you clarify what you mean by "ignore the desired state in the spec"? You mean use the hash only for the purpose of auto-healing? (to compare the last applied state to what's running on the cluster?)

It seems to me that having the hash at all may be inefficient if we need to ultimately compare the desired state to what's running, but I could be wrong. Please do document the design you have in mind when you can.

What I mean is we probably do no even want desired lifecycle to be included in the pipelinerollout spec but only pipeline spec. It's a good point we need to also consider auto healing logic, for it we can directly compare between Pipeline spec in live cluster vs the desired PipelineRollout spec after the first hash comparison. @afugazzotto is working on that for auto healing task.

afugazzotto commented 1 week ago

@juliev0 here is the issue I'm working on https://github.com/numaproj/numaplane/issues/64 It has more details on the idea to have both auto healing and hash of spec.

juliev0 commented 1 week ago

64

thank you. I have responded over there.