numaproj / numaplane

Control Plane for Numaproj
Apache License 2.0

Determine how to handle ambiguous Child health conditions to determine if reconciliation is still happening (PPND) #239

Open juliev0 opened 1 week ago

juliev0 commented 1 week ago

Describe the bug

PipelineRollout, ISBServiceRollout, and NumaflowControllerRollout need to know whether the child resources of the Pipeline and ISBSvc, as well as the Numaflow Controller Deployment, are done Progressing so that they can safely unpause pipelines.

Pipeline and ISBSvc have Conditions that can indicate whether their children are progressing: Condition.Reason=Progressing.

However, there are certain cases in which the Condition is set to False but it is ambiguous whether the cause is "Progressing" or a general failure, so the Reason is not marked as "Progressing". This is one that I see:

{Type:DaemonServiceHealthy Status:False Reason:GetDaemonServiceFailed Message:Deployment not found, might be still under creation}
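
To illustrate why a Reason-only check misses this case, here is a minimal Go sketch. The `Condition` struct is a local stand-in that mirrors only the fields printed above, and `isProgressing` is a hypothetical helper written for this example; neither is taken from the Numaplane or Numaflow code.

```go
package main

// Condition is a local stand-in mirroring the fields shown in this issue.
// Assumption: real code would likely use the Kubernetes Condition type instead.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// isProgressing is a hypothetical Reason-only check: a child counts as
// progressing only when an unhealthy Condition states it explicitly.
func isProgressing(c Condition) bool {
	return c.Status == "False" && c.Reason == "Progressing"
}
```

For the Condition above (Reason:GetDaemonServiceFailed), `isProgressing` returns false, even though the Message says the Deployment might still be under creation.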

juliev0 commented 1 week ago

@chandankumar4 would you mind checking whether there are other cases where we report a Reason other than "Progressing" even though the child could still be progressing?

juliev0 commented 1 week ago

Since some Conditions will inevitably be ambiguous, I think I will deal with it this way instead: once we have set inProgressStrategy to "PPND", I will look at all unhealthy Child Conditions and not worry about their particular "Reason". I will make sure we pause on remaining unhealthiness only while PPND is in progress, not outside of it.
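
A rough Go sketch of that rule, under stated assumptions: `Condition` is a local stand-in for the child Conditions, `shouldRemainPaused` is a hypothetical helper name, every child Condition Type is assumed to be a "healthy"-style Type where Status "True" means healthy, and any other Status (including "Unknown") is treated as unhealthy. It is not the actual Numaplane implementation.

```go
package main

// Condition is a local stand-in for a child resource Condition (assumption:
// real code would likely use the Kubernetes Condition type).
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// shouldRemainPaused is a hypothetical helper sketching the proposed rule:
// while inProgressStrategy is "PPND", any unhealthy child Condition keeps the
// pipelines paused, whatever its Reason. Outside PPND it never asks for a pause.
func shouldRemainPaused(inProgressStrategy string, children []Condition) bool {
	if inProgressStrategy != "PPND" {
		return false
	}
	for _, c := range children {
		// Assumption: anything other than "True" counts as unhealthy.
		if c.Status != "True" {
			return true
		}
	}
	return false
}
```

With this rule, the ambiguous GetDaemonServiceFailed Condition keeps pipelines paused during PPND without the caller having to decide whether it means "Progressing" or failure.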