numaproj / numaplane

Control Plane for Numaproj
Apache License 2.0

Consider how to handle case that a pipeline cannot be drained before timeout #74

Closed xdevxy closed 3 weeks ago

xdevxy commented 4 months ago

Summary

If a pipeline cannot be drained before the timeout, consider ways to update the pipeline without data loss, e.g. automatic replay from the last successfully processed offset.


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

juliev0 commented 1 month ago

Decisions from meeting with @vigith, @whynowy, and @xdevxy:

In the case that the user selects not to time out:

juliev0 commented 1 month ago

We also need to consider the case of the Numaflow Controller updating and the isbsvc updating. If Pipelines are set not to time out, then the NumaflowControllerRollout and ISBServiceRollout will be considered "Progressing" in ArgoCD, and the Jenkins pipeline will time out. The user can take the same corrective actions with their pipelines as listed above.

juliev0 commented 4 weeks ago

I feel like the simplest thing we could do initially, which would take care of most needs, is to make the timeout value configurable on the PavedRoad side. The user can either keep it fairly short (~5 minutes) or set it to something like 1 hour. If the Pipeline doesn't drain within 1 hour, there is likely something broken. In that case, if there is not yet a capability to force the Pipeline to be reapplied, the user would need to wait the full hour until their Pipeline is reapplied and starts running again.

juliev0 commented 3 weeks ago

I have opened a new issue to track the work on the Numaplane Backend side here: https://github.com/numaproj/numaplane/issues/295

Therefore, closing this one.