numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs
https://numaflow.numaproj.io/
Apache License 2.0

DLQ for Sink #1660

Closed vigith closed 6 months ago

vigith commented 6 months ago

Problem

DLQ is already available out of the box in Numaflow (using conditional forwarding) for every vertex except the Sink. In the Sink vertex, the user must "manually" write to a DLQ if the Sink is down; otherwise, the resulting backpressure propagates upstream and eventually stops reads from the source. This is a significant concern when users have multiple Sinks and expect writes to the healthy ones to continue even if one Sink is failing; today a single failing Sink can stall the entire pipeline.
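
For context, here is a rough sketch of how a DLQ is typically wired up today with conditional forwarding on a non-Sink vertex. The vertex names, the `my-udf:latest` image, and the `dlq` tag are illustrative assumptions, not taken from a real pipeline; the UDF is assumed to tag messages it cannot process with `dlq`.

```yaml
# Sketch only: vertex names, image, and tag values are illustrative.
spec:
  vertices:
    - name: in
      source:
        generator:
          rpu: 5
          duration: 1s
    - name: process
      udf:
        container:
          image: my-udf:latest   # hypothetical image; assumed to tag failures with "dlq"
    - name: out
      sink:
        log: {}
    - name: dlq
      sink:
        log: {}
  edges:
    - from: in
      to: process
    # Messages NOT tagged "dlq" go to the regular sink.
    - from: process
      to: out
      conditions:
        tags:
          operator: not
          values:
            - dlq
    # Messages tagged "dlq" are routed to the dead-letter sink.
    - from: process
      to: dlq
      conditions:
        tags:
          operator: or
          values:
            - dlq
```

This works because `process` has outgoing edges. A Sink has none, which is why the same pattern cannot be applied there.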

Approaches

Multiple containers for fallback

e.g.,

    - name: out
      sink:
        # A simple log printing sink
        log: {}
        fallback:
          - foo
    - name: kafka-out
      sink:
        udsink:
          container:
            image: my-sink:latest
        fallback:
          kafka:
            brokers:
              - my-broker1:19700
              - my-broker2:19700
            topic: my-topic

Conditional Forwarding at Sink level

DLQ is unavailable only in the Sink because the Sink is a terminal vertex, and no forwarding happens after it. We could extend conditional forwarding to let a Sink forward to another Sink.
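
A purely hypothetical sketch of what this second approach might look like in a pipeline spec. Nothing here is an existing Numaflow feature: a Sink appearing as the `from` of an edge, the `dlq-out` vertex name, and the `failed` tag are all assumptions made for illustration.

```yaml
# Hypothetical: a Sink as an edge source is the proposed extension,
# not something the current spec accepts.
edges:
  - from: process
    to: kafka-out
  - from: kafka-out          # Sink forwarding to another Sink (proposed)
    to: dlq-out              # hypothetical dead-letter Sink vertex
    conditions:
      tags:
        operator: or
        values:
          - failed           # hypothetical tag set when the primary write fails
```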

Use Cases


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

### Core
- [ ] https://github.com/numaproj/numaflow/pull/1662
- [ ] https://github.com/numaproj/numaflow/pull/1664
- [ ] https://github.com/numaproj/numaflow/pull/1667
- [ ] https://github.com/numaproj/numaflow/issues/1668
- [ ] https://github.com/numaproj/numaflow/issues/1682
- [x] add examples
- [x] documentation
### SDK
- [ ] https://github.com/numaproj/numaflow-go/pull/120
- [ ] https://github.com/numaproj/numaflow-java/pull/116
- [ ] https://github.com/numaproj/numaflow-python/issues/156

whynowy commented 6 months ago

@yhl25 - be aware we need to have a different sock file and server-info file for the fallback container.

We can populate a built-in environment variable in all the UD containers to indicate the container type.
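
A minimal sketch of the idea in the comment above, written as a Pod container fragment. The variable name `UD_CONTAINER_TYPE`, its values, the container names, and the socket paths in the comments are all hypothetical placeholders; the thread does not specify them.

```yaml
# Hypothetical names throughout; only the shape of the idea is shown.
containers:
  - name: udsink
    image: my-sink:latest
    env:
      - name: UD_CONTAINER_TYPE      # hypothetical built-in variable
        value: udsink                # SDK would serve on e.g. sink.sock
  - name: fb-udsink
    image: my-fallback-sink:latest   # hypothetical fallback image
    env:
      - name: UD_CONTAINER_TYPE
        value: fb-udsink             # SDK would serve on e.g. fb-sink.sock
```

With something like this in place, the SDK in each container could read the variable at startup and pick its own sock file and server-info file, so the primary and fallback sinks do not collide.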