vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

VRL aborted events don't cause E2E pipelines to NACK #17709

Open joeppeeters opened 1 year ago

joeppeeters commented 1 year ago

Problem

context: We're a SaaS company providing a software suite to our customers. In that suite, 100+ multi-lingual apps generate logs for auditing purposes. We're running a project to improve the quality of those logs (i.e. validate the presence of required fields). The intended solution is to deploy a Vector instance alongside every app to collect and process those logs and ship them to storage. Using an E2E Vector pipeline, we can use VRL to do some basic validation on the logs to see if they meet the quality standards. Being able to NACK them would yield instant feedback to the publisher, so that issues can be spotted early in development.

bug: Events which are aborted by VRL do not seem to signal a NACK at the source.

expected behaviour: All events which enter an E2E pipeline should end up in the sink before getting ack'ed

Configuration

[acknowledgements]
# enable E2E acks
enabled = true

[sources.http]
type = "http_server"
address = "0.0.0.0:80"
    [sources.http.decoding]
    codec = "json"

[transforms.validate]
# VRL script to abort all events, unconditionally.
inputs = [ "http" ]
drop_on_abort = true
drop_on_error = true
type = "remap"
source = "abort"

[sinks.console]
inputs = [ "validate" ]
type = "console"
target = "stdout"

    [sinks.console.encoding]
    codec = "json"
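
The `validate` transform above aborts unconditionally to keep the reproduction minimal. A validation closer to the use case described under Problem might look like the sketch below. The field names `.user_id` and `.action` are hypothetical placeholders, not part of the original report, and would need to be replaced with the real required fields.

```toml
[transforms.validate]
type = "remap"
inputs = [ "http" ]
drop_on_abort = true
drop_on_error = true
source = '''
# Hypothetical required fields; substitute the actual audit schema.
if !exists(.user_id) || !exists(.action) {
  abort
}
'''
```

With this sketch, events carrying both fields pass through to the sink, while events missing either one are aborted and hit the behaviour reported in this issue.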

Version

0.30.0

Debug Output

curl -X POST -i -d '{"key1":"value1","key2":"value2"}' http://localhost:80

HTTP/1.1 200 OK <--- unexpected 200 status
content-length: 0
date: Fri, 16 Jun 2023 14:34:53 GMT

Example Data

docker run -d -v $PWD/vector.toml:/etc/vector/vector.toml:ro -p 80:80 timberio/vector:0.30.0-debian

curl -X POST -i -d '{"key1":"value1","key2":"value2"}' http://localhost:80

References

Discussion on Discord

joeppeeters commented 1 year ago

Just my 2cts on possible ways to address this:

neuronull commented 1 year ago

Just dropping a quote from the discord thread that was linked

it's something we are working at to get the UX right. A lot of the time when events are dropped we don't want it to be a NACK, for example with filter, dedupe or throttle transforms the drops are very intentional. There is a fairly long running RFC here https://github.com/vectordotdev/vector/blob/bruceg/discarded-events-rfc/rfcs/2022-08-25-12217-handling-discarded-events.md that I think could be a good starting point for the considerations we are contemplating.

neuronull commented 1 year ago

Noting that we had some discussion on this but didn't reach any conclusions yet.

The linked RFC should likely cover this subject of ack/nack behavior, if it doesn't already.

Wanted to also call out this option: https://vector.dev/docs/reference/configuration/transforms/remap/#reroute_dropped, which creates a named output for the dropped events that can then be used downstream.
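
As a rough sketch of that option applied to the configuration from this issue: with `reroute_dropped = true`, the remap transform exposes the dropped events on an output addressed as `<transform>.dropped`. The sink name `rejected` and its `stderr` target are assumptions chosen for illustration.

```toml
[transforms.validate]
type = "remap"
inputs = [ "http" ]
drop_on_abort = true
drop_on_error = true
# Send dropped events to the named `dropped` output instead of discarding them.
reroute_dropped = true
source = "abort"

# Hypothetical sink that receives only the events dropped by `validate`.
[sinks.rejected]
type = "console"
inputs = [ "validate.dropped" ]
target = "stderr"

    [sinks.rejected.encoding]
    codec = "json"
```

This only gives the rejected events somewhere to go downstream. Whether and how it should affect acknowledgements at the source is the open question in this issue.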

In general, the current model we have with these three options (drop_on_error, drop_on_abort and reroute_dropped) is convoluted, and the behavior changes a lot when the three are used in different combinations.

While we might have a resolution to the NACK concern raised in this issue through the work of that RFC, we should consider whether an intermediate change in the short term would make sense.