vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Impact of CHUNK_SIZE on Acknowledgement Guarantees for All Sources #21714

Open linw1995 opened 1 week ago

linw1995 commented 1 week ago

Problem

In the source_sender module, a CHUNK_SIZE constant that is larger than the buffer.max_events configured on a sink can cause unexpected behavior. With the configuration below, the blackhole sink's log output stays at 0 until CHUNK_SIZE events have been consumed, instead of stopping once buffer.max_events events have been consumed.

Because source_sender has its own buffer, the buffer.when_full=block strategy does not work as expected. For example, a file source keeps reading the file until the source_sender buffer is full (its length is SOURCE_SENDER_BUFFER_SIZE = TRANSFORM_CONCURRENCY_LIMIT * CHUNK_SIZE), instead of respecting the limit defined by the sink's buffer.max_events.

source -> source_sender(buffered) -fanout-> sinks(buffered) 
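
To make the effect of the two chained buffers concrete, here is a minimal sketch in plain Rust. It is a toy model, not Vector's code: BoundedQueue and events_accepted_before_block are hypothetical names, and the values used in main for CHUNK_SIZE (1000) and TRANSFORM_CONCURRENCY_LIMIT (8) are assumptions for illustration only. It counts how many events a source can hand off before it is first blocked while the sink consumes nothing.

```rust
use std::collections::VecDeque;

/// A bounded FIFO queue standing in for one buffer stage (hypothetical model).
struct BoundedQueue {
    items: VecDeque<u64>,
    capacity: usize,
}

impl BoundedQueue {
    fn new(capacity: usize) -> Self {
        Self { items: VecDeque::new(), capacity }
    }

    /// Returns false ("would block") when the queue is already full.
    fn try_push(&mut self, item: u64) -> bool {
        if self.items.len() >= self.capacity {
            return false;
        }
        self.items.push_back(item);
        true
    }
}

/// Counts how many events the source can hand off before it is first
/// blocked, assuming the sink never consumes anything.
fn events_accepted_before_block(source_sender_capacity: usize, sink_max_events: usize) -> usize {
    let mut source_sender = BoundedQueue::new(source_sender_capacity);
    let mut sink = BoundedQueue::new(sink_max_events);
    let mut accepted = 0;

    loop {
        // Fanout step: move events downstream while the sink buffer has room.
        while let Some(&front) = source_sender.items.front() {
            if !sink.try_push(front) {
                break;
            }
            source_sender.items.pop_front();
        }
        // Source step: try to hand off one more event.
        if !source_sender.try_push(accepted as u64) {
            return accepted;
        }
        accepted += 1;
    }
}

fn main() {
    // Assumed values for illustration; the issue does not state them.
    let chunk_size = 1000;
    let transform_concurrency_limit = 8;
    let source_sender_capacity = transform_concurrency_limit * chunk_size;

    // The sink buffer holds a single event, as in the configuration below.
    let accepted = events_accepted_before_block(source_sender_capacity, 1);
    println!("events read before the source blocks: {}", accepted);
}
```

In this model the source is only blocked after the capacities of both buffers have been filled, so a sink-side max_events = 1 does little to limit how far the source reads ahead.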

If remove_after_secs is configured, it's possible for the file to be deleted before the data in both buffers has been fully consumed.

In addition to the impact of the source_sender implementation, the internal implementation of the file source also affects Acknowledgement Guarantees.

In the code linked below, the file checkpoints are updated inside messages.map ahead of time, rather than after send_event_stream completes, let alone after the sink has acknowledged the events.

https://github.com/vectordotdev/vector/blob/c1da408b34fe29f8c949e18f08867066b080b2f5/src/sources/file.rs#L666-L693
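
The ordering concern can be illustrated with a small, self-contained sketch. It does not reproduce the linked file source code; resume_offset_eager and resume_offset_after_ack are hypothetical helpers, and the byte offsets are made-up sample data. It compares where a reader would resume after a crash when the checkpoint is advanced at read time versus only after acknowledgement.

```rust
/// Hypothetical checkpoint store: the byte offset to resume reading from.
#[derive(Default)]
struct Checkpoint {
    offset: u64,
}

/// Eager strategy: the checkpoint advances as soon as each line is read.
/// The number of acknowledged lines never influences the result.
fn resume_offset_eager(line_end_offsets: &[u64], _acknowledged: usize) -> u64 {
    let mut checkpoint = Checkpoint::default();
    for &end in line_end_offsets {
        checkpoint.offset = end; // updated at read time
    }
    checkpoint.offset
}

/// Ack-driven strategy: the checkpoint advances only for lines that the
/// sink has acknowledged.
fn resume_offset_after_ack(line_end_offsets: &[u64], acknowledged: usize) -> u64 {
    let mut checkpoint = Checkpoint::default();
    for &end in line_end_offsets.iter().take(acknowledged) {
        checkpoint.offset = end; // updated only after acknowledgement
    }
    checkpoint.offset
}

fn main() {
    // Three lines of 10 bytes each were read; the sink acknowledged only
    // the first one before the process stopped.
    let line_end_offsets = [10, 20, 30];
    let acknowledged = 1;

    let eager = resume_offset_eager(&line_end_offsets, acknowledged);
    let acked = resume_offset_after_ack(&line_end_offsets, acknowledged);

    println!("eager checkpoint resumes at byte {}", eager);
    println!("ack-driven checkpoint resumes at byte {}", acked);
}
```

With the eager strategy the restart point is already past the two unacknowledged lines, so they would not be re-read; the ack-driven strategy resumes right after the last acknowledged line.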

Configuration

data_dir = "/usr/local/vector/data"

[acknowledgements]
enabled = true

[sources.small_log_files]
type = "file"
include = [ "/tmp/logs/*" ]
ignore_not_found = true
remove_after_secs = 10

[sinks.console]
type = "blackhole"
inputs = ["small_log_files"]
rate = 1
print_interval_secs = 1

[sinks.console.buffer]
type = "memory"
max_events = 1
when_full = "block"

Version

0.37.0

pront commented 1 week ago

Hi @linw1995, this looks like a bug indeed. Thank you for providing all the details 👍