Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Problem
In the source_sender module, a CHUNK_SIZE constant that is larger than the buffer.max_events configured on the sink can cause unexpected behavior. In the configuration, the blackhole sink's log output remains at 0 until CHUNK_SIZE events have been consumed, rather than stopping once buffer.max_events events have been consumed.
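The effect described above can be illustrated with a toy model. This is only a sketch and not Vector's actual source_sender code: `ChunkingSender`, its fields, and the chunk size of 1000 used in the usage note are hypothetical names and values chosen for illustration. It assumes events are held back until a full chunk has accumulated, which is why a downstream counter would stay at 0 in the meantime.

```rust
/// Hypothetical stand-in for a chunking sender (not Vector's implementation):
/// events are held back until `chunk_size` of them have accumulated, then
/// forwarded downstream together.
struct ChunkingSender {
    chunk_size: usize,
    pending: Vec<u64>,
    delivered: Vec<u64>,
}

impl ChunkingSender {
    fn new(chunk_size: usize) -> Self {
        Self {
            chunk_size,
            pending: Vec::new(),
            delivered: Vec::new(),
        }
    }

    /// Accept one event; forward the pending events only once a full chunk exists.
    fn send(&mut self, event: u64) {
        self.pending.push(event);
        if self.pending.len() >= self.chunk_size {
            // Moves every pending event downstream and leaves `pending` empty.
            self.delivered.append(&mut self.pending);
        }
    }

    /// Number of events the downstream side has seen so far.
    fn delivered_count(&self) -> usize {
        self.delivered.len()
    }
}
```

With a chunk size of 1000, sending 999 events leaves `delivered_count()` at 0, and the 1000th event makes all 1000 visible at once, regardless of any smaller limit configured downstream.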
Because source_sender has its own buffer, the buffer.when_full=block strategy does not function correctly. For example, a file source keeps reading the file until the source_sender buffer is full (its length is SOURCE_SENDER_BUFFER_SIZE = TRANSFORM_CONCURRENCY_LIMIT * CHUNK_SIZE), instead of respecting the length defined by the sink's buffer.max_events.
If remove_after_secs is configured, the file can be deleted before the data in both buffers has been fully consumed.
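To make the size of that gap concrete, here is a rough sketch of the arithmetic. The formula for the source_sender buffer is the one quoted above; the function names are hypothetical, and the model assumes the two buffers are the only places where events wait, which ignores anything held by transforms in between.

```rust
/// Capacity of the intermediate buffer, per the formula quoted in this issue:
/// SOURCE_SENDER_BUFFER_SIZE = TRANSFORM_CONCURRENCY_LIMIT * CHUNK_SIZE.
fn source_sender_buffer_size(transform_concurrency_limit: usize, chunk_size: usize) -> usize {
    transform_concurrency_limit * chunk_size
}

/// Upper bound on events already read from the source but not yet consumed by
/// the sink, in a simplified model where only the two buffers hold events.
fn max_unconsumed_events(
    transform_concurrency_limit: usize,
    chunk_size: usize,
    sink_max_events: usize,
) -> usize {
    source_sender_buffer_size(transform_concurrency_limit, chunk_size) + sink_max_events
}
```

For example, with illustrative values of 8 for the concurrency limit, 1000 for the chunk size, and buffer.max_events = 10, the source could be up to 8010 events ahead of the sink in this model, rather than the 10 a user might expect from the sink configuration alone.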
In addition to the impact of the source_sender implementation, the internal implementation of the file source also affects Acknowledgement Guarantees.
In the following code block, the checkpoints for the file are updated in messages.map ahead of time, rather than after send_event_stream completes, let alone after the sink acknowledges the events.
https://github.com/vectordotdev/vector/blob/c1da408b34fe29f8c949e18f08867066b080b2f5/src/sources/file.rs#L666-L693
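The difference between the two checkpoint orderings can be sketched with a toy model. This is not the code at the link above: `Delivery`, `eager_checkpoint`, and `acked_checkpoint` are hypothetical names, and each line is represented only by the file offset at which it ends plus whether the sink eventually acknowledged it.

```rust
/// Hypothetical delivery outcome for one line read from the file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Delivery {
    Acked,
    Failed,
}

/// Eager ordering, as described in this issue: the checkpoint advances as
/// soon as a line is read, regardless of what happens to it downstream.
fn eager_checkpoint(lines: &[(u64, Delivery)]) -> u64 {
    let mut checkpoint = 0;
    for &(end_offset, _delivery) in lines {
        checkpoint = end_offset;
    }
    checkpoint
}

/// Ack-driven ordering: the checkpoint advances only across the contiguous
/// prefix of lines that the sink has acknowledged.
fn acked_checkpoint(lines: &[(u64, Delivery)]) -> u64 {
    let mut checkpoint = 0;
    for &(end_offset, delivery) in lines {
        if delivery != Delivery::Acked {
            break;
        }
        checkpoint = end_offset;
    }
    checkpoint
}
```

In this model, if a line in the middle of the file is never acknowledged, the eager checkpoint still ends up past it, so a restart would skip that line, while the ack-driven checkpoint stops just before it.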
Configuration
Version
0.37.0
Debug Output
No response
Example Data
No response
Additional Context
References
20816