vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Enable load shedding for `kubernetes_logs` source #18784

Open mdbenjam opened 11 months ago

mdbenjam commented 11 months ago

Use Cases

When running Vector as a Kubernetes DaemonSet, a single pod that writes large quantities of logs can degrade performance for the logs from the other pods on a node, and can eventually lead to pods being evicted from the node due to disk pressure.

Vector holds on to the file descriptors of log files that it hasn't finished processing. So if a pod generates more logs per second than Vector can parse, Vector will, over time, keep holding on to those file descriptors, preventing rotated log files from being deleted. This can eventually exhaust the disk space on the node and cause pods to be evicted.
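To illustrate the mechanism described above, here is a minimal sketch, assuming a POSIX filesystem (as on a typical Linux node). It is not Vector code; `read_after_unlink` is a hypothetical helper that shows how an open file descriptor keeps a file's data alive after its name has been removed.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, unlink it while it is still open, then read it back.

    Assumes POSIX semantics: unlinking removes the name, but the data stays
    on disk until the last open descriptor is closed.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)  # the name is gone, as after a rotated log is cleaned up
        assert not os.path.exists(path)
        # The open descriptor still refers to the file, so its contents survive.
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))
    finally:
        os.close(fd)  # only after this can the filesystem reclaim the space


if __name__ == "__main__":
    print(read_after_unlink(b"line from a rotated log\n"))
```

The same effect applies to a reader that falls behind: as long as it keeps the rotated file open, removing the file does not free the disk space.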

To prevent this, there needs to be some way for Vector to shed load, ideally in an equitable way that sheds load from noisy pods first.

Attempted Solutions

No response

Proposal

One way to address this issue is to add a new `max_open_rotated_files_per_pod` configuration option to the `kubernetes_logs` source. This would allow users to define the maximum number of files Vector could track for a given pod.

Example:

Given:

log_file <--- current file, pod `foo` is writing to this
log_file_1
log_file_2 <--- oldest file, Vector is currently reading from this

Vector is now tracking 3 files for pod `foo`, but `max_open_rotated_files_per_pod` is set to 2, so Vector will stop tracking the oldest file, which will allow the system to remove it.
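The example above can be sketched roughly as follows. This is only an illustration of the proposed behaviour, not Vector's implementation: `enforce_file_limit` is a hypothetical function, and it assumes the tracked files for a pod are kept in order from newest to oldest and that the limit counts every tracked file, as in the example.

```python
from __future__ import annotations


def enforce_file_limit(tracked: list[str], max_files: int) -> tuple[list[str], list[str]]:
    """Split a pod's tracked files into those to keep and those to stop tracking.

    `tracked` is ordered newest to oldest; anything beyond `max_files`
    (the oldest entries) is dropped so its descriptor can be closed.
    """
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    return tracked[:max_files], tracked[max_files:]


if __name__ == "__main__":
    kept, dropped = enforce_file_limit(["log_file", "log_file_1", "log_file_2"], 2)
    print(kept)     # ['log_file', 'log_file_1']
    print(dropped)  # ['log_file_2']
```

Dropping from the oldest end means a noisy pod loses its backlog first, while pods that stay under the limit are unaffected.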

Caveats

This setting will lead to log loss, which should be called out in the documentation. If it is added, a corresponding metric should also be added so users can see how many log files are being left unread.
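As a rough sketch of what such a metric could track, the snippet below counts abandoned files per pod. The class name `UnreadFileMetric` and its methods are hypothetical and do not correspond to Vector's internal metrics API; the only assumption is that the source reports each file it stops tracking.

```python
from collections import Counter


class UnreadFileMetric:
    """Hypothetical counter of log files left unread, keyed by pod name."""

    def __init__(self) -> None:
        self._dropped: Counter = Counter()

    def record(self, pod: str, files_dropped: int = 1) -> None:
        """Register that `files_dropped` files for `pod` were abandoned unread."""
        self._dropped[pod] += files_dropped

    def total(self) -> int:
        """Total number of unread files across all pods."""
        return sum(self._dropped.values())

    def by_pod(self) -> dict:
        """Per-pod breakdown, useful for spotting the noisy pod."""
        return dict(self._dropped)


if __name__ == "__main__":
    metric = UnreadFileMetric()
    metric.record("foo")
    metric.record("foo", 2)
    metric.record("bar")
    print(metric.total())   # 4
    print(metric.by_pod())  # {'foo': 3, 'bar': 1}
```

A per-pod breakdown makes it possible to alert on the pods that are actually losing logs rather than on a node-wide total.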

Version

vector 0.33.0

jcantrill commented 11 months ago

@syedriko please push an upstream patch for https://github.com/ViaQ/vector/pull/154 to jump start the discussion and move into the upstream

syedriko commented 11 months ago

> @syedriko please push an upstream patch for ViaQ#154 to jump start the discussion and move into the upstream

Here it is: https://github.com/vectordotdev/vector/pull/18904

benjaminhuo commented 5 months ago

cc @wanjunlei