[Open] ShahroZafar opened this issue 1 month ago
Does Vector immediately run into the limit? Or does it look like it is increasing over time? One thing I can think of is setting https://vector.dev/docs/reference/configuration/global-options/#expire_metrics_secs in case it is the internal telemetry that is causing a runaway increase in memory.
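For reference, that option is set at the top level of the Vector config; a minimal sketch (the 300-second value is only illustrative, tune it to your metric cardinality):

```toml
# Expire internal metrics that haven't been updated for 300 seconds,
# so per-file/per-pod telemetry doesn't accumulate without bound.
expire_metrics_secs = 300
```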
Problem
We are seeing an issue where some of our Vector instances are getting OOMKilled. The cluster Vector runs in can have more than 3000 nodes. The pod that got OOMKilled is reading about 10 to 15 files at any point in time. Log rotation is configured as 10 MB × 5 files for each pod, and 3 of those files are .gz files, which are excluded from Vector's reads. We are using the kubernetes_logs source, de-dotting keys with a remap transform, and pushing to Kafka.
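Since the Configuration section below is empty, here is a minimal sketch of what such a pipeline might look like; the component names, broker address, topic, and the exact de-dot logic are assumptions for illustration, not the actual configuration:

```toml
[sources.k8s]
type = "kubernetes_logs"

[transforms.dedot]
type = "remap"
inputs = ["k8s"]
source = '''
# Recursively replace dots in keys with underscores (illustrative de-dotting).
. = map_keys(., recursive: true) -> |key| { replace(key, ".", "_") }
'''

[sinks.kafka_out]
type = "kafka"
inputs = ["dedot"]
bootstrap_servers = "kafka:9092" # assumed broker address
topic = "logs"                   # assumed topic
encoding.codec = "json"
```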
The memory request and limit are both set to 750Mi. After increasing the memory limit we were able to make it work, but memory usage seems higher on one pod where the rate of incoming logs is about 150 messages/sec. Running Vector on a node reading the logs of only 1 pod, we get much higher throughput, about 6000 messages/sec, with memory usage around 100Mi.
The maximum value of `vector_open_files` in the cluster is 20.
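If it helps with debugging, internal metrics such as `vector_open_files` can be tracked over time by wiring the internal_metrics source into a prometheus_exporter sink; a sketch, where the listen address is the documented default rather than anything from our setup:

```toml
[sources.internal]
type = "internal_metrics"

[sinks.prom]
type = "prometheus_exporter"
inputs = ["internal"]
address = "0.0.0.0:9598" # default exporter listen address
```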
Configuration
Version
0.39.0
Debug Output
No response
Example Data
No response
Additional Context
No response
References
No response