derekhuizhang opened this issue 1 year ago
As a sanity check, could I get you to run the same test scenario, but with a blackhole sink instead of prometheus_remote_write? I have a couple of ideas here but would like to confirm we're chasing the right scenarios.
I've tested this before with blackhole. Blackhole leads to no metrics dropped (consistently high statsd received) and no memory leakage (memory stays low over days).
Okay, thanks for that. We'll take a look.
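For reference, the sanity check discussed above amounts to swapping the sink block. A minimal sketch, assuming a hypothetical source named statsd_in; this is not the exact configuration from this issue:

```toml
# Hypothetical sanity-check sink: point the same pipeline at a blackhole
# sink so prometheus_remote_write is taken out of the picture entirely.
[sinks.sanity_blackhole]
type = "blackhole"
inputs = ["statsd_in"]
# Periodically log how many events were discarded, to confirm data is flowing.
print_interval_secs = 10
```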
More context:
I have been using Vector in the same pattern as described above. I have tried both statsd and datadog_agent sources, and both end up with increased memory usage. Using a datadog_agent source does seem to have a slightly lower memory impact.
We are running vector version: v0.36.1
Observations:
Restarting the upstream source (i.e. whatever is sending the statsd/datadog_agent messages) appears to make the memory increase faster.
We do have expire_metrics_secs enabled at a fairly low value (30), with no improvement (see the snippet after these observations).
The vector_utilization metric for the prometheus_remote_write sink appears to grow continuously as we send more data.
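For context, expire_metrics_secs is a global (top-level) option, so the setting referenced above is a single line. A minimal sketch; only the value 30 comes from the observation above:

```toml
# Global option: expire metric state that has not been updated
# within the last 30 seconds.
expire_metrics_secs = 30
```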
Problem
The distribution metric type is not supported in Prometheus. Publishing distribution metrics to a statsd endpoint that routes to a Prometheus remote write endpoint results in memory steadily increasing over time and in the rate of statsd metrics received dropping off a cliff, causing large amounts of metrics to be dropped.
I haven't tested with other sources, so I can't say whether statsd is the only source that causes this behavior.
Steps to reproduce:
If you remove the abort statement in the remap config below, memory usage will increase over time. With the abort statement, memory usage stays low.
Configuration
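The original configuration is not reproduced here. Below is a minimal sketch of the pipeline described in the steps above, not the reporter's exact config; the listen address, remote write endpoint, and the name prefix used to identify distribution metrics are placeholder assumptions:

```toml
# Hypothetical reconstruction: statsd source -> remap with an abort
# -> prometheus_remote_write sink.

[sources.statsd_in]
type = "statsd"
address = "0.0.0.0:8125"
mode = "udp"

[transforms.drop_distributions]
type = "remap"
inputs = ["statsd_in"]
source = '''
# Removing this abort reproduces the memory growth; keeping it
# (so distribution metrics are dropped) keeps memory usage low.
# The name prefix is only an illustrative placeholder.
if starts_with(string!(.name), "my_distribution_") {
  abort
}
'''

[sinks.prom_rw]
type = "prometheus_remote_write"
inputs = ["drop_distributions"]
endpoint = "https://prometheus.example.com/api/v1/write"
```

With remap's drop_on_abort left at its default (true), aborted events are dropped before they reach the sink, which matches the "with the abort statement, memory stays low" behavior described above.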
Version
0.24.0
Debug Output
No response
Example Data
No response
Additional Context
I specifically ran this on Kube with the stateless-aggregator helm chart, but it should have the same effect on Docker
References
No response