Describe the bug
Recently we introduced long-running stability tests (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/master/testbed/stabilitytests) to catch performance issues in long-running scenarios. Ideally, we would compare resource utilization against initial measurements and verify that CPU and memory utilization do not grow beyond the values observed within the first couple of minutes. We can't apply this approach right now because memory utilization doesn't behave as expected: it grows slowly throughout the long-running tests, which currently run with fixed memory and CPU thresholds.
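A minimal Go sketch of the baseline-relative check described above, assuming the testbed collects one memory (or CPU) sample per fixed interval. The helper `firstExceedance`, the warm-up length, and the 10% tolerance are hypothetical choices for illustration, not part of the existing testbed:

```go
package main

import "fmt"

// firstExceedance is a hypothetical helper (not part of the testbed).
// It uses the peak of the first `warmup` samples as the baseline and returns
// the index of the first later sample that exceeds baseline*(1+tolerance).
// ok is false if no sample exceeds the limit or there is too little data.
func firstExceedance(samples []float64, warmup int, tolerance float64) (idx int, ok bool) {
	if warmup <= 0 || warmup >= len(samples) {
		return -1, false // need a warm-up window plus at least one later sample
	}
	// Baseline: the highest value seen during the warm-up window.
	baseline := samples[0]
	for _, s := range samples[1:warmup] {
		if s > baseline {
			baseline = s
		}
	}
	limit := baseline * (1 + tolerance)
	for i := warmup; i < len(samples); i++ {
		if samples[i] > limit {
			return i, true
		}
	}
	return -1, false
}

func main() {
	// RSS in MiB, one sample per minute (made-up numbers for illustration).
	stable := []float64{40, 52, 55, 54, 55, 56, 55}
	growing := []float64{40, 52, 55, 58, 62, 67, 73}

	fmt.Println(firstExceedance(stable, 3, 0.10))
	fmt.Println(firstExceedance(growing, 3, 0.10))
}
```

Using the peak of the warm-up window (rather than its average) as the baseline avoids flagging normal start-up fluctuation. With the numbers shown, the growing series is flagged at index 4 while the stable one is not.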
From today's triage meeting: this issue looks very old, so I'm going to close it. Feel free to reopen it if this is still a problem that needs to be addressed.
Example test results: https://app.circleci.com/pipelines/github/open-telemetry/opentelemetry-collector-contrib/1649/workflows/97ee2b3f-a3df-4047-b2e9-455bdbcb178f/jobs/6233. Each parallel CircleCI job represents one test-case pipeline, and its "Run stability tests" section shows memory and CPU usage over time. All existing pipelines show significant growth in memory utilization over time, except for TestStabilityTracesOTLP, where memory utilization stabilizes within 10 minutes. We need to investigate whether the growing memory utilization is caused by a memory leak or by something else.
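One rough way to tell steady growth from a slow approach to a plateau is to fit a line to the samples taken after warm-up and look at its slope. The sketch below assumes samples are taken at a fixed interval; `growthPerSample` is a hypothetical helper. A clearly positive slope only shows that memory keeps growing, not why; confirming a leak would still require heap profiles of the collector (for example with Go's pprof tooling).

```go
package main

import "fmt"

// growthPerSample is a hypothetical helper (not part of the testbed).
// It fits a least-squares line y = a + b*x to the samples, using the sample
// index as x, and returns the slope b: the average change per interval.
func growthPerSample(samples []float64) float64 {
	if len(samples) < 2 {
		return 0 // a slope needs at least two points
	}
	n := float64(len(samples))
	var sumX, sumY, sumXY, sumXX float64
	for i, y := range samples {
		x := float64(i)
		sumX += x
		sumY += y
		sumXY += x * y
		sumXX += x * x
	}
	// Closed-form least-squares slope; the denominator is > 0 for n >= 2.
	return (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
}

func main() {
	// Post-warm-up RSS samples in MiB, one per minute (made-up numbers).
	plateau := []float64{54, 55, 56, 55}
	growing := []float64{58, 62, 67, 73}

	fmt.Printf("plateau: %.1f MiB/min\n", growthPerSample(plateau))
	fmt.Printf("growing: %.1f MiB/min\n", growthPerSample(growing))
}
```

With these numbers the growing series trends at about 5 MiB per minute versus 0.4 for the plateau. What counts as "near zero" has to be chosen relative to the sampling interval and the test duration.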
Steps to reproduce
Look at any stability test run, e.g. https://app.circleci.com/pipelines/github/open-telemetry/opentelemetry-collector-contrib/1649/workflows/97ee2b3f-a3df-4047-b2e9-455bdbcb178f/jobs/6233. The "Run stability tests" section shows memory utilization growing over time for most of the test cases.
What did you expect to see?
Memory utilization should stabilize within several minutes and not grow above the level reached after that.
What did you see instead?
Memory utilization keeps growing over time.
What version did you use?
Version: v0.5.0
What config did you use?
Default testbed config.
Environment
Docker image: cimg/go:1.14