Open arielvalentin opened 3 months ago
After some discussion amongst maintainers, we want to provide a quick fix for this issue via the service.noopTracerProvider feature gate, and further investigate the cause of this problem before deciding whether there should be a permanent configuration option to set a no-op tracer provider.
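For context, a no-op tracer provider hands out tracers whose spans are never recorded or exported. A minimal sketch using the OpenTelemetry Go API, purely illustrative and not the collector's actual wiring:

```go
package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel/trace/noop"
)

func main() {
	// The no-op provider returns tracers whose spans are non-recording
	// and are never exported anywhere.
	tp := noop.NewTracerProvider()
	tracer := tp.Tracer("example")

	_, span := tracer.Start(context.Background(), "do-work")
	defer span.End()

	fmt.Println("recording:", span.IsRecording()) // prints "recording: false"
}
```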
Returning a noop tracer provider if no tracing is configured in the telemetry settings would break the current behaviour, which the zpages extension relies on to provide telemetry about traces.
Was there a recent change in zpages that results in an increase in the number of spans being allocated?
Thanks, @arielvalentin, for taking this up and raising a GitHub issue.
@codeboten ~did this feature flag make it into v0.107.0?~
Yes, it did: https://github.com/open-telemetry/opentelemetry-collector/releases/tag/v0.107.0
I recently upgraded to v0.107.0 from v0.104.0. During off-peak times performance is about on par, but during our high-peak times our p99 skyrockets and memory spikes, forcing the memory limiter to trigger hard limits and respond with 503s.
I have enabled the feature gate to ensure we are using the noop tracer, but there still seems to be an increase in memory allocations and latency.
Our heavy usage of OTTL has not changed much between versions, and as far as I can tell it is what appears most in the profiles, along with a high number of context objects being allocated by the noop tracer.
Re-opening to continue the investigation
@arielvalentin Did the OTTL performance degrade, or are you saying it remained the majority?
The profiles from 104 showed that OTTL execution was where most of the time and allocations went, but our performance was pretty good: ~15K spans per second per collector, with a p99 of 19ms.
Other than the increased allocations of context objects related to the noop tracer, the profiles between 104 and 107 look about the same to me, so I don't have a reason to believe there is a regression in OTTL performance.
I'll see if I can dig up my 104 profiles and attach them as well.
From @arielvalentin:

> I have enabled the feature gate to ensure we are using the noop tracer, but there still seems to be an increase in memory allocations and latency.
>
> Other than the increased allocations of context objects related to the noop tracer, the profiles between 104 and 107 look about the same to me, so I don't have a reason to believe there is a regression in OTTL performance.
So can service.noopTracerProvider be considered a workaround?
> So can service.noopTracerProvider be considered a workaround?

I would say "no", because:

> there still seems to be an increase in memory allocations and latency.
Investigating this further, this might be related to https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33508
I just ran load tests for v0.107.0, and here's what I have when I enabled the noop tracer provider feature gate:
Before using --feature-gates service.noopTracerProvider
With --feature-gates service.noopTracerProvider
This seems different than what you're seeing @arielvalentin, can you confirm that the feature gate is enabled?
Yes. We currently run it with --feature-gates service.noopTracerProvider
You can see in the profiles I attached that the noopTracer is used:
Here are the profiles for 104: profiles-104.tar.gz
Thanks @arielvalentin, I missed that when I looked at your attached profiles a couple of days ago. Good to know that using the no-op tracer provider at least reduces the overall memory consumption, even if it's still higher than before the change to add tracing to OTTL.
No worries @codeboten
Something I may not have mentioned already: my colleague @antonio observed the latency regression as early as 105.
We may downgrade to 105 and collect profiles there as well.
> Something I may not have mentioned already: my colleague @antonio observed the latency regression as early as 105.
That's correct. Notice how request duration went significantly up in the graph below, when I tested 105 back in July.
> Investigating this further, this might be related to https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33508

The timing of when this PR was merged does indeed align with when we started seeing the issue.
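For anyone skimming, the change under discussion added tracing to OTTL's StatementSequence, so every evaluation starts a span even when nothing is sampled. A simplified sketch of that shape (illustrative names and structure, not the actual OTTL code):

```go
package ottlsketch

import (
	"context"

	"go.opentelemetry.io/otel/trace"
)

// Statement stands in for a compiled OTTL statement; the real interface
// differs, this is just enough to show where the span comes from.
type Statement[K any] interface {
	Execute(ctx context.Context, tCtx K) error
}

type StatementSequence[K any] struct {
	statements []Statement[K]
	tracer     trace.Tracer
}

// Execute starts a span for every evaluation of the sequence. A batch of
// M items means M evaluations, so even a no-op tracer pays M span
// start/end calls (and the associated context allocations) per batch.
func (s StatementSequence[K]) Execute(ctx context.Context, tCtx K) error {
	spanCtx, span := s.tracer.Start(ctx, "ottl/StatementSequenceExecution")
	defer span.End()
	for _, stmt := range s.statements {
		if err := stmt.Execute(spanCtx, tCtx); err != nil {
			return err
		}
	}
	return nil
}
```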
Just adding what I've found here real quick:
I've done a quick benchmark to see the difference between an actual tracer implementation, a noop tracer, and all tracing removed in OTTL, via the transform processor. You can see the benchmark here. All it does is run the transform processor with a batch of 100 logs and a simple set operation (which seems like a pretty normal workload).
These are the results:
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor
cpu: Apple M3 Pro
                   │ no-tracing.txt │            noop-tracer.txt            │            test-tracer.txt            │
                   │     sec/op     │    sec/op      vs base                │     sec/op      vs base               │
SetLogRecords-12       50.29µ ± 1%     120.55µ ± 4%  +139.70% (p=0.002 n=6)    426.21µ ± 5%  +747.43% (p=0.002 n=6)

                   │ no-tracing.txt │            noop-tracer.txt            │            test-tracer.txt            │
                   │      B/op      │      B/op      vs base                │      B/op       vs base               │
SetLogRecords-12      96.21Ki ± 0%    227.45Ki ± 0%  +136.42% (p=0.002 n=6)   758.38Ki ± 1%  +688.28% (p=0.002 n=6)

                   │ no-tracing.txt │            noop-tracer.txt            │            test-tracer.txt            │
                   │   allocs/op    │   allocs/op    vs base                │   allocs/op     vs base               │
SetLogRecords-12       1.402k ± 0%     3.802k ± 0%   +171.18% (p=0.002 n=6)    9.802k ± 0%   +599.14% (p=0.002 n=6)
You can see that even the noop tracer increases these benchmarks by quite a bit. And it lines up with what I'm seeing both in the issue I posted and here, which is that the overhead of tracing each individual evaluation of the StatementSequence seems like overkill.
Of course, if you do more processing in a single ottl statement, you'd probably get numbers that look a little less striking, so take the exact numbers here with a grain of salt. I don't think the scenario is all too unlikely, though.
It feels like there should not be tracing in StatementSequence at all, but maybe a few metric counters instead. The overhead of the tracing just doesn't seem worth it.
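For anyone who wants to reproduce just the per-evaluation span overhead in isolation, without the full transform processor setup, a micro-benchmark along these lines shows the same trend (the 100-item loop and names mirror the scenario above and are assumptions, not the linked benchmark):

```go
package tracingoverhead

import (
	"context"
	"testing"

	"go.opentelemetry.io/otel/trace"
	"go.opentelemetry.io/otel/trace/noop"
)

// runBatch mimics a processor evaluating a statement sequence once per
// item in a 100-item batch, starting and ending one span per evaluation.
func runBatch(tp trace.TracerProvider) {
	tracer := tp.Tracer("ottl")
	ctx := context.Background()
	for i := 0; i < 100; i++ {
		spanCtx, span := tracer.Start(ctx, "ottl/StatementSequenceExecution")
		_ = spanCtx // the real code would pass this into statement execution
		span.End()
	}
}

func BenchmarkNoopTracerPerBatch(b *testing.B) {
	tp := noop.NewTracerProvider()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		runBatch(tp)
	}
}
```

Run it with go test -bench . -benchmem and compare against a variant that skips the Start/End calls entirely to see the delta attributable to the no-op tracer alone.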
I have some more for you here: v0.107.4 profiles-20240828.tar.gz
cc: @songy23
I am going to submit a PR to revert the ottl tracing stuff.
> I am going to submit a PR to revert the ottl tracing stuff.

How about placing it behind a feature gate?
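If the feature gate route is taken, the gate itself is only a few lines with the collector's featuregate package; the identifier, stage, and description below are made up for illustration and are not what any actual PR registers:

```go
package ottlgate

import "go.opentelemetry.io/collector/featuregate"

// Hypothetical gate; StageAlpha means it is disabled by default and can be
// turned on with --feature-gates=ottl.enableStatementSequenceTracing.
var statementTracingGate = featuregate.GlobalRegistry().MustRegister(
	"ottl.enableStatementSequenceTracing",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription("When enabled, OTTL statement sequences record a span per evaluation."),
)

// Call sites would then guard the tracer usage, e.g.:
//
//	if statementTracingGate.IsEnabled() {
//	    ctx, span = tracer.Start(ctx, "ottl/StatementSequenceExecution")
//	    defer span.End()
//	}
```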
Thanks again for investigating. I am on the path to downgrade to 0.104.0 and will be eagerly awaiting the next release.
Once it's out, I will deploy to a subset of our canary instances and report back profiles and latency info.
@bvsvas have you all had a chance to generate profiles?
IIRC you all were not using OTTL, so the latency you reported would not be related to the transformers, right?
This is the snippet you posted in Slack:
service:
  extensions: [apmauth, pprof]
  pipelines:
    traces:
      receivers: [otlp/withauth]
      processors: [probabilistic_sampler]
      exporters: [customexporter]
    metrics:
      receivers: [otlp/withauth, prometheus]
      processors: [cumulativetodelta, resource]
      exporters: [customexporter]
👋🏼 Is there a release schedule/project board handy I can subscribe to?
The release schedule is documented here: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/release.md#release-schedule
Describe the bug
This issue was originally reported in CNCF Slack by Srinivas Bevara and confirmed by @codeboten, who asked me to open this issue.
After upgrading to v0.106.1 from v0.104.0, we have run into issues where we see an increase in request latency reported by both the collector's internal telemetry and our ingress proxies.
Memory remains unaffected; however, pprof reports that garbage collection seems to have increased due to allocations related to internal telemetry tracing:
This is unexpected, since tracing telemetry is disabled in our deployments.
Steps to reproduce
Run a load test comparing v0.104.0 and v0.106.1
What did you expect to see?
What did you see instead?
What version did you use?
v0.106.1
What config did you use?
Our configuration has many transformations and connector pipelines. The bare minimum you likely need is the configuration you use for benchmarking.
Environment
Ubuntu 20.04.1
Additional context
Screenshots showing increased p99 latency and CPU, with steady memory