Open mx-psi opened 1 year ago
cc @dmitryax @TylerHelmuth
@mx-psi @dmitryax i think we should implement this via a feature-gate-like config.
The first PR can add the option to use the Go setting instead of the extension, but disabled by default. After some time, we can enable it by default, but still allow users who want to use the extension to disable it. Then we'd wait until the extension is deprecated to remove the option.
i think we should implement this via a feature-gate-like config.
Have we done anything similar before on the Collector Helm chart? The plan sounds reasonable to me
@TylerHelmuth should this be a usual configuration option or is there an alternative mechanism in the Helm chart for temporary toggles?
We dont have anything like feature gates in the chart, we normally do it via a configuration option (or 2).
Question: what are we planning to do we do when request limits are not set?
ATM we default memorybalast to (https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_config.tpl#L25C1-L27C1) :
memory_ballast:
size_in_percentage: 40
and requests / limits to empty:
resources: {}
# resources:
# limits:
# cpu: 250m
# memory: 512Mi
I think the only thing we can do is not set the GOMEMLIMIT
env?
This would be a different behaviour from current one.
I think the only thing we can do is not set the GOMEMLIMIT env?
I am personally fine with that, if you explicitly unset resource limits then you should be on your own to configure this
I have started work on this
How much should we wait until we enable the GOMEMLIMIT option by default? Is something like one month enough?
@mx-psi is this helm chart feature critical for the decision on whether or not to deprecate the extension or has the decision already been made to deprecate it?
@mx-psi is this helm chart feature critical for the decision on whether or not to deprecate the extension or has the decision already been made to deprecate it?
I would want to validate the approach with the Helm chart before committing to deprecate the extension. I don't feel like I personally have enough knowledge about the Go garbage collector to ensure this replacement will be satisfactory for all use cases (it probably will, that's why I pushed for this, but I want to be sure before deprecating the extension).
If anybody else feels like this is clear cut, then we don't have to wait for this, but personally I feel like this is safer.
In that case I think we need to leave it around as long as it take to make a decision. We won't move forward with turning it always on until we're sure we are deprecating the extension.
I can enable it on our internal use of the collector, but we'll need other examples as well.
@JaredTan95 based on #912 it seems like you are testing this out. Would you mind sharing any findings about the setting (memory and CPU usage patterns before/after?)
I also posted a call out for more testing on a few channels on the CNCF Slack. Once we have at least two I think we can proceed with the deprecation and enable this by default.
We have been running Collectors using the Helm chart with useGOMEMLIMIT enabled for a few weeks on Datadog's infrastructure with a similar CPU usage as with the memory ballast and lower memory consumption (because of the memory ballast).
Can you share some stats/numbers on the improvements?
We see similar stats at Honeycomb
@JaredTan95 based on #912 it seems like you are testing this out. Would you mind sharing any findings about the setting (memory and CPU usage patterns before/after?)
I also posted a call out for more testing on a few channels on the CNCF Slack. Once we have at least two I think we can proceed with the deprecation and enable this by default.
We switched to this(useGOMEMLIMIT) configuration because of an issue with memory ballast
's compatibility with our arm architecture. We haven't verified this configuration in production, but in our test environment, this feature solves the arm compatibility problem and the effect is similar to that of the original memory ballast
.
So, I support it.
@mx-psi I'd still like to see the extension officially deprecated in Core before we make this a default value in the helm chart.
@JaredTan95 can you please share more details about memory ballast's compatibility issues with our arm architecture? It be nice to post them in https://github.com/open-telemetry/opentelemetry-collector/issues/8343 as well
Based on the discussion on https://github.com/open-telemetry/opentelemetry-collector/issues/8343 and this issue, I filed open-telemetry/opentelemetry-collector#8803 to deprecate the memory ballast extension.
I'd prefer the next step before deprecating the memory_ballast
extension to be switching the useGOMEMLIMIT
feature gate to true in the helm chart. That way we are not forced to do it and will have a chance to get some more feedback before the actual deprecation. I'd think we should keep useGOMEMLIMIT: true
for a couple of versions, then deprecate memory_ballast
extension. @mx-psi @JaredTan95 @povilasv @TylerHelmuth WDYT?
I am fine with that, although I think @TylerHelmuth wanted to go the other way around :smile: Up to the Helm chart maintainers I guess :)
Ya I was imagining after the initial testing that the helm chart would follow the guidance from Core. My though being that I didn't want to disrupt users until the component was actually deprecated.
But we could enable it by default and use helm chart users as testers (unless they switch it back to false).
But we could enable it by default and use helm chart users as testers (unless they switch it back to false).
That's what I thought. I wanted to gather as much feedback as possible before the deprecation. I'm thinking of useGOMEMLIMIT
as an improvement anyway
I'd think we should keep useGOMEMLIMIT: true for a couple of versions, then deprecate memory_ballast extension
I agree! 👏
@JaredTan95 can you please share more details about memory ballast's compatibility issues with our arm architecture? It be nice to post them in open-telemetry/opentelemetry-collector#8343 as well
The compatibility problem we encountered was only tested on the Chinese operating system(Kylin OS), so I think it has little reference value for users using international operating systems, so I did not put it up :-P
open-telemetry/opentelemetry-collector/pull/8803 got merged, we can revert before Jan 8th if we find issues
We want to remove the
memory_ballast
extension in favor of Go 1.19+GOMEMLIMIT
environment variable as stated in open-telemetry/opentelemetry-collector/issues/8343.The first step to verify that we can effectively remove the extension is to change this on the OpenTelemetry Collector Helm chart. We can replace the extension by a value of
GOMEMLIMIT
that is something like 80-90% of the total memory limit for the Collector container. Once we release this change and wait some time to gather feedback, we can move on with deprecation of the memory ballast extension.