Metrics memory leak v1.12.0/v0.35.0 and up

bthomee commented 1 year ago

Description

After upgrading from v1.11.2/v.0.34.0 to v1.12.0/v0.35.0 I have witnessed our pods to have a never-ending increase in memory consumption over time. Restoring to the former version brought stable memory use back. Once v1.13.0/v0.36.0 had been released I tried upgrading to that version, but got the same result. Once again, restoring to v1.11.2/v.0.34.0 stopped the memory increase.

Refer to the screenshot below that shows memory use of one of our pods on AWS EKS, specifically:

On 1/31/23 I upgraded to v1.12.0/v0.35.0 and instantly you can see the memory grow.
On 2/6/23 I made an unrelated release, which reset memory use but it immediately started growing again.
On 2/8/23 I downgraded to v1.11.2/v.0.34.0, after which memory use stayed stable again.
On 2/14/23 I upgraded to v1.13.0/v0.36.0, and once again saw memory increase. A few new releases made afterwards temporarily reset memory use, as before.
On 2/21/23 I downgraded v1.11.2/v.0.34.0, restoring the memory use.

Grafana

Environment

OS: Linux
Architecture: AWS EKS
Go Version: 1.19
opentelemetry-go version: v1.12.0/v0.35.0 and up

Steps To Reproduce

Upgrade to v1.12.0/v0.35.0 or higher.
Update metrics from syncint64 etc. to their corresponding instrument versions.
See memory grow over time.

Expected behavior

Memory stays stable.

MrAlias commented 1 year ago

What telemetry are you generating over these time periods? What instruments are you using and what attributes are you using?

bthomee commented 1 year ago

Every 5s a health/live check is performed, which uses an int64 counter (+1) and an int64 histogram (+duration). Each function call does the same, with differently named counters/histograms.

The non-health/live check functions add a few constant attributes, such as package name, but those functions are not currently called frequently - perhaps once a day. The health/live check function, which is the one contributing 99.9% of metrics, adds just the request path (either /health or /live) as attribute.

Aneurysm9 commented 1 year ago

Are you able to obtain and share pprof heap dumps from the process when memory use is elevated?

bthomee commented 1 year ago

I'll need a few days to find some time to enable pprof, update the package version, and capture the heap dumps - stay tuned.

bthomee commented 1 year ago

Over the course of some 6 hours I took 5 heap dumps using pprof, see attachment. In those 6 hours only the live and health checks were called every 5 seconds, which resulted in ~4 counters and ~histograms being updated with constant attributes (namely, a string containing the suffix of the URL - either /livez or /healthz - and a boolean whether the call was a success - which it always was).

There are some functions worth looking at it seems: prometheus.MakeLabelPairs and attribute.computeDistinctFixed.

heap.zip

MrAlias commented 1 year ago

Image of the above data using go tool pprof:

profile001

bthomee commented 1 year ago

I deployed a fresh instance and will let it run until Monday so I can collect a new sample to give you even more to work with. If there are other pprof extracts you're interested in let me know, and I'll see what I can do.

MrAlias commented 1 year ago

If you could get data with -inuse_objects as well that could be helpful.

MrAlias commented 1 year ago

My initial thoughts

We should look into if we can provide a reusable Sortable during the attribute set collection
We might need to double check we are correctly creating and recording prometheus data efficiently
The old SDK ability to record data with for a batch of attributes might need to be re-added (this is something Java kept)

I'm interested to see what the inuse_objects returns. That should help determine what things are allocated.

bthomee commented 1 year ago

I called the tool with inuse_objects followed by top to get the highest memory consumers, and got the following:

go tool pprof --inuse_objects https://[host]:[port]/debug/pprof/heap
Fetching profile over HTTP from https://[host]:[port]/debug/pprof/heap
/pprof.main_image.binary.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: main_image.binary
Type: inuse_objects
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 704254, 99.21% of 709865 total
Dropped 46 nodes (cum <= 3549)
Showing top 10 nodes out of 47
      flat  flat%   sum%        cum   cum%
    496994 70.01% 70.01%     496994 70.01%  github.com/prometheus/client_golang/prometheus.MakeLabelPairs
     76459 10.77% 80.78%      76459 10.77%  strings.(*Builder).grow
     32770  4.62% 85.40%      32770  4.62%  go.opentelemetry.io/otel/sdk/metric/internal.newBuckets
     32768  4.62% 90.02%      32768  4.62%  strconv.formatBits
     24588  3.46% 93.48%      24588  3.46%  go.opentelemetry.io/otel/attribute.computeDistinctFixed
     14044  1.98% 95.46%      14044  1.98%  github.com/prometheus/client_golang/prometheus.processMetric
     10923  1.54% 97.00%      10923  1.54%  github.com/prometheus/client_golang/prometheus.populateMetric
      6150  0.87% 97.86%       6150  0.87%  runtime.allocm
      5461  0.77% 98.63%       5461  0.77%  runtime.main
      4097  0.58% 99.21%       4097  0.58%  go.opentelemetry.io/otel/sdk/metric/internal.(*cumulativeHistogram[...]).Aggregation
(pprof)

bthomee commented 1 year ago

Another update:

Showing nodes accounting for 1653226, 98.38% of 1680435 total
Dropped 54 nodes (cum <= 8402)
Showing top 10 nodes out of 45
      flat  flat%   sum%        cum   cum%
    988529 58.83% 58.83%     988529 58.83%  github.com/prometheus/client_golang/prometheus.MakeLabelPairs
    316763 18.85% 77.68%     316763 18.85%  strings.(*Builder).grow
     73730  4.39% 82.06%      73730  4.39%  github.com/prometheus/client_golang/prometheus.(*constHistogram).Write
     65537  3.90% 85.96%      65537  3.90%  net/textproto.(*Reader).ReadMIMEHeader
     65536  3.90% 89.86%      65536  3.90%  strconv.formatBits
     45059  2.68% 92.54%      45059  2.68%  go.opentelemetry.io/otel/sdk/metric/internal.newBuckets
     41892  2.49% 95.04%      41892  2.49%  go.opentelemetry.io/otel/attribute.computeDistinctFixed
     23412  1.39% 96.43%      97145  5.78%  github.com/prometheus/client_golang/prometheus.processMetric
     21845  1.30% 97.73%      21845  1.30%  sort.Strings
     10923  0.65% 98.38%      10923  0.65%  github.com/prometheus/client_golang/prometheus.populateMetric

MrAlias commented 1 year ago

This seems to be indicating that the memory use is coming from the Prometheus client used by the Prometheus exporter. I'm not sure how much we can do to alleviate that given we do not control that code.

We should double check we are supporting as best we can the optimal path of label creation.

@dashpole is there anything that seems obvious to you?

dashpole commented 1 year ago

Are you able to provide the contents of curling the prometheus endpoint? Reading the implementation, I don't see anything that would obviously be the culprit for this.

bthomee commented 1 year ago

I think we're getting close to the solution. Curling the /metrics endpoint gave me "http_server" count and histogram metrics repeated thousands upon thousands of times (until I quit the command). See the attached text file where I grabbed the first 1000 records.

The main difference between each repeated block of the "http_server" metrics is in the attributes: The net _sock_peer_port keeps increasing. These metrics are not created by my app - they seem to be built-in.

metrics.txt

MrAlias commented 1 year ago

The peer port should not be on a server metric^1.

bthomee commented 1 year ago

What is your current suggestion in my situation? Hold off on updating until #3744 is fixed, or do I need to change something on my end?

MrAlias commented 1 year ago

You could create a view for the instrument that filters out the attribute. Otherwise, yeah, I think the fix to #3744 is in order to fully remove the high cardinality attribute.

bitomaxsp commented 1 year ago

@bthomee Would it be possible for you to provide setup code you use in your service which creates exporter and meter provider?

bthomee commented 1 year ago

The following extract shows how we set up the exporter and meter provider:

// Create the metric exporter.
logger.Info("Creating metric exporter.")
metricExporter, err := prometheus.New()
if err != nil {
  logger.Panics(err)
}
// Create a view that filters out a number of dynamic attributes.
filteredView := metric.NewView(
  metric.Instrument{Name: "http.server.*"},
  metric.Stream{
    AttributeFilter: HttpAttributeFilter,
  },
)
// Create a metric provider using the exporter and filtered view.
metricProvider := metric.NewMeterProvider(
  metric.WithReader(metricExporter),
  metric.WithView(filteredView),
)
// Register the meter provider, so elsewhere we can call `global.MeterProvider()` to access it.
global.SetMeterProvider(metricProvider)
// Wrap the registry by a handler, so that collected metrics can be exported via HTTP.
metricHandler := promhttp.Handler()

The new addition we made, based on @MrAlias' suggestion, is adding that filtered view, which has worked like a charm.

aboryslawski commented 1 year ago

Was this resolved in 1.17.0?

open-telemetry / opentelemetry-go