open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Understanding high memory usage of otelcol (prometheusreceiver / prometheusexporter) #9998

Closed Mario-Hofstaetter closed 2 years ago

Mario-Hofstaetter commented 2 years ago

Describe the bug

I am infrequently observing high memory usage of otelcol.exe on Windows systems. The memory is, however, within the set bounds of memory_limiter (see config below). But I am not sure whether this memory usage is intended behavior, since there is negligible load on the systems. I am not experienced in Go, so please excuse my lack of knowledge.

My planned actions to counteract this behavior are to reduce the configured limit_mib and to fix the memory_ballast configuration.

Steps to reproduce

Not sure how to reproduce. Full config: see below.

What did you expect to see?

otelcol should not consume "much" memory given that the workload is small, and/or memory should eventually be garbage collected.

What did you see instead?

The otelcol process uses up to 2 GB of memory. Currently, on my local machine, it sits at ~1.5 GB. Last stdout log messages from memory_limiter after the machine was resumed from hibernation:

{"level":"info","ts":1652033885.9450645,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":197}
{"level":"info","ts":1652033885.8394392,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":1623}

Is this the intended memory usage behavior of the process?

What are the recommended values for memory_limiter? The documentation is a little vague.

On my machine, the prometheus exporter currently emits ~19362 lines of metrics, less than 3 MB in size. On our biggest instance, the prometheus exporter has 37654 metric lines, ~8 MB. Is this a lot?

The documentation on that page uses limit_mib: 4000, which seems kinda HUGE for this kind of application?

After re-reading those docs just now, this line caught my attention:

limit_mib (default = 0): Maximum amount of memory, in MiB, targeted to be allocated by the process heap. Note that typically the total memory usage of process will be about 50MiB higher than this value. This defines the hard limit.

So it actually is the expected behavior of otelcol to stay around limit_mib indefinitely?
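
If I read the memory_limiter docs correctly, the soft limit is limit_mib minus spike_limit_mib, and spike_limit_mib defaults to 20% of limit_mib, which would roughly match the log lines above (my own back-of-the-envelope math, please correct me if wrong):

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2000     # hard limit -> "above hard limit" warnings at ~2050 MiB
    # spike_limit_mib not set -> defaults to 20% of limit_mib = 400 MiB
    # soft limit = limit_mib - spike_limit_mib = 2000 - 400 = 1600 MiB
    #              -> "above soft limit" messages at ~1620-1680 MiB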

My trace queue size is currently at zero:

# HELP otelcol_exporter_queue_size Current size of the retry queue (in batches)
# TYPE otelcol_exporter_queue_size gauge
otelcol_exporter_queue_size{exporter="jaeger",service_instance_id="fc4ff71e-1a39-4b13-9990-db1ec97e9c9b",service_version="latest"} 0

I had the suspicion that the trace buffer was filling memory while the VPN was disconnected and the Jaeger server was not reachable, since the default queue size of 5000 (?) seemed rather high.

Looking at the logs from last night, it was the traces pipeline that caused the memory increase:

2022-05-07T01:22:01+02:00   {"level":"info","ts":1651879321.2295015,"caller":"memorylimiterprocessor/memorylimiter.go:303","msg":"Memory usage back within limits. Resuming normal operation.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":182}
2022-05-07T01:22:01+02:00   {"level":"info","ts":1651879321.2295015,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":182}
2022-05-07T01:20:19+02:00   {"level":"info","ts":1651879219.4642153,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":1665}
2022-05-07T01:18:37+02:00   {"level":"info","ts":1651879117.4491055,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":1661}
2022-05-07T01:16:55+02:00   {"level":"info","ts":1651879015.4828515,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":1664}
2022-05-07T01:15:30+02:00   {"level":"info","ts":1651878930.5143886,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":1683}
2022-05-07T01:14:05+02:00   {"level":"info","ts":1651878845.4196262,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":1683}
2022-05-07T01:22:01+02:00   {"level":"warn","ts":1651879321.100273,"caller":"memorylimiterprocessor/memorylimiter.go:291","msg":"Memory usage is above hard limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":2059}
2022-05-07T01:20:19+02:00   {"level":"warn","ts":1651879219.098343,"caller":"memorylimiterprocessor/memorylimiter.go:291","msg":"Memory usage is above hard limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":2049}
2022-05-07T01:18:37+02:00   {"level":"warn","ts":1651879117.0985374,"caller":"memorylimiterprocessor/memorylimiter.go:291","msg":"Memory usage is above hard limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":2048}
2022-05-07T01:16:55+02:00   {"level":"warn","ts":1651879015.103772,"caller":"memorylimiterprocessor/memorylimiter.go:291","msg":"Memory usage is above hard limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":2012}
2022-05-07T01:15:30+02:00   {"level":"warn","ts":1651878930.103912,"caller":"memorylimiterprocessor/memorylimiter.go:291","msg":"Memory usage is above hard limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":2005}
2022-05-07T01:14:05+02:00   {"level":"warn","ts":1651878845.099606,"caller":"memorylimiterprocessor/memorylimiter.go:291","msg":"Memory usage is above hard limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"traces","cur_mem_mib":2061}

If the sending_queue queue_size of 5000 does not fit within the memory limits, what is going to happen? Will the oldest trace spans be dropped from the queue?
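
In case it matters: if I read the exporterhelper docs correctly, I could also explicitly shrink the exporter queue instead of relying on the default of 5000 batches, roughly like this (untested sketch, numbers picked arbitrarily):

exporters:
  jaeger:
    endpoint: "${AX_MONITORING_SERVER}:14250"
    tls:
      insecure: true
    sending_queue:
      enabled: true
      num_consumers: 10   # default
      queue_size: 500     # default is 5000 batches; a smaller queue should cap memory growth while the backend is unreachable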

I did, however, also have one machine where otelcol suddenly used 2 GB of memory, and no traces were being queued. There were no warnings at the time, and unfortunately I have no info logs from that date (otelcol was still logging to the Windows application event log).

(screenshot)

There are no apps emitting traces on that machine yet, so I have no idea what happened there.

I have collected and attached various debug information from http://localhost:1777/debug/pprof/, in case the memory usage is not okay.

What version did you use?

.\otelcol.exe --version
otelcol version 0.48.0

What config did you use?

I am using two config files (currently three, to debug) consisting of the following parts:

ConfigFile 1 (common and metrics):

exporters:
  prometheus:
    endpoint: 0.0.0.0:7299
    metric_expiration: 5m  # default = 5m
    send_timestamps: true

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: localmetrics
          scrape_interval: 17s

          tls_config:
            insecure_skip_verify: true

          static_configs:
            - targets: [localhost:8888] # Self diagnostic metrics of otelcol
              labels:
                from_app: otelcol
          file_sd_configs:
            - files:
              - "C:/Program Files/OpenTelemetry/OTEL Collector/metric-targets/*.yaml"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2000
  batch:

extensions:
  health_check:
    endpoint: localhost:13133
  zpages:
    endpoint: localhost:42424
  memory_ballast:
    size_mib: 256

service:
  extensions: [health_check , zpages]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter , batch]
      exporters: [prometheus]

  # Otel-Collector Self-Diagnostics
  telemetry:
    logs:
      level: info
      encoding: json
      output_paths:       ["stdout"]
      error_output_paths: ["stderr"]
    metrics:
      address: localhost:8888

I just noticed an error in my config... the memory_ballast extension is defined but not listed under service.extensions, so it seems it is not actually used.
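
If I understand the extension mechanism correctly, the fix should simply be to list it under service.extensions as well, e.g.:

service:
  extensions: [health_check , zpages , memory_ballast]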

Second file, used optionally (if traces are required):

exporters:
  jaeger:
    endpoint: "${AX_MONITORING_SERVER}:14250"
    tls:
      insecure: true

receivers:
  jaeger:
    protocols:
      thrift_compact:
        endpoint: localhost:6831
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318

service:
  pipelines:
    traces:
      receivers: [otlp , jaeger]
      processors: [memory_limiter , batch]
      exporters: [jaeger]

Third config file currently used to add pprof:

extensions:
  pprof:
    endpoint: "127.0.0.1:1777"
    block_profile_fraction: 3
    mutex_profile_fraction: 5
service:
  extensions: [pprof,health_check,zpages]

Environment: Windows 10 21H2, Windows Server 2019

Mario-Hofstaetter commented 2 years ago

After changing the config of our biggest instance to:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 768
  batch:

extensions:
  memory_ballast:
    size_mib: 512

and it is still GC'ing at short intervals:

2022-05-08T22:40:03+02:00   {"level":"info","ts":1652042403.575912,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":159}
2022-05-08T22:40:03+02:00   {"level":"info","ts":1652042403.5041053,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":702}
2022-05-08T22:39:30+02:00   {"level":"info","ts":1652042370.570825,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":163}
2022-05-08T22:39:30+02:00   {"level":"info","ts":1652042370.489251,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":641}
2022-05-08T22:38:56+02:00   {"level":"info","ts":1652042336.586582,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":166}
2022-05-08T22:38:56+02:00   {"level":"info","ts":1652042336.5032008,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":680}
2022-05-08T22:38:21+02:00   {"level":"info","ts":1652042301.5798883,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":186}
2022-05-08T22:38:21+02:00   {"level":"info","ts":1652042301.5080392,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":677}
2022-05-08T22:37:48+02:00   {"level":"info","ts":1652042268.5550425,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":158}
2022-05-08T22:37:48+02:00   {"level":"info","ts":1652042268.492242,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":645}
2022-05-08T22:37:14+02:00   {"level":"info","ts":1652042234.574366,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":159}
2022-05-08T22:37:14+02:00   {"level":"info","ts":1652042234.4942825,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":643}
2022-05-08T22:36:40+02:00   {"level":"info","ts":1652042200.5750558,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":159}
2022-05-08T22:36:40+02:00   {"level":"info","ts":1652042200.501883,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":714}
2022-05-08T22:36:06+02:00   {"level":"info","ts":1652042166.5794554,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":169}
2022-05-08T22:36:06+02:00   {"level":"info","ts":1652042166.4941897,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":655}
2022-05-08T22:35:32+02:00   {"level":"info","ts":1652042132.5813339,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":160}
2022-05-08T22:35:32+02:00   {"level":"info","ts":1652042132.4957755,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":619}
2022-05-08T22:35:06+02:00   {"level":"info","ts":1652042106.5584857,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":170}
2022-05-08T22:35:06+02:00   {"level":"info","ts":1652042106.4969077,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":632}
2022-05-08T22:34:33+02:00   {"level":"info","ts":1652042073.5752532,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":148}
2022-05-08T22:34:33+02:00   {"level":"info","ts":1652042073.4907186,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":620}
2022-05-08T22:33:59+02:00   {"level":"info","ts":1652042039.5686462,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":135}
2022-05-08T22:33:59+02:00   {"level":"info","ts":1652042039.4970694,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":621}
2022-05-08T22:33:24+02:00   {"level":"info","ts":1652042004.566677,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":143}
2022-05-08T22:33:24+02:00   {"level":"info","ts":1652042004.5086606,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":631}
2022-05-08T22:32:50+02:00   {"level":"info","ts":1652041970.5390697,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":147}
2022-05-08T22:32:50+02:00   {"level":"info","ts":1652041970.490576,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":668}
2022-05-08T22:32:10+02:00   {"level":"info","ts":1652041930.5783658,"caller":"memorylimiterprocessor/memorylimiter.go:281","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":112}
2022-05-08T22:32:10+02:00   {"level":"info","ts":1652041930.5014062,"caller":"memorylimiterprocessor/memorylimiter.go:310","msg":"Memory usage is above soft limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":663}

Is this normal behavior? IMHO 700MB of memory is a lot for scraping 8 MB of metrics.

Edit 2022-05-12

I have been playing with the memory limiter for the last couple of days and am still anxious about the memory usage.

Currently running this config:

exporters:
  prometheus:
    endpoint: 0.0.0.0:7299
    metric_expiration: 5m  # default = 5m
    send_timestamps: true

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: localmetrics
          scrape_interval: 17s
          tls_config:
            insecure_skip_verify: true
          static_configs:
            - targets: [localhost:8888] # Self diagnostic metrics of otelcol
              labels:
                app: otelcol
          file_sd_configs:
            - files:
              - "C:/Program Files/OpenTelemetry/OTEL Collector/metric-targets/*.yaml"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  batch:

extensions:
  health_check:
    endpoint: localhost:13133
  memory_ballast:
    size_mib: 256
  pprof:
    endpoint: "127.0.0.1:1777"
  zpages:
    endpoint: localhost:42424

service:
  extensions: [health_check,memory_ballast,pprof,zpages]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter,batch]
      exporters: [prometheus]

  # Otel-Collector Self-Diagnostics
  telemetry:
    logs:
      level: info
      encoding: json
      output_paths:       ["stdout"]
      error_output_paths: ["stderr"]
    metrics:
      address: localhost:8888

Using only a metrics pipeline and an IMHO reasonable amount of metrics, the logs of otelcol are full of info and warn messages showing GC actions, and even dropping of data, while consuming > 600 MB of memory.

Currently the prometheus exporter exposes 34377 lines of metrics (4.39 MB). 600 MB of memory seems too much for this.

What am I doing wrong?

pprof outputs

(screenshot)

Since GitHub does not accept .txt or .gz attachments at the moment, here is a link to a tar.gz with all files from /pprof: https://1drv.ms/u/s!AnvnX1Qo7mIHj6VrpUxcn95c8Ue7Tw?e=DfQS6J

mx-psi commented 2 years ago

cc @dashpole @Aneurysm9

Note that this happens with v0.48.0 and v0.51.0, so both before and after fixing #9278

Mario-Hofstaetter commented 2 years ago

Update: Currently running 0.51.0 locally using the following config.

otelcol config yaml:

exporters:
  prometheus:
    endpoint: 0.0.0.0:7299
  logging:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: localmetrics
          scrape_interval: 10s
          tls_config:
            insecure_skip_verify: true
          static_configs:
            - targets: [localhost:8888] # Self diagnostic metrics of otelcol
              labels:
                app: otelcol
          file_sd_configs:
            - files:
              - "C:/Program Files/OpenTelemetry/OTEL Collector/metric-targets/*.yaml"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2048

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter]
      exporters: [logging]

  telemetry: # Otel-Collector Self-Diagnostics
    metrics:
      address: localhost:8888

Got rid of all processors (except memory_limiter), all extensions, and the prometheus exporter (from the pipeline). scrape_interval: 10s is now relatively low; otelcol is basically only logging to the console and discarding all metrics.

Doing this, the process currently sits at around ~780 MB of memory, and it seems to be slowly increasing: (screenshot)

Log output looks like this; two targets are currently unavailable:

2022-05-13T12:13:59.163+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 20}
2022-05-13T12:14:02.857+0200    warn    internal/otlp_metricsbuilder.go:161     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1652436840494, "target_labels": "map[__name__:up app:AX.Process.AxCommunicationClients.PLCDevices instance:localhost:18087 instance_origin:localhost:18087 job:localmetrics]"}
2022-05-13T12:14:02.857+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 5}
2022-05-13T12:14:03.955+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 199}
2022-05-13T12:14:06.596+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 199}
2022-05-13T12:14:07.731+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 199}
2022-05-13T12:14:08.421+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 138}
2022-05-13T12:14:08.840+0200    warn    internal/otlp_metricsbuilder.go:161     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1652436846490, "target_labels": "map[__name__:up app:AX.Server.Service instance:localhost:18086 instance_origin:localhost:18086 job:localmetrics]"}
2022-05-13T12:14:08.840+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 5}
2022-05-13T12:14:09.155+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 20}
2022-05-13T12:14:12.840+0200    warn    internal/otlp_metricsbuilder.go:161     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1652436850489, "target_labels": "map[__name__:up app:AX.Process.AxCommunicationClients.PLCDevices instance:localhost:18087 instance_origin:localhost:18087 job:localmetrics]"}
2022-05-13T12:14:12.840+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 5}
2022-05-13T12:14:13.936+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 199}
2022-05-13T12:14:16.556+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 199}
2022-05-13T12:14:17.668+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 199}
2022-05-13T12:14:18.424+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 138}
2022-05-13T12:14:18.857+0200    warn    internal/otlp_metricsbuilder.go:161     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1652436856492, "target_labels": "map[__name__:up app:AX.Server.Service instance:localhost:18086 instance_origin:localhost:18086 job:localmetrics]"}
2022-05-13T12:14:18.858+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 5}
2022-05-13T12:14:19.165+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 20}
2022-05-13T12:14:22.862+0200    warn    internal/otlp_metricsbuilder.go:161     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1652436860493, "target_labels": "map[__name__:up app:AX.Process.AxCommunicationClients.PLCDevices instance:localhost:18087 instance_origin:localhost:18087 job:localmetrics]"}
2022-05-13T12:14:22.862+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 5}
2022-05-13T12:14:23.951+0200    INFO    loggingexporter/logging_exporter.go:56  MetricsExporter {"#metrics": 199}

Things I will / could try next:

Failing scrape targets are likely not the cause of the memory increase

Adding a few dozen failing prometheus scrape targets, while removing some working endpoints, did not increase memory usage but rather lowered it.

(screenshot)

Mario-Hofstaetter commented 2 years ago

@dashpole Regarding https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/10546#issuecomment-1151113059

I'll give Release 0.53.0 a try and see if it improves memory consumption for my metrics usage.

dashpole commented 2 years ago

The benchmarking I did (admittedly, a while ago) for the prometheus receiver was ~22 MB at idle + ~6KB / series at a 60s scrape interval. Assuming a 10s interval increases memory usage by 6x (which probably is an overestimate), that would predict ~ 1.4GB of total usage in your case, which isn't too far off.
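
Spelled out with your numbers (rough napkin math based on the figures above, not a sizing formula):

37,654 series × ~6 KB/series        ≈ 226 MB per scrape interval
226 MB × 6 (10s vs. 60s interval)   ≈ 1,355 MB
1,355 MB + ~22 MB idle              ≈ 1.4 GB predicted heap usage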

A few things that would be helpful to try:

Mario-Hofstaetter commented 2 years ago

Thank you for the suggestions @dashpole

Assuming a 10s interval increases memory usage by 6x (which probably is an overestimate)

But why, though? Shouldn't all previous samples be thrown away after a new successful scrape of the targets? Only the most recent datapoints are exposed on the (prometheus) exporter? Or is there a bigger buffer internally because other exporters (!= prometheus) may export a history of the last X values? 👀

get a baseline for comparison using the Prometheus server (in agent mode)

Good idea.

Try out your setup on linux

That's not possible / irrelevant unfortunately, because (a) our environments are heavily on the Windows side, and (b) the actual endpoints are only reachable on localhost (firewalls).

dashpole commented 2 years ago

But why though? Shouldn't all previous samples be thrown away after a new successful scrape of targets? Only the most recent datapoints are exposed on the (prometheus-) exporter? or is there a bigger buffer interally because other exporters (!= prometheus) may export a history of last X values? 👀

Other than the batch processor and the sending queue, nothing should be holding onto multiple scrapes of metrics, so you are probably right that lowering the scrape interval shouldn't matter too much. You could try a higher interval to see if it makes a big difference.
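
For example (the number is just an illustration, anything well above the current interval would do):

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: localmetrics
          scrape_interval: 60s   # up from 10s/17s, purely as an experiment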

Mario-Hofstaetter commented 2 years ago

You could try a higher interval to see if it makes a big difference

Will do. Also, how about not using the batch processor altogether? Not sure why I have not tested that yet...

I configured it because it is recommended; it's running with the default settings, which should be:

send_batch_size (default = 8192): Number of spans, metric data points, or log records after which a batch will be sent regardless of the timeout.
timeout (default = 200ms): Time duration after which a batch will be sent regardless of size.
send_batch_max_size (default = 0): The upper limit of the batch size. 0 means no upper limit of the batch size. This property ensures that larger batches are split into smaller units. It must be greater or equal to send_batch_size.

I am unsure whether it makes much sense for a metrics pipeline that is strictly using the prometheus receiver and exporter. The send_batch_size of 8192 will be exceeded on every scrape of our bigger targets, and a timeout of 200ms does not seem like much.

Or maybe our BIG metric endpoints are the problem, because a single scrape of the receiver exceeds the 8192 metric points? I will also try setting that to something like 100000, so that one scrape of our BIGGEST endpoints fits within one batch, if that makes any sense.
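
Something like this is what I have in mind (an experiment on my side, the numbers are guesses, not a recommendation):

processors:
  batch:
    send_batch_size: 100000    # large enough that one scrape of our biggest endpoint fits into a single batch
    send_batch_max_size: 0     # default: no upper limit
    timeout: 200ms             # keeping the default for now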

dashpole commented 2 years ago

Also, how about not using batch processor altogether?

Definitely worth a try

I am unsure if it makes much sense for metrics pipeline if that is strictly using prometheus receiver and exporter?

Agreed.

Mario-Hofstaetter commented 2 years ago

First observations.

After updating from 0.48.0 to 0.53.0, memory usage on our server with the most metrics dropped noticeably, but not by a huge amount: from a memory_rss mean of ~780 MiB to a mean of about 686 MiB; other memory metrics changed similarly.

rss memory

working set private bytes

private bytes

working set bytes

Removing the batch processor on the same machine had a negligible effect (< 30 MiB difference).

removed batch

Other tests will follow.

newly12 commented 2 years ago

We also observed a memory leak from the prometheus receiver. For example, the following dashboard is for one instance that is pulling kube-state-metrics for a k8s cluster; the metrics pipeline is configured like this:

        metrics:
          receivers: [ prometheus ]
          processors: [ attributes/drop_labels, batch/metrics ]
          exporters: [ nop ]

(screenshot)

Here are the heap profiles taken after running for 6 hours and ~1 day.

(screenshot)

Update: the otel collector in the above case was running the latest prometheus receiver (cb16d48cf34b486aafa6aafe367208beac160665), which imports prometheus v0.36.2. To compare with how prometheus itself behaves, I've started a prometheus agent running v2.36.0 with the following config:

    scrape_configs:
      - job_name: 'prometheus'
        metrics_path: /metrics
        scrape_interval: 120s
        scrape_timeout: 90s
        static_configs:
          - targets: [ 'kube-state-metrics.kube-system:8080' ]

No memory leak was observed for more than 12 hrs.

(screenshot)

jpkrohling commented 2 years ago

First of all: when @Mario-Hofstaetter says there's a memory leak, I pay attention. I still have nightmares about https://github.com/jaegertracing/jaeger/issues/2638

Reading through this, I'm not sure yet there's a leak: apparently, the memory usage increases up to the threshold of the memory limiter, and stabilizes there. I have seen something similar in Jaeger in the past and found out that Go won't release the memory to the OS despite not using it anymore (GODEBUG=madvdontneed=1 used to cause some effect, not sure it's still the case in newer Go versions). Your previous profiling data didn't show anything of interest to me, just that the biggest offender seems to be the memory ballast extension, which wasn't surprising. Is there anything in the pprof data that would suggest that we have indeed a leak?

Mario-Hofstaetter commented 2 years ago

@jpkrohling sorry for letting this issue go stale.

From what I have seen so far, it does not entirely look like a memory leak, since memory usage does not increase indefinitely; or perhaps the memory_limiter component is preventing the leak via its hard limit (?).

I have not yet tested prometheus in agent mode as a "baseline" for the memory usage of our metrics environment.

Unfortunately, at the moment I barely have any time to contribute to this topic, I am afraid. It looks like @newly12 is also more skilled at providing insights.

newly12 commented 2 years ago

I had changes to disable the metrics adjuster. The metrics adjuster basically sets the startTimestamp for metrics; in order to do so, it caches 2 copies (initial, previous) of every metric, since the startTimestamp is only defined in the OpenTelemetry model, not in the Prometheus model. Given use cases like scraping prometheus metrics and publishing them to a prometheus-model storage (via the prometheus remote write exporter), I think it is a fair ask to provide an option to disable it, due to the significant memory consumption and potential leak. After the change the memory use is pretty stable.

(screenshot)

Mario-Hofstaetter commented 2 years ago

@newly12 That's the memory usage of otelcol?! 👀 How many metric series are scraped on that instance?

newly12 commented 2 years ago

@Mario-Hofstaetter ~5M metrics; the prometheus agent consumes ~25G of memory as well.

dashpole commented 2 years ago

@newly12 That's quite a dramatic difference. I'm definitely supportive of being able to disable start time tracking in that case. cc @Aneurysm9.

Given that, I think we should consider disabling it by default in the future, or moving it to a separate processor or to exporterhelper. Speaking from Google's perspective, we've had to reimplement start time tracking in our exporter regardless, since not all receivers follow the spec for handling unknown start time that the prom receiver implements.

Mario-Hofstaetter commented 2 years ago

Still not entirely sure what we are hitting; maybe it's the start time calculation too. I have been running two configurations for some days now, memory was stable, and just today, when activity started on the servers, memory usage again made a considerable jump.

(screenshot)

Prometheus agent mode

I am running prometheus (2.36.2) in agent mode, too, on the servers, with the same scrape configuration (but no working remote_write target, dunno if that matters), and it generally had slightly lower memory consumption and remained stable today:

(screenshot)

Server 002 is still running 0.48.0 barebones (no extensions, no ballast, no batch):

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2048
  batch:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter]
      exporters: [prometheus]

Server 001 currently runs 0.54.0 with:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2048
  batch:

service:
  extensions: [health_check,zpages]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter,batch]
      exporters: [prometheus]

The exporters config on both is still:

exporters:
  prometheus:
    endpoint: 0.0.0.0:7299
    metric_expiration: 5m  # default = 5m
    send_timestamps: true

I suspected the batch processor, pprof, zpages... so I have tried different combinations so far... the jump on server 002 ruled out all suspects so far (leaving the start time calculation?).

Cardinality

Looking at the time series count over time today on these servers, a (moderate) increase in unique series is visible, but it does not quite match the jumps in memory at ~13:00 and ~13:50 local time.

(screenshot)

So my current thesis is that an increase in metric series count (cardinality) can lead to a sudden increase in memory usage (?).

Things I could still try out?

dashpole commented 2 years ago

send_timestamps probably won't have an impact. Lowering metric_expiration might have an impact, but make sure it is at least as high as your scrape interval (preferably at least twice your scrape interval).
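
For example, with your 17s scrape interval something like this keeps that margin (the value is just an illustration):

exporters:
  prometheus:
    endpoint: 0.0.0.0:7299
    metric_expiration: 1m   # >= 2x the 17s scrape interval, well below the 5m default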

Mario-Hofstaetter commented 2 years ago

Updated to the fresh 0.55.0 release on both servers after it came out and made some more config changes. It did not seem to matter.

It looks like if the metrics of our application change (due to activity == new series because of new label variants and/or an application restart), memory may make a jump.

So at this point it seems useless to try more; maybe I should wait for #12215.

The current plan is to use a minimal config (no ballast, no batch, no pprof) and set a reasonable memory_limiter limit. That should keep memory usage within acceptable bounds.


Memory on Server 002 is a bit lower, which may be due to metric_expiration: 4m instead of 5m, and/or because this instance has a somewhat lower metric series count.

(screenshot)

The prometheus agent memory, again, did not change.

(screenshot)

Things left to try out:

Full config YAML Server 001

No extensions / processors (besides memory_limiter) are configured under `service`, so I hope those settings don't do anything.

exporters:
  prometheus:
    endpoint: 0.0.0.0:7299
    metric_expiration: 5m  # default = 5m
    send_timestamps: true

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: localmetrics
          scrape_interval: 17s
          tls_config:
            insecure_skip_verify: true
          static_configs:
            - targets: [localhost:8888] # Self diagnostic metrics of otelcol
              labels:
                app: otelcol
          file_sd_configs:
            - files:
              - "C:/Program Files/OpenTelemetry/OTEL Collector/metric-targets/*.yaml"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500
  batch:

extensions:
  health_check:
    endpoint: localhost:13133
  memory_ballast:
    size_mib: 256
  pprof:
    endpoint: "127.0.0.1:1777"
  zpages:
    endpoint: localhost:42424

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter]
      exporters: [prometheus]

  # Otel-Collector Self-Diagnostics
  telemetry:
    logs:
      level: info
      encoding: json
      output_paths:       ["stdout"]
      error_output_paths: ["stderr"]
    metrics:
      address: localhost:8888

Full config YAML Server 002

- Removed `send_timestamps` in the prometheus exporter
- Set `metric_expiration` to `4m`

exporters:
  prometheus:
    endpoint: 0.0.0.0:7299
    metric_expiration: 4m  # default = 5m

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: localmetrics
          scrape_interval: 17s
          tls_config:
            insecure_skip_verify: true
          static_configs:
            - targets: [localhost:8888] # Self diagnostic metrics of otelcol
              labels:
                app: otelcol
          file_sd_configs:
            - files:
              - "C:/Program Files/OpenTelemetry/OTEL Collector/metric-targets/*.yaml"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500
  batch:

extensions:
  health_check:
    endpoint: localhost:13133
  memory_ballast:
    size_mib: 256
  pprof:
    endpoint: "127.0.0.1:1777"
  zpages:
    endpoint: localhost:42424

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter]
      exporters: [prometheus]

  # Otel-Collector Self-Diagnostics
  telemetry:
    logs:
      level: info
      encoding: json
      output_paths:       ["stdout"]
      error_output_paths: ["stderr"]
    metrics:
      address: localhost:8888
RalphSu commented 2 years ago

Could you make a build from https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/12215 and try it out? It is pretty stable in our env.

Mario-Hofstaetter commented 2 years ago

Could you make a build from #12215 and try it out? It is pretty stable in our env.

@RalphSu maybe on the weekend, gotta learn how to build a go app / otelcol first 🙈

jpkrohling commented 2 years ago

hint: use the OpenTelemetry Collector Builder -- https://github.com/open-telemetry/opentelemetry-collector/tree/main/cmd/builder

Mario-Hofstaetter commented 2 years ago

@jpkrohling I tried building using the Collector Builder but failed miserably. I am sorry, I am unfamiliar with Go tooling...

I would like to try out disable_start_time because I now know how to provoke the memory increase.

What's the error in this config? Thanks...

exporters:
  - gomod: "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter v0.55.0"

receivers:
  - gomod: "github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver main"

processors:
  - import: go.opentelemetry.io/collector/processor/memorylimiterprocessor
    gomod: go.opentelemetry.io/collector v0.55.0

replaces:
  # a list of "replaces" directives that will be part of the resulting go.mod
  - github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver => github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver main

Running with

.\ocb_0.55.0_windows_amd64.exe --config=./otelcol-builder.yaml --output-path=./tmp/

I am getting different variants of this error:

Error: failed to update go.mod: exit status 1. Output: "go: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod:
module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver
        but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver"

Using

receivers:
  - gomod: "github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver main"
    import: "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver"

only makes the error list longer.

Compile Errors when using import ```text Error: failed to update go.mod: exit status 1. Output: "go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports golang.org/x/sys/windows/svc: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports github.com/open-telemetry/opentelemetry-collector-contrib/pkg/resourcetotelemetry: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/prometheus: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports github.com/prometheus/client_golang/prometheus: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports github.com/prometheus/client_golang/prometheus/promhttp: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports github.com/prometheus/common/model: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: 
parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports go.opentelemetry.io/collector/pdata/pcommon: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports go.opentelemetry.io/collector/pdata/pmetric: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports go.opentelemetry.io/collector/semconv/v1.6.1: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports go.uber.org/zap: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/component imports go.opentelemetry.io/otel/metric: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/component imports go.opentelemetry.io/otel/trace: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/processor/memorylimiterprocessor imports go.opentelemetry.io/collector/pdata/plog: 
github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/processor/memorylimiterprocessor imports go.opentelemetry.io/collector/pdata/ptrace: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/processor/memorylimiterprocessor imports go.uber.org/atomic: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports contrib.go.opencensus.io/exporter/prometheus: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports github.com/google/uuid: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports github.com/spf13/cobra: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opencensus.io/metric: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opencensus.io/metric/metricproducer: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: 
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opencensus.io/stats/view: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/semconv/v1.5.0: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/otel/exporters/prometheus: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/otel/metric/nonrecording: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/otel/sdk/metric/aggregator/histogram: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/otel/sdk/metric/controller/basic: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/otel/sdk/metric/export/aggregation: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required 
as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/otel/sdk/metric/processor/basic: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/otel/sdk/metric/selector/simple: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/otel/sdk/trace: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.uber.org/multierr: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.uber.org/zap/zapcore: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports golang.org/x/sys/windows/svc/eventlog: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder tested by go.opentelemetry.io/collector/cmd/builder.test imports github.com/stretchr/testify/assert: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports 
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports go.opentelemetry.io/collector/exporter/exporterhelper imports github.com/cenkalti/backoff/v4: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports go.opentelemetry.io/collector/exporter/exporterhelper imports go.opencensus.io/metric/metricdata: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports go.opentelemetry.io/collector/exporter/exporterhelper imports go.opentelemetry.io/otel/attribute: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter tested by github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.test imports github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/testdata: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter tested by github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.test imports github.com/prometheus/client_model/go: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter tested by github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.test imports github.com/prometheus/prometheus/config: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: 
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter tested by github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.test imports github.com/stretchr/testify/require: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter tested by github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.test imports gopkg.in/yaml.v2: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/processor/memorylimiterprocessor imports go.opentelemetry.io/collector/internal/iruntime imports github.com/shirou/gopsutil/v3/mem: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/processor/memorylimiterprocessor imports go.opentelemetry.io/collector/obsreport imports go.opencensus.io/stats: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/processor/memorylimiterprocessor imports go.opentelemetry.io/collector/obsreport imports go.opencensus.io/tag: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/processor/memorylimiterprocessor imports go.opentelemetry.io/collector/obsreport imports go.opentelemetry.io/otel/codes: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: 
github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/confmap imports github.com/knadh/koanf: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/confmap imports github.com/knadh/koanf/maps: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/confmap imports github.com/knadh/koanf/providers/confmap: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/confmap imports github.com/mitchellh/mapstructure: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/confmap/converter/overwritepropertiesconverter imports github.com/magiconair/properties: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/service/internal/telemetry imports github.com/shirou/gopsutil/v3/process: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/service/internal/telemetrylogs imports go.uber.org/zap/zapgrpc: 
github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/service/internal/telemetrylogs imports google.golang.org/grpc/grpclog: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service tested by go.opentelemetry.io/collector/service.test imports github.com/prometheus/common/expfmt: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports go.opentelemetry.io/collector/exporter/exporterhelper tested by go.opentelemetry.io/collector/exporter/exporterhelper.test imports go.opentelemetry.io/otel: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter imports go.opentelemetry.io/collector/exporter/exporterhelper tested by go.opentelemetry.io/collector/exporter/exporterhelper.test imports go.opentelemetry.io/otel/sdk/trace/tracetest: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/confmap tested by go.opentelemetry.io/collector/confmap.test imports gopkg.in/yaml.v3: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service imports go.opentelemetry.io/collector/service/internal/pipelines tested by go.opentelemetry.io/collector/service/internal/pipelines.test imports 
go.uber.org/zap/zaptest/observer: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver go.opentelemetry.io/collector/cmd/builder imports go.opentelemetry.io/collector/service tested by go.opentelemetry.io/collector/service.test imports go.opentelemetry.io/collector/extension/zpagesextension imports go.opentelemetry.io/contrib/zpages: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.44.1-0.20220716201014-d4e2edcf6ea1: parsing go.mod: module declares its path as: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver but was required as: github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver " ```

Flipping the replace statement got the compile to work, but I guess I was not actually using the fork then, because setting disable_start_time: true failed.
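For reference, the go.mod errors above say the module was *required* under the fork path (github.com/newly12/...) while the fork's go.mod still declares the upstream path, so the builder config has to keep requiring the upstream module and point a replace directive at the fork. A minimal sketch of the intended direction (the branch name is the one used later in this thread):

```yaml
replaces:
  # left side: the upstream module path the code requires
  # right side: the fork that provides it, pinned to the branch carrying the fix
  - github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver => github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver prom_receiver_mem
```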

I may have time on the weekend to read up on the Go documentation.

Mario-Hofstaetter commented 2 years ago

I ran some more tests (with 0.55.0) after realizing which actions create new metric series, which in turn cause the memory increase.

Within 30 minutes the otelcol process goes from < 400 MiB to > 2 GiB of memory (see below). After restarting otelcol, memory is again below 400 MiB.

I could provide pprof dumps of the different states if anyone is interested, or wait until #12215 is merged or I get the compile working...

Is it likely that disable_start_time will fix this behavior, or is there something else going on in our metrics? @newly12 @dashpole

(sorry for being bothersome in this issue)


memory_limiter is currently running with limit_mib: 3000 due to these tests.

Full Otelcol config yaml running on this machine:

```yaml
exporters:
  prometheus:
    endpoint: 0.0.0.0:7299
    metric_expiration: 1m # default = 5m

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: localmetrics
          scrape_interval: 17s
          tls_config:
            insecure_skip_verify: true
          static_configs:
            - targets: [localhost:8888] # Self diagnostic metrics of otelcol
              labels:
                app: otelcol
          file_sd_configs:
            - files:
                - "C:/Program Files/OpenTelemetry/OTEL Collector/metric-targets/*.yaml"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 3000

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter]
      exporters: [prometheus]
  telemetry:
    logs:
      level: info
      encoding: json
      output_paths: ["stdout"]
      error_output_paths: ["stderr"]
    metrics:
      address: localhost:8888
```

After restarting our apps and otelcol I ran about 880 "tasks" in our app which resulted in the addition of some metrics:

Also, some windows_exporter metrics come and go as process_ids change. Together that's a few thousand additional metric series over ~30 minutes. I exported the prometheus exporter output every minute to a text file:

| Time | Metrics bytes | Metrics lines | otelcol_process_memory_rss (approx.) |
| --- | --- | --- | --- |
| 2022-07-19 13:22:59 | 11476774 | 44499 | 336 MiB |
| 2022-07-19 13:45:08 | 12925711 | 50688 | 1.26 GiB |
| 2022-07-19 14:08:25 | 14027869 | 54462 | 2.22 GiB |
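
For anyone reproducing these numbers, the byte and line counts can be taken straight from the Prometheus exporter endpoint in the config above (curl is just one option; any HTTP client works):

```shell
# Size and line count of the exporter output (port 7299 from the config above)
curl -s http://localhost:7299/metrics | wc -c   # bytes
curl -s http://localhost:7299/metrics | wc -l   # metric lines
```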

(screenshots omitted)

dashpole commented 2 years ago

pprof dumps would be useful with or without disable start time set. Thanks for your investigation!

Mario-Hofstaetter commented 2 years ago

> pprof dumps would be useful with or without disable start time set. Thanks for your investigation!

@dashpole for disable_start_time I would need assistance regarding https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/9998#issuecomment-1189033814

I repeated my test with pprof running on default settings and saved everything I found, three times:

pprof1_2022-07-19_1632.zip

pprof2_2022-07-19_1645.zip

pprof3_2022-07-19_1702.zip

Please let me know if this helps. I could also provide our raw metrics (in the Prometheus text format), which might simplify running tests.

(screenshots omitted)

EDIT:

Since I do not fully understand the memory_limiter / Garbage Collection interaction, I re-ran the test with

  memory_limiter:
    check_interval: 1s
    limit_mib: 1000

The process now peaks at around 1250 MiB, but the receiver is not happy (data dropped due to high memory usage) and fails to scrape its own telemetry metrics.

So the memory_limiter is not a suitable solution here; it either needs very high limits or should not be used at all.
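
For context, a hedged sketch of the limits involved: with limit_mib: 1000 and the default spike_limit_mib of 20% of the limit, the soft limit sits at roughly 800 MiB, which matches the "Dropping data" and "Forcing a GC" thresholds in the log excerpt below. Raising the limit or adjusting the spike headroom only moves those thresholds; it does not address the underlying growth:

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000        # hard limit: above this a GC is forced
    spike_limit_mib: 200   # default is 20% of limit_mib; soft limit = 1000 - 200 = 800 MiB
```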

{"level":"warn","ts":1658245285.2995648,"caller":"memorylimiterprocessor/memorylimiter.go:309","msg":"Memory usage is above soft limit. Dropping data.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":875}  
{"level":"info","ts":1658245279.445666,"caller":"memorylimiterprocessor/memorylimiter.go:295","msg":"Memory usage back within limits. Resuming normal operation.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":771}
{"level":"info","ts":1658245279.445666,"caller":"memorylimiterprocessor/memorylimiter.go:273","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":771}
{"level":"warn","ts":1658245279.2954862,"caller":"memorylimiterprocessor/memorylimiter.go:283","msg":"Memory usage is above hard limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":1069}
{"level":"error","ts":1658245278.462892,"caller":"scrape/scrape.go:1273","msg":"Scrape commit failed","kind":"receiver","name":"prometheus","pipeline":"metrics","scrape_pool":"localmetrics","target":"http://localhost:18087/metrics","error":"data dropped due to high memory usage","stacktrace":"github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1273\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1342\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).run\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1224"}
{"level":"error","ts":1658245278.429332,"caller":"scrape/scrape.go:1273","msg":"Scrape commit failed","kind":"receiver","name":"prometheus","pipeline":"metrics","scrape_pool":"localmetrics","target":"http://localhost:9080/metrics","error":"data dropped due to high memory usage","stacktrace":"github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1273\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1342\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).run\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1224"}
{"level":"error","ts":1658245273.3404717,"caller":"scrape/scrape.go:1273","msg":"Scrape commit failed","kind":"receiver","name":"prometheus","pipeline":"metrics","scrape_pool":"localmetrics","target":"http://localhost:9182/metrics","error":"data dropped due to high memory usage","stacktrace":"github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1273\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1342\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).run\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1224"}
{"level":"error","ts":1658245272.5216439,"caller":"scrape/scrape.go:1273","msg":"Scrape commit failed","kind":"receiver","name":"prometheus","pipeline":"metrics","scrape_pool":"localmetrics","target":"http://localhost:8888/metrics","error":"data dropped due to high memory usage","stacktrace":"github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1273\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1342\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).run\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1224"}
{"level":"warn","ts":1658245267.2991486,"caller":"memorylimiterprocessor/memorylimiter.go:309","msg":"Memory usage is above soft limit. Dropping data.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":814}
{"level":"info","ts":1658245261.4467704,"caller":"memorylimiterprocessor/memorylimiter.go:295","msg":"Memory usage back within limits. Resuming normal operation.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":795}
{"level":"info","ts":1658245261.4467704,"caller":"memorylimiterprocessor/memorylimiter.go:273","msg":"Memory usage after GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":795}
{"level":"error","ts":1658245261.4117634,"caller":"scrape/scrape.go:1273","msg":"Scrape commit failed","kind":"receiver","name":"prometheus","pipeline":"metrics","scrape_pool":"localmetrics","target":"http://localhost:9080/metrics","error":"data dropped due to high memory usage","stacktrace":"github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1273\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1342\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).run\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1224"}
{"level":"error","ts":1658245261.365305,"caller":"scrape/scrape.go:1273","msg":"Scrape commit failed","kind":"receiver","name":"prometheus","pipeline":"metrics","scrape_pool":"localmetrics","target":"http://localhost:18087/metrics","error":"data dropped due to high memory usage","stacktrace":"github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1273\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1342\ngithub.com/prometheus/prometheus/scrape.(*scrapeLoop).run\n\tgithub.com/prometheus/prometheus@v0.36.2/scrape/scrape.go:1224"}
{"level":"warn","ts":1658245261.2909968,"caller":"memorylimiterprocessor/memorylimiter.go:283","msg":"Memory usage is above hard limit. Forcing a GC.","kind":"processor","name":"memory_limiter","pipeline":"metrics","cur_mem_mib":1023}
holograph commented 2 years ago

I've been tracking this for a couple of days and we're seeing the exact same behavior here; our instance is running in K8s so getting pprof dumps would be a major PITA, so I just wanted to chime in and thank @Mario-Hofstaetter profusely for putting in the work 👍

jpkrohling commented 2 years ago

@dashpole, will you look into this, or should I put it on my queue?

dashpole commented 2 years ago

If you have time, you are welcome to look into it. I have a strange dev setup, and was having trouble opening the pprof profiles earlier

jpkrohling commented 2 years ago

> If you have time

I have a few other items on my queue, but I think this might take precedence. Given that this seems related to the metrics part, you (or @gouthamve?) would probably find the problem faster than me, but if you can't, I can give it a try.

newly12 commented 2 years ago

@Mario-Hofstaetter please try this builder config.

exporters:
  - gomod: "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter v0.55.0"

receivers:
  - gomod: "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.55.0"

processors:
  - import: go.opentelemetry.io/collector/processor/memorylimiterprocessor
    gomod: go.opentelemetry.io/collector v0.55.0

replaces:
  # a list of "replaces" directives that will be part of the resulting go.mod
  - github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver => github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver prom_receiver_mem
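
Assuming the config above is saved as builder-config.yaml (a hypothetical file name), it would typically be used roughly like this:

```shell
# Install the collector builder, then generate and compile the custom otelcol
go install go.opentelemetry.io/collector/cmd/builder@latest
builder --config=builder-config.yaml
```
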
Mario-Hofstaetter commented 2 years ago

TL;DR: Using disable_start_time: true looks promising

Memory stayed < 400 MiB while running the actions in our application that create new metrics. Memory usage of otelcol is now below that of prometheus in agent mode.

The memory footprint could also be especially low because the custom build contains only the components we use? I will re-run the test with the custom binary but with disable_start_time set to false to double-check.

Thank you @newly12 for the builder config; what a fail on my part to use the wrong branch in the replace for the fork.

Also, thank you so much for your fix. If this proves stable, we can run the custom build and solve this issue that has troubled us for months.

Used builder config including pprof:

```yaml
exporters:
  - gomod: "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter v0.55.0"

receivers:
  - gomod: "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.55.0"

processors:
  - import: go.opentelemetry.io/collector/processor/memorylimiterprocessor
    gomod: go.opentelemetry.io/collector v0.55.0

extensions:
  - gomod: "github.com/open-telemetry/opentelemetry-collector-contrib/extension/pprofextension v0.55.0"

replaces:
  # a list of "replaces" directives that will be part of the resulting go.mod
  - github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver => github.com/newly12/opentelemetry-collector-contrib/receiver/prometheusreceiver prom_receiver_mem
```

The compile worked despite forgetting to run

$ GO111MODULE=on go install go.opentelemetry.io/collector/cmd/builder@latest

..I have no idea what that environment variable does, but I will repeat the compile nevertheless.

Used otelcol config:

```yaml
exporters:
  prometheus:
    endpoint: 0.0.0.0:7299
    metric_expiration: 1m # default = 5m

receivers:
  prometheus:
    disable_start_time: true
    config:
      scrape_configs:
        - job_name: localmetrics
          scrape_interval: 17s
          tls_config:
            insecure_skip_verify: true
          static_configs:
            - targets: [localhost:8888] # Self diagnostic metrics of otelcol
              labels:
                app: otelcol
          file_sd_configs:
            - files:
                - "C:/Program Files/OpenTelemetry/OTEL Collector/metric-targets/*.yaml"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 3000

extensions:
  pprof:
    endpoint: "127.0.0.1:1777"

service:
  extensions: [pprof]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter]
      exporters: [prometheus]
  telemetry:
    logs:
      level: info
      encoding: json
      output_paths: ["stdout"]
      error_output_paths: ["stderr"]
    metrics:
      address: localhost:8888
```

I started the built custom otelcol binary after restarting our application and re-ran my actions from yesterday.

(screenshot omitted)

Edit: after some more time still ok

(screenshot omitted)

Not sure why otelcol_process_runtime_total_sys_memory_bytes still increased slightly; I will watch it over a period of days.

Edit 2: Verifying it wasn't the custom build but disable_start_time

Running the same build (now also including the otlp / jaeger receiver / exporter and the batch processor, in case they are needed later), I re-ran the tests with disable_start_time: false and then true.

(screenshot omitted)

Note: the number of metrics was still increasing at the end of the test, which is why memory usage was still slowly rising.

I will now let this otelcol process run for several days without restarts.

Edit: after running for ~ 1 week

Looking good.

(screenshot omitted)

holograph commented 2 years ago

Woohoo! Looking forward to seeing this released...

kwiesmueller commented 2 years ago

One other observation we've made is regarding Windows. We're still collecting profiles, but it looks like our collector running on Windows consistently uses more memory than on Linux. We're running in K8s, and since the profiles don't show significant differences, the current theory is that memory reporting on Windows is different: we see memory use growing over time and then leveling out, which looks a little like Windows not reclaiming memory until something else needs it. But so far I lack background on the Windows side of things to confirm anything, and we haven't prioritized it yet.

Doron-Bargo commented 2 years ago

Any update on this issue? We tried building our own otel collector and the fix in #12215 worked (it ran for over a week on a cluster with 60 nodes).

jmacd commented 2 years ago

I commented in #12215 -- have we investigated why the garbage collection support in the metrics adjuster does not appear to work?

Support for start time is serious business. Prometheus has a heuristic that makes it unable to correctly calculate rates at restarts. OTLP makes it possible to correctly calculate rates around restarts, but that will be broken by the proposed fix to disable the adjuster. See the comment here: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/internal/otlp_metrics_adjuster.go#L28

jpkrohling commented 2 years ago

@Mario-Hofstaetter, @holograph, @Doron-Bargo, could you give 0.57.2 a try? It was released last week.

Mario-Hofstaetter commented 2 years ago

> @Mario-Hofstaetter, @holograph, @Doron-Bargo, could you give 0.57.2 a try? It was released last week.

@jpkrohling The release notes mention neither this issue (#9998) nor the PR (#12215)? Has the new option been released? Or have there been other changes that might reduce memory consumption?

Has anyone looked at the memory dumps yet? It does not appear so, therefore it is still unclear whether the memory usage (without disabling the start timestamp) is buggy.

jmacd commented 2 years ago

My understanding of this problem is that the OTel collector maintains a duplicative map of every active counter/histogram/summary timeseries in order to establish the start time of each series.

This is incredibly wasteful.

The Prometheus scrape manager includes the necessary map already, and IMO a good solution would be to extend the Prometheus scrape manager to include a small amount of new information about each series. Specifically, when the scrape observes a reset it is required to use its local information about the reset time as the start time of the reset series. Without the receiver adding this information--which it has on hand--the consumer is forced to read their database in order to establish the meaning (i.e., a contributed rate interpretation) of the point being written, which is a major efficiency concern and the reason OpenTelemetry includes a start timestamp.

I'm afraid the other ways of fixing this problem require replacing the Prometheus scrape manager in the OTC Prometheus receiver.
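
To make the start-timestamp argument concrete, here is a minimal sketch (not collector code) of how a consumer can compute a rate across a counter reset when the reset time is recorded as the new start timestamp, instead of having to read back its database:

```go
package main

import (
	"fmt"
	"time"
)

// point is a cumulative sample together with the start of its accumulation interval.
type point struct {
	start time.Time // when this cumulative series (re)started
	ts    time.Time // observation time
	value float64
}

// rate computes the per-second rate between two cumulative points,
// using the start timestamp to handle a reset between them.
func rate(prev, cur point) float64 {
	if cur.start.After(prev.ts) {
		// The series was reset after the previous point: everything in cur.value
		// accumulated since cur.start, so take the rate over that interval.
		return cur.value / cur.ts.Sub(cur.start).Seconds()
	}
	return (cur.value - prev.value) / cur.ts.Sub(prev.ts).Seconds()
}

func main() {
	t0 := time.Now()
	prev := point{start: t0.Add(-time.Minute), ts: t0, value: 1000}
	// Reset 10s after t0, observed 30s after t0 with value 5.
	cur := point{start: t0.Add(10 * time.Second), ts: t0.Add(30 * time.Second), value: 5}
	fmt.Printf("rate across reset: %.2f/s\n", rate(prev, cur)) // 5 over 20s => 0.25/s
}
```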

newly12 commented 2 years ago

@Mario-Hofstaetter I think @jpkrohling was referring to #12765, which seems promising. Would you mind giving it a try in your environment?

Mario-Hofstaetter commented 2 years ago

Well, I'll be damned. I ran my test scenario with the vanilla otelcol 0.57.2 release, and it looks like (for our scenario at least) memory consumption has IMPROVED DRAMATICALLY and is, so to speak, fixed ✔

Big shoutout to @balintzs if #12765 was the golden change 👌🏻

On our biggest instance, after running my test, the otelcol process uses less memory than prometheus in agent mode. I will try again with a non-minimal configuration and run it long-term, but this looks very promising.

Details on current config / screenshots:

Running release file `otelcol_0.57.2_windows_amd64.tar.gz`. Compare with https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/9998#issuecomment-1190288098

![grafik](https://user-images.githubusercontent.com/33002073/183885027-5da9e02d-34a2-418a-8f29-3776c0a07c16.png)

Config:

```yaml
exporters:
  prometheus:
    endpoint: 0.0.0.0:7299
    metric_expiration: 1m # default = 5m

receivers:
  prometheus:
    # disable_start_time: true
    config:
      scrape_configs:
        - job_name: localmetrics
          scrape_interval: 17s
          tls_config:
            insecure_skip_verify: true
          static_configs:
            - targets: [localhost:8888] # Self diagnostic metrics of otelcol
              labels:
                app: otelcol
          file_sd_configs:
            - files:
                - "C:/Program Files/OpenTelemetry/OTEL Collector/metric-targets/*.yaml"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500

extensions:
  pprof:
    endpoint: "127.0.0.1:1777"

service:
  extensions: [pprof]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter]
      exporters: [prometheus]
  telemetry:
    logs:
      level: info
      encoding: json
      output_paths: ["stdout"]
      error_output_paths: ["stderr"]
    metrics:
      address: localhost:8888
```

Give it a try @Doron-Bargo @kwiesmueller @holograph @RalphSu

If memory stays stable, what should happen with this issue? Close as solved I guess after more feedback from the community? @newly12 @jpkrohling

Gonna open some beers as soon as this PITA is closed 🍺

jpkrohling commented 2 years ago

> If memory stays stable, what should happen with this issue? Close as solved I guess after more feedback from the community?

I think @jmacd has concerns about the current way we do things, so I'd either open a new issue to address his specific concerns and close this, or keep this one here open until his point is addressed.

gouthamve commented 2 years ago

Coming to this very late, but could others also verify whether otelcol 0.57.2 fixes things for them as well? If yes, we can close this issue, since we are still doing start-time tracking, just less buggily :)

Long-term though, I think @jmacd is right that we should be doing this in Prometheus itself: https://github.com/prometheus/prometheus/issues/10164#issuecomment-1215037396

I'll update this issue once I have some buy-in from the Prometheus maintainers.

jmacd commented 2 years ago

@gouthamve This sounds great! Thank you for the link.

CatherineF-dev commented 2 years ago

Ignore this, I am retesting #12765.

~Patched https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/12765 into v0.55 and it's amazing. CPU reduced from 100m to 7m, and memory reduced from 1 GiB to 500 MiB.~


Oh, I also updated the use_start_time_metric flag. Let me double-check.

holograph commented 2 years ago

I jumped straight from 0.55 to 0.58 and have been running it for the last 5 hours or so. So far memory utilization went down by about 30% and I'm not observing an upward trend; however, the slow upward creep in memory utilization only becomes visible after a much longer period (24 hours or so), so I'm still tracking it. Fingers crossed!

holograph commented 2 years ago

Well over 12 hours in, I'm ready to call this a win:

(screenshot: memory utilization graphs for 0.55 vs 0.58, described below)

The yellow graph represents the memory utilization of the previous 0.55 instance, stabilizing around the 2 GB mark and then slowly creeping into the 2.4 GB range over the course of a few days. The green line is 0.58, stabilizing very quickly around the more modest 1.5 GB mark and so far maintaining consistent memory usage. Well done @jpkrohling and everyone involved, and thanks again to @Mario-Hofstaetter for relentlessly pushing this issue; it's great news to bring to my customers :-)

dashpole commented 2 years ago

Given the successful memory reductions, I think we can close this issue. I've opened https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/13451 to track lowering the memory used by caches in the prometheus receiver.

bogdandrutu commented 1 year ago

I think https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/13922 would bring memory usage down to about a third, because we no longer need to store the attributes for the initial and previous points.