open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Wrong machine total memory by using percentage memory limiter #3598

Closed anoyli closed 3 years ago

anoyli commented 3 years ago

Describe the bug

Hi team, because the machine types in our project vary widely, we want to use the percentage-based memory limiter so that the maximum memory usage of the collector agent Docker container scales with each machine. However, the total machine memory the collector detects is wrong. We use an AWS EC2 instance with 30 GB of memory, but the collector agent log reports a huge total memory value, so the limiter cannot enforce the limit we need.

Steps to reproduce

processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 2
    spike_limit_percentage: 1

What did you expect to see?

limit_mib should be about 600 MB (2% of 30 GB).
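For reference, the expected value follows from applying the configured percentages to total memory. A minimal Go sketch, assuming the limiter computes limit = total × limit_percentage / 100 (and likewise for the spike limit) and reports the result in MiB; `percentLimits` is a hypothetical helper, not the collector's API, and the host is assumed to have 30 GiB.

```go
package main

import "fmt"

const mib = 1 << 20

// percentLimits is a hypothetical helper: it derives the hard and spike
// limits (in bytes) from total memory and the two configured percentages.
func percentLimits(totalBytes, limitPct, spikePct uint64) (limit, spike uint64) {
	limit = totalBytes * limitPct / 100
	spike = totalBytes * spikePct / 100
	return limit, spike
}

func main() {
	total := uint64(30) << 30 // assumed 30 GiB host, as described in the report
	limit, spike := percentLimits(total, 2, 1)
	fmt.Printf("limit_mib=%d spike_limit_mib=%d\n", limit/mib, spike/mib)
	// prints "limit_mib=614 spike_limit_mib=307"
}
```

With 30 GiB the hard limit lands around 614 MiB, which matches the "about 600" expectation above.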

What did you see instead?

limit_mib is enormous, as shown in the log below.

What version did you use?

docker image: otel/opentelemetry-collector:0.29.0

What config did you use?

AWS EC2 instance type: m4.4xlarge

Environment

Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:    18.04
Codename:   bionic

OpenTelemetry Collector agent startup logs:

2021-07-10T11:24:30.918Z info service/collector.go:262 Starting otelcol... {"Version": "v0.29.0", "NumCPU": 16}
2021-07-10T11:24:30.918Z info service/collector.go:170 Setting up own telemetry...
2021-07-10T11:24:30.920Z info service/telemetry.go:99 Serving Prometheus metrics {"address": ":8888", "level": 0, "service.instance.id": "126d1972-997a-4d83-a96f-eadd72acf577"}
2021-07-10T11:24:30.920Z info service/collector.go:205 Loading configuration...
2021-07-10T11:24:30.921Z info service/collector.go:221 Applying configuration...
2021-07-10T11:24:31.686Z info builder/exporters_builder.go:274 Exporter was built. {"kind": "exporter", "exporter": "kafka"}
2021-07-10T11:24:31.686Z info memorylimiter/memorylimiter.go:138 Using percentage memory limiter {"kind": "processor", "name": "memory_limiter", "total_memory": 9223372036854771712, "limit_percentage": 2, "spike_limit_percentage": 1}
2021-07-10T11:24:31.686Z info memorylimiter/memorylimiter.go:105 Memory limiter configured {"kind": "processor", "name": "memory_limiter", "limit_mib": 184467440737095434, "spike_limit_mib": 92233720368547717, "check_interval": 5}
2021-07-10T11:24:31.686Z info builder/pipelines_builder.go:204 Pipeline was built. {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-07-10T11:24:31.686Z info builder/receivers_builder.go:230 Receiver was built. {"kind": "receiver", "name": "otlp", "datatype": "traces"}
2021-07-10T11:24:31.687Z info builder/receivers_builder.go:230 Receiver was built. {"kind": "receiver", "name": "jaeger", "datatype": "traces"}
2021-07-10T11:24:31.687Z info builder/receivers_builder.go:230 Receiver was built. {"kind": "receiver", "name": "zipkin", "datatype": "traces"}
2021-07-10T11:24:31.687Z info service/service.go:137 Starting extensions...
2021-07-10T11:24:31.687Z info builder/extensions_builder.go:53 Extension is starting... {"kind": "extension", "name": "zpages"}
2021-07-10T11:24:31.687Z info zpagesextension/zpagesextension.go:42 Register Host's zPages {"kind": "extension", "name": "zpages"}
2021-07-10T11:24:31.687Z info zpagesextension/zpagesextension.go:55 Starting zPages extension {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":":55679"}}}
2021-07-10T11:24:31.687Z info builder/extensions_builder.go:59 Extension started. {"kind": "extension", "name": "zpages"}
2021-07-10T11:24:31.687Z info builder/extensions_builder.go:53 Extension is starting... {"kind": "extension", "name": "health_check"}
2021-07-10T11:24:31.687Z info healthcheckextension/healthcheckextension.go:41 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Port":0,"TCPAddr":{"Endpoint":"0.0.0.0:13133"}}}
2021-07-10T11:24:31.687Z info builder/extensions_builder.go:59 Extension started. {"kind": "extension", "name": "health_check"}
2021-07-10T11:24:31.687Z info builder/extensions_builder.go:53 Extension is starting... {"kind": "extension", "name": "pprof"}
2021-07-10T11:24:31.687Z info pprofextension/pprofextension.go:79 Starting net/http/pprof server {"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":":1777"},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
2021-07-10T11:24:31.687Z info builder/extensions_builder.go:59 Extension started. {"kind": "extension", "name": "pprof"}
2021-07-10T11:24:31.687Z info service/service.go:182 Starting exporters...
2021-07-10T11:24:31.687Z info builder/exporters_builder.go:92 Exporter is starting... {"kind": "exporter", "name": "kafka"}
2021-07-10T11:24:31.687Z info builder/exporters_builder.go:97 Exporter started. {"kind": "exporter", "name": "kafka"}
2021-07-10T11:24:31.687Z info service/service.go:187 Starting processors...
2021-07-10T11:24:31.687Z info builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-07-10T11:24:31.687Z info builder/pipelines_builder.go:62 Pipeline is started. {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-07-10T11:24:31.687Z info service/service.go:192 Starting receivers...
2021-07-10T11:24:31.687Z info builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "otlp"}
2021-07-10T11:24:31.688Z info otlpreceiver/otlp.go:75 Starting GRPC server on endpoint 0.0.0.0:4317 {"kind": "receiver", "name": "otlp"}
2021-07-10T11:24:31.688Z info otlpreceiver/otlp.go:137 Setting up a second GRPC listener on legacy endpoint 0.0.0.0:55680 {"kind": "receiver", "name": "otlp"}
2021-07-10T11:24:31.688Z info otlpreceiver/otlp.go:75 Starting GRPC server on endpoint 0.0.0.0:55680 {"kind": "receiver", "name": "otlp"}
2021-07-10T11:24:31.688Z info builder/receivers_builder.go:75 Receiver started. {"kind": "receiver", "name": "otlp"}
2021-07-10T11:24:31.688Z info builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "jaeger"}
2021-07-10T11:24:31.688Z info static/strategy_store.go:201 No sampling strategies provided or URL is unavailable, using defaults {"kind": "receiver", "name": "jaeger"}
2021-07-10T11:24:31.688Z info builder/receivers_builder.go:75 Receiver started. {"kind": "receiver", "name": "jaeger"}
2021-07-10T11:24:31.688Z info builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "zipkin"}
2021-07-10T11:24:31.688Z info builder/receivers_builder.go:75 Receiver started. {"kind": "receiver", "name": "zipkin"}
2021-07-10T11:24:31.688Z info healthcheck/handler.go:129 Health Check state change {"kind": "extension", "name": "health_check", "status": "ready"}
2021-07-10T11:24:31.688Z info service/collector.go:182 Everything is ready. Begin running and processing data.

bogdandrutu commented 3 years ago

@mxiamxia I remember you looked into the memory reading part.

The total memory seems to be 2^63, so there is definitely a bug somewhere.

Also, another bug is that the log displays bytes, not MiB, but that is an easy fix.
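Both observations can be checked against the logged numbers. The reported total_memory, 9223372036854771712, is exactly 2^63 - 4096: the largest signed 64-bit value rounded down to a 4 KiB page. On Linux hosts using cgroup v1 this is the value memory.limit_in_bytes typically reports when no limit is set, which is consistent with the "unlimited" sentinel being read as the total, although the thread does not state the root cause explicitly. A minimal Go sketch (helper names are hypothetical) reproduces the logged limit_mib and spike_limit_mib as plain byte counts:

```go
package main

import (
	"fmt"
	"math"
)

// total_memory value logged in the report above.
const reportedTotal uint64 = 9223372036854771712

// pageAlignedMaxInt64 rounds math.MaxInt64 (2^63 - 1) down to a 4096-byte
// page boundary, giving 2^63 - 4096.
func pageAlignedMaxInt64() uint64 {
	return uint64(math.MaxInt64) &^ 4095
}

// bytesAtPercent applies a percentage to a byte count without converting
// to MiB, which is what the log's *_mib fields appear to contain.
func bytesAtPercent(total, pct uint64) uint64 {
	return total * pct / 100 // total*2 still fits in uint64 for this input
}

func main() {
	fmt.Println(pageAlignedMaxInt64() == reportedTotal) // true
	fmt.Println(bytesAtPercent(reportedTotal, 2))       // 184467440737095434
	fmt.Println(bytesAtPercent(reportedTotal, 1))       // 92233720368547717
}
```

The two percentages match the logged limit_mib and spike_limit_mib digit for digit, confirming that those fields hold bytes rather than MiB.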

anoyli commented 3 years ago

Thank you @bogdandrutu. I also found that no matter how much memory the machine has, total_memory is always 9223372036854771712.

mxiamxia commented 3 years ago

The issue has been fixed in https://github.com/open-telemetry/opentelemetry-collector/pull/3456. I think you will have the correct total memory when the new Collector release is out.

https://github.com/open-telemetry/opentelemetry-collector/blob/main/internal/iruntime/total_memory_linux.go#L37
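One general way to guard against an "unlimited" cgroup value is to treat any cgroup limit larger than the host's physical memory as no limit and fall back to host memory. The sketch below illustrates that idea only; `effectiveTotalMemory` is hypothetical and is not the code from PR #3456 or total_memory_linux.go, and it assumes the cgroup limit and host memory have already been read separately.

```go
package main

import "fmt"

const gib uint64 = 1 << 30

// effectiveTotalMemory is a hypothetical guard, not the collector's code:
// a cgroup limit of 0 (unknown) or one larger than the host's physical
// memory is treated as "no limit", so host memory is used instead.
func effectiveTotalMemory(cgroupLimit, hostMemory uint64) uint64 {
	if cgroupLimit == 0 || cgroupLimit > hostMemory {
		return hostMemory
	}
	return cgroupLimit
}

func main() {
	host := 30 * gib // assumed 30 GiB host, as in the report
	// The value from the log is far above host memory, so fall back to the host.
	fmt.Println(effectiveTotalMemory(9223372036854771712, host) / gib) // 30
	// A real container limit of 4 GiB is kept as-is.
	fmt.Println(effectiveTotalMemory(4*gib, host) / gib) // 4
}
```

Taking the smaller of the two values keeps percentage limits meaningful both for containers with an explicit memory limit and for those running without one.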

rakyll commented 3 years ago

This is resolved now; we can close this. See #3456. cc @alolita

bogdandrutu commented 3 years ago

@anoyli, can you give the 0.30.0 image a try?

anoyli commented 3 years ago

Great work, thank you! I will try it later and report back with the results.

alolita commented 3 years ago

Closing issue since this request is resolved.