High CPU utilization on Windows server after configuring the otelcol
desududev opened this issue 3 months ago
Hey there, @desududev! I'm a bot designed to assist you with bugs, questions, and contributions. Feel free to ask for help while waiting for a human maintainer to join the conversation. 🤖
To troubleshoot and resolve high CPU utilization on a Windows server after configuring the OpenTelemetry Collector (otelcol), you can follow these steps:
1. **Enable Internal Telemetry**: Configure the Collector’s internal telemetry to monitor its performance. This will help you understand its resource usage and identify any bottlenecks.
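A minimal sketch of that configuration (these are the Collector’s standard `service::telemetry` settings; adjust the levels to taste):

```yaml
service:
  telemetry:
    logs:
      level: debug     # more verbose Collector logs while troubleshooting
    metrics:
      level: detailed  # exposes internal metrics, served on port 8888 by default
```

Internal metrics such as `otelcol_process_cpu_seconds` and `otelcol_process_memory_rss` can then be scraped and correlated with the CPU spikes you are seeing.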
2. **Use the pprof Extension**: Enable the `pprof` extension to profile the Collector as it runs. This can help you identify which parts of the Collector are consuming the most CPU.
```yaml
extensions:
  pprof:
    endpoint: 0.0.0.0:1777

service:
  extensions: [pprof]  # extensions must also be listed here to be active
```
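Once the Collector is running with `pprof` enabled, you can capture a CPU profile with the standard Go tooling (a sketch, assuming the endpoint above and a local Go toolchain):

```shell
# capture a 30-second CPU profile and open the interactive pprof viewer
go tool pprof "http://localhost:1777/debug/pprof/profile?seconds=30"

# or save the raw profile for later analysis
curl -o cpu.pprof "http://localhost:1777/debug/pprof/profile?seconds=30"
```

Inside the pprof viewer, the `top` and `web` commands show which functions dominate CPU time.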
3. **Check for Configuration Issues**: Ensure that the Collector is properly configured; misconfigurations can lead to high CPU usage. For example, make sure the `memory_limiter` processor is configured correctly to avoid memory pressure.
```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 4000
    spike_limit_mib: 500
```
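If the available memory varies across your servers, the processor’s percentage-based settings may be easier to maintain than fixed MiB values (a sketch; tune the numbers for your environment):

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    # hard limit as a percentage of total available memory
    limit_percentage: 80
    # spike headroom; the soft limit is limit_percentage minus this value
    spike_limit_percentage: 20
```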
4. **Inspect Logs and zPages**: Check the Collector’s logs and use the `zpages` extension to inspect live data from the Collector’s receivers and exporters. This can help you identify any errors or performance issues.
```yaml
extensions:
  zpages:
    endpoint: 0.0.0.0:55679

service:
  extensions: [zpages]  # extensions must also be listed here to be active
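```

With the extension active, the Collector serves several diagnostic pages; for example (assuming the default endpoint above):

```shell
# live sampled spans per component
curl http://localhost:55679/debug/tracez

# the pipelines as actually wired from your configuration
curl http://localhost:55679/debug/pipelinez
```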
5. **Use the debug Exporter**: Configure the `debug` exporter to confirm that the Collector is receiving, processing, and exporting data correctly. This can help you identify any issues in the data flow that might be causing high CPU usage.
```yaml
receivers:
  zipkin:

exporters:
  debug:

service:
  pipelines:
    traces:
      receivers: [zipkin]
      processors: []
      exporters: [debug]
```
To begin testing, generate a Zipkin payload. For example, create a file called `trace.json` that contains:
```json
[
  {
    "traceId": "5982fe77008310cc80f1da5e10147519",
    "parentId": "90394f6bcffb5d13",
    "id": "67fae42571535f60",
    "kind": "SERVER",
    "name": "/m/n/2.6.1",
    "timestamp": 1516781775726000,
    "duration": 26000,
    "localEndpoint": {
      "serviceName": "api"
    },
    "remoteEndpoint": {
      "serviceName": "apip"
    },
    "tags": {
      "data.http_response_code": "201"
    }
  }
]
```
With the Collector running, send this payload to the Collector:
```shell
curl -X POST localhost:9411/api/v2/spans -H 'Content-Type: application/json' -d @trace.json
```
You should see a log entry like the following:
```
2023-09-07T09:57:43.468-0700    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2}
```
You can also configure the `debug` exporter so the entire payload is printed:
```yaml
exporters:
  debug:
    verbosity: detailed
```
If you re-run the previous test with the modified configuration, the log output looks like this:
```
2023-09-07T09:57:12.820-0700    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2}
2023-09-07T09:57:12.821-0700    info    ResourceSpans #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0
Resource attributes:
     -> service.name: Str(telemetrygen)
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope telemetrygen
Span #0
    Trace ID       : 0c636f29e29816ea76e6a5b8cd6601cf
    Parent ID      : 1a08eba9395c5243
    ID             : 10cebe4b63d47cae
    Name           : okey-dokey
    Kind           : Internal
    Start time     : 2023-09-07 16:57:12.045933 +0000 UTC
    End time       : 2023-09-07 16:57:12.046058 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> span.kind: Str(server)
     -> net.peer.ip: Str(1.2.3.4)
     -> peer.service: Str(telemetrygen)
```
By following these steps, you should be able to identify and resolve the cause of high CPU utilization on your Windows server after configuring the OpenTelemetry Collector [1][2].
To continue talking to Dosu, mention @dosu.
Good bot, especially the first three steps.