opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] High CPU Usage reported for ClusterManager node by _nodes/stats API. #14667

Closed gargharsh3134 closed 1 month ago

gargharsh3134 commented 2 months ago

Describe the bug

Users are reporting a high CPU usage percent from the _nodes/stats API for the clusterManager node after upgrading from version 1.3 to version 2.11. With little to no change in workload, the cluster on version 1.3 never reported a CPU usage percent above 40%; after the upgrade, the value intermittently spikes to around 90%. We need to investigate the potential cause of a regression (either in the functionality or in the metric being reported) that leads to these CPU usage spikes.

Cluster configuration on which the spikes were observed:

  * 3 Data Nodes (r6gd.4xlarge)
  * 3 Master Nodes (r6gd.large)
  * Total number of shards: 150

Related component

Cluster Manager

To Reproduce

  1. Monitor the CPU usage percent reported by the _nodes/stats API for the clusterManager node before and after upgrading from version 1.3 to version 2.13.
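
The monitoring step can be sketched roughly as below. This is a minimal illustration, not the reporter's actual tooling: the helper names `cluster_manager_cpu` and `fetch_node_stats` and the `base_url` default are hypothetical, and the sketch assumes the nodes stats response exposes `roles` and `os.cpu.percent` per node, with the role named `cluster_manager` on 2.x and `master` on 1.x.

```python
import json
from urllib.request import urlopen

# Role name is "cluster_manager" on OpenSearch 2.x and "master" on 1.x.
CLUSTER_MANAGER_ROLES = {"cluster_manager", "master"}


def cluster_manager_cpu(stats):
    """Map node name -> os.cpu.percent for cluster-manager-eligible nodes.

    `stats` is the parsed JSON body of a nodes stats response.
    """
    result = {}
    for node in stats.get("nodes", {}).values():
        if CLUSTER_MANAGER_ROLES & set(node.get("roles", [])):
            result[node["name"]] = node["os"]["cpu"]["percent"]
    return result


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch OS and process stats; base_url is an assumed, unauthenticated endpoint."""
    with urlopen(f"{base_url}/_nodes/stats/os,process") as resp:
        return json.load(resp)
```

Calling `cluster_manager_cpu(fetch_node_stats())` at a fixed interval before and after the upgrade would give comparable time series for the two versions.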

Expected behavior

Given the workload remained pretty much constant, the intermittent CPU spikes should not have been observed.

Additional Details

No response

dblock commented 1 month ago

[Catch All Triage - 1, 2, 3, 4]

chanon-onman commented 1 month ago

We also see this issue with AWS OpenSearch 2.11.

gargharsh3134 commented 1 month ago

The API was returning the correct CPU utilisation. The issue was with our host setup, where a background component was intermittently consuming more CPU. Since the OpenSearch process was not the one causing the spikes, and the API's response was cross-verified against top dumps, I'm closing this issue.
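
For anyone hitting similar symptoms, a rough way to check the same thing from the API alone is to compare the host-wide and process-level CPU figures for a node. The sketch below is an illustration under assumptions rather than the method used above: the helper name `cpu_outside_opensearch` is hypothetical, and it assumes `os.cpu.percent` and `process.cpu.percent` in the nodes stats response are both reported on the same 0 to 100 scale.

```python
def cpu_outside_opensearch(node_stats):
    """Approximate host CPU percent not attributable to the OpenSearch process.

    `node_stats` is one node's entry from a nodes stats response.
    Assumes os.cpu.percent (whole host) and process.cpu.percent
    (OpenSearch process only) share the same 0-100 scale.
    """
    os_pct = node_stats["os"]["cpu"]["percent"]
    proc_pct = node_stats["process"]["cpu"]["percent"]
    # Clamp at zero: the two values may be sampled at slightly different times.
    return max(os_pct - proc_pct, 0)
```

A large value here during a spike suggests that something other than OpenSearch is consuming CPU on the host, which can then be confirmed with top.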