Bukhtawar opened this issue 1 month ago
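For a reasonably large cluster with 500k shards, computing and logging cluster health changes after every reroute operation becomes expensive. Hot threads on the cluster-manager node show the cluster state update thread at 96.7% CPU, with sampled stacks inside ClusterStateHealth construction, which iterates over every shard in the routing table (RoutingTable.allShards):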
96.7% (9.6s out of 10s) cpu usage by thread 'opensearch[74e0b23bcf51c21918e96f38f93e1491][clusterManagerService#updateTask][T#1]'
  2/10 snapshots sharing following 22 elements
    java.base@17.0.9/java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1053)
    app//org.opensearch.cluster.routing.RoutingTable.allShards(RoutingTable.java:245)
    app//org.opensearch.cluster.routing.RoutingTable.allShards(RoutingTable.java:225)
    app//org.opensearch.cluster.health.ClusterStateHealth.<init>(ClusterStateHealth.java:138)
    app//org.opensearch.cluster.health.ClusterStateHealth.<init>(ClusterStateHealth.java:77)
    app//org.opensearch.cluster.routing.allocation.AllocationService.buildResultAndLogHealthChange(AllocationService.java:186)
    app//org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:528)
    app//org.opensearch.node.Node$$Lambda$2602/0x0000004000a253b8.apply(Unknown Source)
    app//org.opensearch.cluster.routing.BatchedRerouteService$1.execute(BatchedRerouteService.java:136)
    app//org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:67)
    app//org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:882)
    app//org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:434)
    app//org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:301)
    app//org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:212)
    app//org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:209)
    app//org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:247)
    app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
    app//org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
    app//org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
    java.base@17.0.9/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    java.base@17.0.9/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
All we care about here is the status (RED/YELLOW), which can be derived from the unassigned shards alone, without iterating over every shard in the cluster.
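A minimal sketch of the proposal, in Java to match the codebase in the stack trace. The Status enum, the UnassignedShard record and both helper methods are hypothetical stand-ins, not OpenSearch APIs. It assumes the usual health rules (an unassigned primary makes the cluster RED, an unassigned replica makes it YELLOW) and, like the proposal, looks only at unassigned shards, so initializing or relocating copies are not modelled. The work done is proportional to the number of unassigned shards rather than the total shard count.

```java
import java.util.List;

public class UnassignedHealthSketch {

    // Hypothetical status levels mirroring GREEN/YELLOW/RED cluster health.
    enum Status { GREEN, YELLOW, RED }

    // Hypothetical stand-in for an unassigned shard copy; only the primary flag is used.
    record UnassignedShard(String index, int shardId, boolean primary) {}

    // Derives the status from unassigned shards only:
    // any unassigned primary -> RED, else any unassigned replica -> YELLOW, else GREEN.
    static Status statusFromUnassigned(List<UnassignedShard> unassigned) {
        Status status = Status.GREEN;
        for (UnassignedShard shard : unassigned) {
            if (shard.primary()) {
                return Status.RED; // RED is the worst level, so stop scanning early
            }
            status = Status.YELLOW;
        }
        return status;
    }

    // Hypothetical helper: returns a message only when the status changed, otherwise null,
    // so nothing is logged when a reroute leaves the status unchanged.
    static String healthChangeMessage(Status previous, Status current, String reason) {
        if (previous == current) {
            return null;
        }
        return "health changed from " + previous + " to " + current + " (reason: " + reason + ")";
    }

    public static void main(String[] args) {
        List<UnassignedShard> unassigned = List.of(
            new UnassignedShard("logs-1", 0, false),
            new UnassignedShard("logs-2", 3, false));
        Status after = statusFromUnassigned(unassigned);
        System.out.println(after); // YELLOW: only replicas are unassigned
        System.out.println(healthChangeMessage(Status.GREEN, after, "reroute"));
    }
}
```

When nothing is unassigned the loop does no work and the status is GREEN, which is the common case this sketch is meant to keep cheap.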
Related component: ShardManagement:Performance
[Triage - attendees 1 2 3 4 5] @Bukhtawar Thanks for creating this issue