opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] [Cluster Manager Throttling] Cluster Manager Throttler allowing 2x the tasks allowed to be queued #16693

Open gbbafna opened 1 day ago

gbbafna commented 1 day ago

Describe the bug

On a 200-node cluster with "cluster_manager.throttling.thresholds.update-snapshot-state.value": "5000", we are seeing 10k pending tasks during snapshots instead of at most 5k. All 10k tasks are of the update snapshot state task type.

curl localhost:9200/_cat/pending_tasks > /tmp/pt

➜  ~ grep " update snapshot state" /tmp/pt | wc -l
10000

It looks like we are somehow allowing 2x the configured number of tasks to be queued before throttling actually starts for this task type.
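To break the dump down by task type instead of grepping for a single string, something like the following could be used. This is only a sketch: it assumes the default `_cat/pending_tasks` column order (insertOrder, timeInQueue, priority, source), `count_by_source` is a hypothetical helper name, and the sample lines are made up for illustration, not taken from the affected cluster.

```python
from collections import Counter


def count_by_source(cat_output: str) -> Counter:
    """Count pending tasks per source from `_cat/pending_tasks` text output.

    Assumption: default columns are insertOrder, timeInQueue, priority, source.
    The source column can contain spaces, so split at most three times.
    """
    counts = Counter()
    for line in cat_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        counts[fields[3].strip()] += 1
    return counts


# Made-up sample lines in the assumed column layout.
sample = """\
1685 855ms NORMAL update snapshot state
1686 855ms NORMAL update snapshot state
1687 840ms URGENT shard-started
"""

counts = count_by_source(sample)
for source, n in counts.most_common():
    print(f"{n:>6}  {source}")
```

Against the real dump, the same function could be fed the contents of /tmp/pt, for example `count_by_source(open("/tmp/pt").read())`.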

Related component

Cluster Manager

To Reproduce

  1. Create a 200-node cluster with 200k empty shards.
  2. Set "cluster_manager.throttling.thresholds.update-snapshot-state.value": "5000".
  3. Register a repository and trigger a snapshot. Observe the pending task count just after triggering the snapshot.
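Step 2 can be sketched as a call to the cluster settings API. The issue does not say how the setting was applied, so the use of `persistent` (rather than `transient`) and the `http://localhost:9200` host are assumptions, and `build_throttling_request` is a hypothetical helper. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_throttling_request(task_type: str, limit: int,
                             host: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a PUT _cluster/settings request for one throttling threshold.

    Assumption: the threshold is applied as a persistent cluster setting.
    """
    key = f"cluster_manager.throttling.thresholds.{task_type}.value"
    body = {"persistent": {key: limit}}
    return urllib.request.Request(
        url=f"{host}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_throttling_request("update-snapshot-state", 5000)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To apply it against a live cluster (not executed here):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```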

Expected behavior

Pending tasks of the update snapshot state task type should not exceed 5k.
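The expectation can be stated as a toy model: once the number of pending tasks of a type reaches its threshold, further submissions of that type are rejected. The class below is purely illustrative and is not OpenSearch's throttler implementation; `ToyThrottler` and its methods are hypothetical names.

```python
class ToyThrottler:
    """Toy per-task-type throttler; NOT the OpenSearch implementation."""

    def __init__(self, thresholds):
        self.thresholds = dict(thresholds)  # task type -> max pending tasks
        self.pending = {}                   # task type -> current pending count

    def submit(self, task_type: str) -> bool:
        """Return True if the task is queued, False if it is throttled."""
        limit = self.thresholds.get(task_type)
        current = self.pending.get(task_type, 0)
        if limit is not None and current >= limit:
            return False  # at or above the threshold: reject
        self.pending[task_type] = current + 1
        return True

    def complete(self, task_type: str) -> None:
        """Mark one pending task of this type as finished."""
        if self.pending.get(task_type, 0) > 0:
            self.pending[task_type] -= 1


throttler = ToyThrottler({"update-snapshot-state": 5000})
# Submit 10k tasks without completing any; only the first 5000 should be queued.
accepted = sum(throttler.submit("update-snapshot-state") for _ in range(10_000))
print(accepted)
```

In this model the pending count for the task type never exceeds the configured value, which is the behavior the issue expects and the observed 10k count contradicts.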
