opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] RemoteShardsBalancer cannot rebalance shards as expected #15302

Open bugmakerrrrrr opened 1 month ago

bugmakerrrrrr commented 1 month ago

Describe the bug

When rebalancing searchable snapshot indices, we first calculate the average number of primary shards that each node should have. In the current implementation, the formula is as follows:

(totalNumberOfRemotePrimaryShards + totalNumberOfUnassignedShards)/totalNumberOfRoutingNodes

https://github.com/opensearch-project/OpenSearch/blob/e8ee6db87e55f66e28b46fac90bf2a4b33755160/server/src/main/java/org/opensearch/cluster/routing/allocation/allocator/RemoteShardsBalancer.java#L247-L251

If a cluster has both search nodes and dedicated data nodes, the average calculated with the current formula may be lower than the actual per-node count, because the divisor includes every routing node rather than only the search nodes that host searchable snapshot shards. I think the formula needs to be adjusted as follows:

(totalNumberOfRemotePrimaryShards + totalNumberOfUnassignedRemoteShards)/totalNumberOfSearchNodes
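
To make the difference concrete, here is a minimal Java sketch that evaluates both formulas on a made-up cluster. The class name, method names, and the sample numbers are all hypothetical and are not taken from `RemoteShardsBalancer`; integer division is also an assumption about how the average is rounded.

```java
// Hypothetical sketch: compares the two averages described in this issue.
// None of these names exist in OpenSearch; they only mirror the formulas above.
public final class AvgPrimarySketch {

    // Current formula: divides by ALL routing nodes and counts ALL unassigned shards.
    static int currentAverage(int remotePrimaries, int unassignedShards, int routingNodes) {
        if (routingNodes <= 0) {
            return 0; // assumption: nothing to balance without nodes
        }
        return (remotePrimaries + unassignedShards) / routingNodes;
    }

    // Proposed formula: divides by search nodes only and counts only unassigned remote shards.
    static int proposedAverage(int remotePrimaries, int unassignedRemoteShards, int searchNodes) {
        if (searchNodes <= 0) {
            return 0; // assumption: nothing to balance without search nodes
        }
        return (remotePrimaries + unassignedRemoteShards) / searchNodes;
    }

    public static void main(String[] args) {
        // Made-up cluster: 2 search nodes + 4 dedicated data nodes = 6 routing nodes,
        // 12 remote primary shards, 4 unassigned shards of which 0 are remote.
        int current = currentAverage(12, 4, 6);
        int proposed = proposedAverage(12, 0, 2);
        System.out.println("current formula:  " + current);  // prints 2
        System.out.println("proposed formula: " + proposed); // prints 6
    }
}
```

In this made-up cluster the 12 remote primaries can only be spread over the 2 search nodes, so a balanced layout holds 6 per node, while the current formula reports an average of 2.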

Related component

Search:Searchable Snapshots

Expected behavior

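RemoteShardsBalancer should compute the average number of primary shards per node over search nodes only, so that searchable snapshot shards are rebalanced evenly across the search nodes.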

bugmakerrrrrr commented 1 month ago

@kotwanikunal @andrross you might be interested in this :)