opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Paginating ClusterManager Read APIs] Paginate _cat/shards API. #14257

Open gargharsh3134 opened 1 month ago

gargharsh3134 commented 1 month ago

Is your feature request related to a problem? Please describe

As the number of shards in an OpenSearch cluster grows, the response size and the latency of the API increase. This issue tracks the approaches that can be used to paginate the response.

Describe the solution you'd like

Paginating the response requires a pagination key for which a deterministic order can be maintained or generated in the cluster. The deterministic order is required to start a new response page from the exact point where the last page left off. Initial investigation into possible pagination keys reveals two candidates: index creation timestamps (epoch timestamps) or NodeIDs.

Pagination key: index creation timestamps [RECOMMENDED]

Overview: Each index has a creation timestamp stored in IndexMetadata, which is part of the Metadata object of the ClusterState. These creation timestamps can act as the sort/pagination key: a list of indices sorted by their respective creation timestamps can be generated, and that sorted list can then be used to prepare the list of shards to be returned in each response page.
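
A minimal sketch of this ordering, assuming the existing Metadata/IndexMetadata APIs of the cluster state. The class and method names are illustrative, and the tie-break on index name is an assumption added here to keep the order deterministic when two indices share the same creation timestamp:

import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

import org.opensearch.cluster.metadata.IndexMetadata;
import org.opensearch.cluster.metadata.Metadata;

public final class IndexCreationTimeOrder {

    // Returns the indices of the cluster ordered by creation timestamp, with the
    // index name as a tie-breaker (assumption, not part of the proposal).
    public static List<IndexMetadata> sortedByCreationTime(Metadata metadata) {
        return metadata.indices().values().stream()
            .sorted(
                Comparator.comparingLong(IndexMetadata::getCreationDate)
                    .thenComparing(indexMetadata -> indexMetadata.getIndex().getName())
            )
            .collect(Collectors.toList());
    }
}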

User Experience:

Proposed New Query -> curl "localhost:9200/_cat/shards?format=json&pretty&nextToken=null"

Users can specify a nextToken as a query parameter in the REST request, starting with nextToken set to null to get the first page. Such paginated queries will be answered with a list of shards and a nextToken, which can then be used to fetch subsequent pages until the nextToken in the response is again null (implying no further pages remain to be fetched). A client-side sketch of this loop is shown after the sample response below.

Proposed New JSON response ->


{
  "nextToken" : "MCQw",
  "shards" : [{
      "index" : "test-ind",
      "shard" : "0",
      "prirep" : "r",
      "state" : "STARTED",
      "docs" : "0",
      "store" : "208b",
      "ip" : "127.0.0.1",
      "node" : "data4"
    },
    {
      "index" : "test-ind",
      "shard" : "0",
      "prirep" : "p",
      "state" : "STARTED",
      "docs" : "0",
      "store" : "208b",
      "ip" : "127.0.0.1",
      "node" : "data1"
    }]
}
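
For illustration, a client-side paging loop over the proposed request/response shape might look like the sketch below. It assumes the low-level Java REST client (org.opensearch.client.RestClient); the nextToken parameter and response field do not exist yet, and extractNextToken is a naive, illustration-only stand-in for proper JSON parsing:

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;

public class CatShardsPagingExample {

    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // mirrors the proposed curl example, which passes nextToken=null for the first page
            String nextToken = "null";
            do {
                Request request = new Request("GET", "/_cat/shards");
                request.addParameter("format", "json");
                request.addParameter("nextToken", nextToken);
                Response response = client.performRequest(request);
                String body = EntityUtils.toString(response.getEntity());
                System.out.println(body); // process the "shards" array of this page here
                nextToken = extractNextToken(body);
            } while (nextToken != null); // a null nextToken means no further pages remain
        }
    }

    // Naive extraction of the "nextToken" field; returns null when the field is absent
    // or is the JSON literal null (i.e. the last page has been reached).
    private static String extractNextToken(String body) {
        int key = body.indexOf("\"nextToken\"");
        if (key < 0) {
            return null;
        }
        int colon = body.indexOf(':', key);
        int i = colon + 1;
        while (i < body.length() && Character.isWhitespace(body.charAt(i))) {
            i++;
        }
        if (i >= body.length() || body.charAt(i) != '"') {
            return null; // not a string value, e.g. "nextToken" : null
        }
        int end = body.indexOf('"', i + 1);
        return end > i ? body.substring(i + 1, end) : null;
    }
}
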
  1. Users will not have control over the pageSize; a default page size will be decided at runtime by the OpenSearch process itself (say, k * number of data nodes + 1).

Given that shardRoutings for a shardID do not have any unique identifiers, it is difficult to define a strategy for starting the next page in case the shards corresponding to a shardID get split across pages. Shards for a shardID should therefore not be split across pages and need to be displayed within a single page. This limitation inherently imposes a lower bound on the pageSize: the minimum pageSize must be at least the maximum number of shards for any single shardID across all the indices in the cluster (a sketch of this computation follows the list below).

min(pageSize) = max(shardCountForShardID0Index1, shardCountForShardID1Index1, ..., shardCountForShardID0Index2, ...)

This restriction on pageSize brings the overhead of an additional validation on the pageSize parameter if it were to be passed by the user as a query parameter. That is why, instead of giving the user control over pageSize with a validation constraint, it is proposed for now that OpenSearch decide the pageSize.

  2. The number of shards in a page will always be less than or equal to the default page size.

  3. The pagination would be agnostic to index creation or deletion operations which might be performed while the paginated queries are being executed. Since indices will be sorted according to their creation timestamps, newly created indices will always appear towards the rear end of the pages, and a decision can be made whether to display them or not. If they are not to be displayed, then the query start time (when the first page was queried) will have to be stored as part of the nextToken.

  4. (OPEN POINT) The response for a JSON format request is easy to define with the existing behaviour. However, the plain-text format response, which is effectively a table of headers and rows, has no natural place for a nextToken.

  5. (OPEN POINT) Whether to expose a parameter allowing the user to get the list of shards in ascending or descending order of the corresponding index creation timestamps.
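
As a sketch of the lower bound from point 1: since all copies (primary + replicas) of a shardID must land on one page, the bound can be approximated from index metadata alone as the maximum of (number_of_replicas + 1) across indices. This ignores transient extra copies visible in the routing table during relocation; the helper below is illustrative, not part of the proposal:

import org.opensearch.cluster.ClusterState;
import org.opensearch.cluster.metadata.IndexMetadata;

public final class PageSizeBound {

    // Smallest page size that guarantees all copies (primary + replicas) of any
    // single shardID fit on one page.
    public static int minimumPageSize(ClusterState clusterState) {
        int bound = 1;
        for (IndexMetadata indexMetadata : clusterState.metadata().indices().values()) {
            bound = Math.max(bound, indexMetadata.getNumberOfReplicas() + 1);
        }
        return bound;
    }
}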

Implementation Details:

  1. Generating a sorted list of indices based on creation time. To generate the sorted list, the following approaches need to be compared:
     a) Sorting the list obtained from the clusterState metadata at the REST layer each time a paginated query is received.
     b) Making the indices() map of the Metadata object itself deterministic/sorted. The current HashMap implementation could be replaced with, say, a LinkedHashMap, and each time a new entry (index) is put/inserted into the map, a sorted map can be re-created.
     c) Maintaining a cache on each node which is updated as part of the appliers and listeners run after each clusterState update (a sketch of this approach follows the list below). A new TransportAction (with a term version check) might also be required for the REST layer to fetch that cache.

  2. New data members, namely nextToken and the paginated response element (shards, segments, or indices), need to be introduced in the Table class, which will then be read while preparing the actual RestResponse in the RestTable class.
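
A sketch of approach (c) from point 1 above: a per-node cache of index names sorted by creation timestamp, refreshed through the existing ClusterStateListener hook. The SortedIndicesCache class and its registration via ClusterService#addListener are illustrative assumptions, not part of the proposal:

import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

import org.opensearch.cluster.ClusterChangedEvent;
import org.opensearch.cluster.ClusterStateListener;
import org.opensearch.cluster.metadata.IndexMetadata;

public class SortedIndicesCache implements ClusterStateListener {

    private volatile List<String> indicesSortedByCreationTime = List.of();

    @Override
    public void clusterChanged(ClusterChangedEvent event) {
        if (!event.metadataChanged()) {
            return; // no index was created or deleted; keep the existing cache
        }
        // rebuild the cached ordering from the latest cluster state
        indicesSortedByCreationTime = event.state().metadata().indices().values().stream()
            .sorted(Comparator.comparingLong(IndexMetadata::getCreationDate))
            .map(indexMetadata -> indexMetadata.getIndex().getName())
            .collect(Collectors.toList());
    }

    public List<String> sortedIndices() {
        return indicesSortedByCreationTime;
    }
}

Such a listener could be registered with the ClusterService at node startup, so the REST handler reads the cached ordering instead of re-sorting on every paginated request.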

Cons/Issues: The approach is susceptible to clock synchronization issues; clock skew in case of cluster manager node switches can produce erroneous results.

Pagination key: NodeID

The idea here would be to respond with all the shards present on a set of nodes in a single page.

The concern with having NodeID as a pagination key is that it is not agnostic to index creations or cluster re-balancing activities which could happen while the queries are being executed.

Related component

Cluster Manager

Describe alternatives you've considered

No response

Additional context

No response

dblock commented 2 weeks ago

[Catch All Triage - Attendees 1, 2, 3, 4, 5]

backslasht commented 1 week ago

It makes sense not to have NodeId as the pagination key. But can we introduce node ID as a filter in the API?