[Paginating ClusterManager Read APIs] Paginate _cat/segments API.

gargharsh3134 commented 1 month ago

Is your feature request related to a problem? Please describe

As the number of indices and data/segments grow in an opensearch cluster, the response size and the latency of the _cat/segments API increases, which makes it difficult not only for the client to consume such large responses, but also stresses out the cluster by making it accumulate segment stats across all the indices.

Describe the solution you'd like

Blocking the API for large clusters. Given that opensearch cluster can be used to store large amount of data, the number of segments can be very high (for eg, 1.6 million segments for 80TB data across 5k indices). If there isn't any strong use-case of the API for any monitoring activity, it can very well be blocked for large clusters to prevent unnecessary usage of nodes' resources.
Pagination: If the API is not to be blocked, one of the proposal to paginate will depend on the way _cat/shards API is paginated. A list of shards can be generated for a requested page (as per the finalized approach of _cat/shards API) and all the segments belonging to those shards can be displayed in a single page. The pageSize parameter would not be very well defined in that case. Instead of it directly being, number of segments in a page, it would rather be number of shards corresponding to which all the segments will be displayed. This would also be susceptible to skewness in the cluster. If shards having relatively more data/segments get picked up in same page, response sizes can significantly vary.

Related component

Cluster Manager

Describe alternatives you've considered

No response

Additional context

No response

dblock commented 2 weeks ago

[Catch All Triage - Attendees 1, 2, 3, 4, 5]

msfroh commented 2 weeks ago

Right now, a lot of the /_cat APIs work via a scatter-gather across all nodes. Essentially, the request hits a node, it fans out to all nodes, then stitches the responses together.

One option for pagination might involve limiting the number of nodes that get queried per request.

dblock commented 2 weeks ago

Maybe there's a way to redesign this as a "walk the cluster" in a predictable sequence with a cursor that can remember its place even with cluster changes, therefore keeping pagination stable, and making no more than N fan-out requests ahead at any given time to gather more data.

Bukhtawar commented 2 weeks ago

I think we could fan out to all the nodes for the first request, create checkpoint of the point in time state and then from there on serve batch node responses ensuring that all nodes are using the same point in time state. This could be costly for certain APIs though so we really need a trade-off of consistency vs cost

backslasht commented 1 week ago

I see the conversation is aligned towards pagination based on checkpoints. Alternatively, can we make it filter based? and the filter can be based on index, shard, node, etc. With filters, server need not retain the checkpoint and it is also simpler to implement.

Thoughts?

backslasht commented 1 week ago

I see index name or index regex is already supported as target (filter). Can it be extended to shards and nodes?

shwetathareja commented 1 week ago

create checkpoint of the point in time state and then from there on serve batch node responses ensuring that all nodes are using the same point in time state.

Cluster state today is point in time information. Maintaining checkpoints would be very expensive, overhead will increase depending on state changes happening in the cluster. These APIs are used for diagnostic purpose. I feel adding checkpoints of cluster state will be overkill.

Alternatively, can we make it filter based?

The below API already support filtering on indices: _cat/shards _cat/segments _cat/indices

But, the major problem is user don't always know what they are looking for and default response in large cluster would be very big for these APIs. We are thinking about adding more aggressive filtering support like node level in admin APIs separately but filtering only may not be sufficient. One thing I was thinking earlier was can we do implicit pagination like based on node. One challenge is node can leave and join the cluster and then the API responses could get confusing. Like in _cat/shards or _cat/segments API, a shard can appear multiple times if it was reassigned to a different node during pagination was in-progress. Paginating on indices with creation time can provide a more stable and deterministic pagination.

shwetathareja commented 5 days ago

By the way _cat/segments is not widely used API, pagination may not be P0 for this one. For this API, to start with we can enforce aggressive filtering but for _cat/shards pagination would be important.

opensearch-project / OpenSearch