[META] Search Query Categorization

deshsidd commented 11 months ago

Overview

Today, OpenSearch customers have very limited visibility into the query workload running on a cluster. There is also no easy way to identify patterns in the queries being executed on an index. This leaves a large gap when debugging performance issues, tracking changes in data access patterns, or targeting new feature improvements.

The Query Classification feature (part of the Query Visibility Project) in OpenSearch aims to enhance the platform's capabilities by providing a mechanism to identify patterns, latencies and resource utilization breakdown for the queries being executed upon an index. This will empower users and administrators to optimize query performance and identify query types for better resource allocation and index management.

The primary objective of this proposal is to implement a query classification mechanism within OpenSearch that can categorize and analyze the queries being executed on an index.

We intend to use metric counters to record this information using the Metric Framework : https://github.com/opensearch-project/OpenSearch/pull/10241

This work is split into the following phases and tasks:

Phase 1

Extract query type and level information for a small number of query types and add the basic framework : https://github.com/opensearch-project/OpenSearch/issues/10250
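
To make "query type and level" concrete, here is a minimal sketch in Python of how a query DSL body could be walked to count each query type at each nesting level. This is only an illustration under stated assumptions, not the Phase 1 implementation: `count_query_types` and `COMPOUND_CLAUSES` are hypothetical names, only `bool` is treated as a compound query, and the counts are kept in a plain `Counter` rather than in the Metric Framework.

```python
from collections import Counter

# Assumption: only the clauses of a `bool` query hold nested queries here.
# Other compound queries (dis_max, nested, ...) are ignored to keep this short.
COMPOUND_CLAUSES = {"must", "should", "filter", "must_not"}


def count_query_types(query, level=0, counts=None):
    """Walk a query DSL dict and count (query_type, level) pairs."""
    if counts is None:
        counts = Counter()
    for query_type, body in query.items():
        counts[(query_type, level)] += 1
        if query_type != "bool":
            continue
        for clause, children in body.items():
            if clause not in COMPOUND_CLAUSES:
                continue  # skip options such as minimum_should_match
            if isinstance(children, dict):
                children = [children]  # a clause may hold a single query
            for child in children:
                count_query_types(child, level + 1, counts)
    return counts


example_query = {
    "bool": {
        "must": [{"match": {"title": "opensearch"}}],
        "filter": [
            {"range": {"timestamp": {"gte": "now-1d"}}},
            {"bool": {"should": [
                {"term": {"status": "200"}},
                {"term": {"status": "404"}},
            ]}},
        ],
    }
}

query_counts = count_query_types(example_query)
for (query_type, level), n in sorted(query_counts.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(f"level={level} type={query_type} count={n}")
```

Each `(query_type, level)` pair could then map to a counter increment tagged with those two dimensions, which is roughly the shape of data the phases below would extend.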

Phase 2 (https://github.com/opensearch-project/OpenSearch/issues/11040)

Identify the types of aggregations : https://github.com/opensearch-project/OpenSearch/issues/11366
Expand the capture of query type and level information : https://github.com/opensearch-project/OpenSearch/issues/11364
Extract fields related information : https://github.com/opensearch-project/OpenSearch/issues/11365
Gather information regarding the number and types of fields as part of the response : https://github.com/opensearch-project/OpenSearch/issues/11367
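
As a rough illustration of the aggregation and field tasks above, the sketch below counts aggregation types and the fields they reference in the `aggs` section of a search request. The function name `summarize_aggregations` and the example field names are hypothetical, and the traversal assumes the usual request shape (each named aggregation has one type key plus optional `aggs`/`aggregations` and `meta` keys); the actual implementation may differ.

```python
from collections import Counter


def summarize_aggregations(aggs, type_counts=None, field_counts=None):
    """Count aggregation types and referenced fields in an `aggs` dict."""
    if type_counts is None:
        type_counts = Counter()
    if field_counts is None:
        field_counts = Counter()
    for body in aggs.values():  # keys are user-chosen aggregation names
        for key, value in body.items():
            if key in ("aggs", "aggregations"):
                # Sub-aggregations: recurse with the same counters.
                summarize_aggregations(value, type_counts, field_counts)
            elif key != "meta":
                # Any other key is taken to be the aggregation type.
                type_counts[key] += 1
                if isinstance(value, dict) and "field" in value:
                    field_counts[value["field"]] += 1
    return type_counts, field_counts


example_aggs = {
    "by_status": {
        "terms": {"field": "status"},
        "aggs": {
            "avg_latency": {"avg": {"field": "latency_ms"}},
            "max_latency": {"max": {"field": "latency_ms"}},
        },
    }
}

agg_types, agg_fields = summarize_aggregations(example_aggs)
print(dict(agg_types))
print(dict(agg_fields))
```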

Integrations

We will be integrating this data collection with a generic data collection framework as part of the Query Insights project : https://github.com/opensearch-project/OpenSearch/issues/11429
We will be surfacing this information using the Query Insights dashboard : https://github.com/opensearch-project/OpenSearch-Dashboards/issues/5571

deshsidd commented 11 months ago

cc @getsaurabh02 @msfroh @ansjcy @rishabhmaurya @backslasht

ansjcy commented 11 months ago

Appreciate the summary! These categorization metrics would be valuable for identifying potential patterns in queries in the future. Furthermore, insights dashboards can leverage these metrics to provide additional layers of analysis for the users.

I think the big question the community has right now is "how would these metrics benefit me once they are available?" It would be good to give some examples illustrating the practical application of these metrics: how a user can use them, and what "insights" the user can get from some example metrics.

macohen commented 11 months ago

I agree with @ansjcy. If you took an example query, maybe from a benchmark, and then worked through the example here, what would the dataflow look like? Break down the query, categorize the parts. What metrics would help tune a query? What metrics might indicate an unhealthy cluster? Could this tool be used to prevent any issues with queries?
