opensearch-project / query-insights

Query Insights plugin offers frameworks and APIs for analyzing and optimizing query performance in OpenSearch.
Apache License 2.0
8 stars 7 forks source link

[BUG] type param not working as expected for the Top N queries API #80

Closed deshsidd closed 2 months ago

deshsidd commented 2 months ago

What is the bug?

Need to filter out memory and cpu when we query top queries using param : GET /_insights/top_queries?type=latency However, looks like this is not getting filtered out and might be a regressions.

How can one reproduce the bug?

Enabled topN:

{
  "persistent" : {
    "search.insights.top_queries.latency.window_size" : "1m",
    "search.insights.top_queries.latency.top_n_size" : 3,
    "search.insights.top_queries.latency.enabled" : true,
    "search.insights.top_queries.cpu.enabled" : true,
    "search.insights.top_queries.memory.enabled" : true
  }
}

Index docs and run a few search queries.

GET http://{{hostname}}/_insights/top_queries?type=latency

{
            "timestamp": 1724887236541,
            "search_type": "query_then_fetch",
            "indices": [
                "my_index"
            ],
            "task_resource_usages": [
                {
                    "action": "indices:data/read/search[phase/query]",
                    "taskId": 51,
                    "parentTaskId": 50,
                    "nodeId": "CFpm76yZQ5iecw8ciQFV3g",
                    "taskResourceUsage": {
                        "cpu_time_in_nanos": 4063000,
                        "memory_in_bytes": 33656
                    }
                },
                {
                    "action": "indices:data/read/search",
                    "taskId": 50,
                    "parentTaskId": -1,
                    "nodeId": "CFpm76yZQ5iecw8ciQFV3g",
                    "taskResourceUsage": {
                        "cpu_time_in_nanos": 366000,
                        "memory_in_bytes": 3224
                    }
                }
            ],
            "phase_latency_map": {
                "expand": 0,
                "query": 14,
                "fetch": 1
            },
            "node_id": "CFpm76yZQ5iecw8ciQFV3g",
            "total_shards": 1,
            "source": {
                "query": {
                    "match_all": {
                        "boost": 1.0
                    }
                }
            },
            "labels": {},
            "latency": 20,
            "cpu": 4429000,
            "memory": 36880
        }

What is the expected behavior?

returns latency, cpu and memory but should only return latency. Same for all the other combinations.

What is your host/environment?

mac localhost

Do you have any screenshots?

If applicable, add screenshots to help explain your problem.

Do you have any additional context?

Add any other context about the problem.

dzane17 commented 2 months ago

This behavior is only when search.query.metrics.enabled is true.

We append latency, cpu, memory data here based on shouldCollect() which checks the individual metric flags OR search.query.metrics.enabled:

    private boolean shouldCollect(MetricType metricType) {
        return queryInsightsService.isSearchQueryMetricsFeatureEnabled() || queryInsightsService.isCollectionEnabled(metricType);
    }

https://github.com/opensearch-project/query-insights/blob/main/src/main/java/org/opensearch/plugin/insights/core/listener/QueryInsightsListener.java#L175-L177

dzane17 commented 2 months ago

When search.query.metrics.enabled=false, top N only returns metrics which are enabled.

Example: 1st query has only latency, 2nd query has latency & cpu

curl -X GET "localhost:9200/_insights/top_queries?pretty"                                             
{
  "top_queries" : [
    {
      "timestamp" : 1724957602437,
      "phase_latency_map" : {
        "expand" : 0,
        "query" : 1,
        "fetch" : 0
      },
      "labels" : { },
      "source" : {
        "query" : {
          "match_all" : {
            "boost" : 1.0
          }
        }
      },
      "indices" : [ ],
      "search_type" : "query_then_fetch",
      "node_id" : "DCHfS-bKQHyLsBcqH5j6MQ",
      "task_resource_usages" : [
        {
          "action" : "indices:data/read/search[phase/query]",
          "taskId" : 265,
          "parentTaskId" : 264,
          "nodeId" : "DCHfS-bKQHyLsBcqH5j6MQ",
          "taskResourceUsage" : {
            "cpu_time_in_nanos" : 811000,
            "memory_in_bytes" : 18248
          }
        },
        {
          "action" : "indices:data/read/search",
          "taskId" : 264,
          "parentTaskId" : -1,
          "nodeId" : "DCHfS-bKQHyLsBcqH5j6MQ",
          "taskResourceUsage" : {
            "cpu_time_in_nanos" : 280000,
            "memory_in_bytes" : 3272
          }
        }
      ],
      "total_shards" : 1,
      "latency" : 3
    },
    {
      "timestamp" : 1724957554669,
      "phase_latency_map" : {
        "expand" : 0,
        "query" : 2,
        "fetch" : 0
      },
      "labels" : { },
      "source" : {
        "query" : {
          "match_all" : {
            "boost" : 1.0
          }
        }
      },
      "indices" : [ ],
      "search_type" : "query_then_fetch",
      "node_id" : "DCHfS-bKQHyLsBcqH5j6MQ",
      "task_resource_usages" : [
        {
          "action" : "indices:data/read/search[phase/query]",
          "taskId" : 243,
          "parentTaskId" : 242,
          "nodeId" : "DCHfS-bKQHyLsBcqH5j6MQ",
          "taskResourceUsage" : {
            "cpu_time_in_nanos" : 793000,
            "memory_in_bytes" : 18248
          }
        },
        {
          "action" : "indices:data/read/search",
          "taskId" : 242,
          "parentTaskId" : -1,
          "nodeId" : "DCHfS-bKQHyLsBcqH5j6MQ",
          "taskResourceUsage" : {
            "cpu_time_in_nanos" : 182000,
            "memory_in_bytes" : 3272
          }
        }
      ],
      "total_shards" : 1,
      "cpu" : 975000,
      "latency" : 3
    }
  ]
}
ansjcy commented 2 months ago

This is an expected behavior, we always measure latency, cpu and memory usage unless you turn off search.query.metrics.enabled as @dzane17 memtioned. when you do GET /_insights/top_queries?type=latency, it means "get the top queries sorted by latency", but all other metrics will be included, it's just we are not sorting based on those keys.