opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Search requests resource tracking framework #1179

Open tushar-kharbanda72 opened 3 years ago

tushar-kharbanda72 commented 3 years ago

Is your feature request related to a problem? Please describe.

https://github.com/opensearch-project/OpenSearch/issues/1042 aims to build back-pressure support for Search requests. This framework will act as a basic building block for building an effective search back-pressure mechanism.

Describe the solution you'd like

Build a resource tracking framework for search requests (queries), which tracks resource consumption on OpenSearch nodes, for various Search operations at different levels of granularity -

i. Individual Search Request (Rest) - On the Coordinator Node across phases (such as search and query phase), for end-to-end resource tracking from the coordinator's perspective.

ii. Shard Search Requests (Transport) - On the Data Node per phase, for discrete search task tracking.

iii. Shard level Aggregated View - Total consumption of resources mapped to every shard for searches on the node.

iv. Node level Aggregated View - Total consumption of resources for all search requests on the node.

Characteristics:


tushar-kharbanda72 commented 2 years ago

Got time this week to think more on this. Sharing my high level thoughts on the solution I have in mind:

If we want E2E tracking of a search request's resource utilisation, we have 2 major things to handle:

Tracking allocations on Search ThreadPool

  1. Make the TaskId of the request available in Thread Context, so that the details stay available even when the request is handled by different threads. This can be made a part of transient headers, and can be done from TaskManager when a Task is created and registered.

  2. TaskResourceTracker: Create a TaskResourceTracker entity in which every task that needs resource tracking is registered. Optional metadata, such as action type or shard/index name/id, can be attached to each task. When a task is picked up by different threads, those threads are registered under the task they are executing work for. Once the task completes, it is unregistered from TaskResourceTracker. The tracker will hold a structure like Map<TaskInfo, List>, implemented as a ConcurrentHashMap to support high throughput.

  3. ResourceTrackingRunnable: Decorate the runnable that gets executed on the Search ThreadPool (this can be added to other ThreadPools as well if required). The runnable is responsible for registering the thread in TaskResourceTracker under a task id (which will be available in Thread Context). On registration it captures the thread's heap allocations and CPU time at that moment. Once execution completes or errors out, it captures the thread's heap allocations and CPU time again and notifies TaskResourceTracker that this thread has finished processing.

  4. ResourceWatcher: A scheduled task that takes snapshots at regular intervals (5 secs). It updates the heap allocation and CPU time for each thread registered in TaskResourceTracker (excluding threads that have already finished), then builds views from that data, e.g. memory allocations per node/shard/index/ThreadPool/actionType, whichever are required.
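A minimal sketch of steps 2 and 3, to make the proposal concrete. It is not the code from the eventual PR: the class shapes, method names, and the choice to pass the task id through the constructor (rather than reading it from Thread Context) are assumptions for illustration. It also assumes a HotSpot-style JVM, where the platform thread bean can be cast to `com.sun.management.ThreadMXBean` to read per-thread allocated bytes.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tracker: maps a task id to the usage accumulated by its threads.
final class TaskResourceTracker {
    static final class Usage {
        final AtomicLong cpuNanos = new AtomicLong();
        final AtomicLong allocatedBytes = new AtomicLong();
    }

    private final Map<Long, Usage> tasks = new ConcurrentHashMap<>();

    void register(long taskId) {
        tasks.putIfAbsent(taskId, new Usage());
    }

    // Returns the final usage, or null if the task was never registered.
    Usage unregister(long taskId) {
        return tasks.remove(taskId);
    }

    Usage usage(long taskId) {
        return tasks.get(taskId);
    }

    // Adds one thread's deltas to the task; ignored for unknown tasks.
    void record(long taskId, long cpuDelta, long bytesDelta) {
        Usage u = tasks.get(taskId);
        if (u != null) {
            u.cpuNanos.addAndGet(cpuDelta);
            u.allocatedBytes.addAndGet(bytesDelta);
        }
    }
}

// Hypothetical decorator: measures CPU time and heap allocation around a runnable.
final class ResourceTrackingRunnable implements Runnable {
    // Assumption: the JVM exposes the com.sun.management extension (HotSpot does).
    private static final com.sun.management.ThreadMXBean THREADS =
        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    private final long taskId; // in the proposal this comes from Thread Context
    private final TaskResourceTracker tracker;
    private final Runnable delegate;

    ResourceTrackingRunnable(long taskId, TaskResourceTracker tracker, Runnable delegate) {
        this.taskId = taskId;
        this.tracker = tracker;
        this.delegate = delegate;
    }

    @Override
    public void run() {
        long threadId = Thread.currentThread().getId();
        long cpuStart = THREADS.getCurrentThreadCpuTime();
        long bytesStart = THREADS.getThreadAllocatedBytes(threadId);
        try {
            delegate.run();
        } finally {
            // Runs on both normal completion and failure, as step 3 requires.
            long cpuDelta = THREADS.getCurrentThreadCpuTime() - cpuStart;
            long bytesDelta = THREADS.getThreadAllocatedBytes(threadId) - bytesStart;
            tracker.record(taskId, cpuDelta, bytesDelta);
        }
    }
}
```

A thread pool could wrap each submitted runnable in `ResourceTrackingRunnable` before execution; the deltas then accumulate per task regardless of which pool thread ran the work.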

Without the resource watcher we'd only get resource utilisation info once the task/thread completes, which wouldn't be helpful: at that point the task is about to leave the node and no longer needs resources. Also, if a rogue query has a phase that is consuming a lot of resources and taking longer, we should know that while the phase is in progress, so that a shortcutting decision can be made if required.
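The periodic snapshot idea can be sketched as below. This is an illustration only: the class layout, the `threadStarted`/`threadFinished` hooks, and the choice to track allocated bytes alone are assumptions, and it again relies on the HotSpot `com.sun.management.ThreadMXBean` extension. A real implementation would also need to handle threads that die between snapshots (the bean returns -1 for them).

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watcher: periodically refreshes usage of threads still working on a task.
final class ResourceWatcher implements AutoCloseable {
    private static final com.sun.management.ThreadMXBean THREADS =
        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // taskId -> (threadId -> bytes the thread had allocated when it started on the task)
    private final Map<Long, Map<Long, Long>> baselines = new ConcurrentHashMap<>();
    // taskId -> bytes allocated so far by its in-flight threads, as of the last snapshot
    private final Map<Long, Long> inFlightBytes = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    void threadStarted(long taskId, long threadId) {
        baselines.computeIfAbsent(taskId, k -> new ConcurrentHashMap<>())
                 .put(threadId, THREADS.getThreadAllocatedBytes(threadId));
    }

    void threadFinished(long taskId, long threadId) {
        Map<Long, Long> threads = baselines.get(taskId);
        if (threads != null) {
            threads.remove(threadId);
        }
    }

    // One snapshot pass: recompute the in-flight allocation of every tracked task.
    void snapshot() {
        baselines.forEach((taskId, threads) -> {
            long total = 0;
            for (Map.Entry<Long, Long> e : threads.entrySet()) {
                total += THREADS.getThreadAllocatedBytes(e.getKey()) - e.getValue();
            }
            inFlightBytes.put(taskId, total);
        });
    }

    // Schedules snapshot() at a fixed interval, e.g. start(5) for the proposed 5 secs.
    void start(long intervalSeconds) {
        scheduler.scheduleAtFixedRate(this::snapshot, intervalSeconds, intervalSeconds,
            TimeUnit.SECONDS);
    }

    long inFlightBytes(long taskId) {
        return inFlightBytes.getOrDefault(taskId, 0L);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Because the snapshot reads counters of threads that are still running, a long phase becomes visible after at most one interval instead of only when it finishes.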

Tracking response overhead on coordinator

In InboundHandler, constructing the Java object from bytes causes allocations on the heap, so we track the overhead of creating the response object. At that point we don't know the task id, so we'll map the overhead against the response object address. As soon as the thread context is restored and the task id is available, we'll move that response overhead to be tracked under that task. If there are any objects within the response with delayed initialisation, those will be constructed while executing on the Search ThreadPool, so they will get tracked automatically.

These metrics can be exposed via the stats API. Full details can be discussed once there’s alignment on this high level proposal.
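One way to picture the "park the overhead against the response, then move it to the task" step is an identity-keyed map. The sketch below is illustrative only: the class and method names are hypothetical, and an `IdentityHashMap` stands in for "response object address".

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical holder for deserialization overhead that is not yet tied to a task.
final class ResponseOverheadTracker {
    // Keyed by object identity, not equals(), so two equal responses stay separate.
    private final Map<Object, Long> pendingByResponse =
        Collections.synchronizedMap(new IdentityHashMap<>());
    private final Map<Long, Long> bytesByTask = new ConcurrentHashMap<>();

    // Called while the response is being built from bytes; task id is unknown here.
    void recordDeserialization(Object response, long bytes) {
        pendingByResponse.merge(response, bytes, Long::sum);
    }

    // Called once the thread context is restored and the task id is known.
    void attribute(Object response, long taskId) {
        Long bytes = pendingByResponse.remove(response);
        if (bytes != null) {
            bytesByTask.merge(taskId, bytes, Long::sum);
        }
    }

    long bytesForTask(long taskId) {
        return bytesByTask.getOrDefault(taskId, 0L);
    }

    int pendingCount() {
        return pendingByResponse.size();
    }
}
```

Note that the pending map holds a strong reference to each response until it is attributed, so entries that never get attributed would need some form of cleanup.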

These 2 should give us much needed visibility into resource utilisation due to Search requests on a node.

tushar-kharbanda72 commented 2 years ago

I need to run a performance benchmark to see if it hurts the performance under normal/high load. Will create a patch and should be able to get some results on this by next week.

asafm commented 2 years ago

This is great work towards supplying observability of search requests, specifically per shard (once done). I was wondering what the state of this issue is?

tushar-kharbanda72 commented 2 years ago

> This is great work towards supplying observability of search requests, specifically per shard (once done). I was wondering what the state of this issue is?

Thanks for showing interest in this feature @asafm. We're working on a Task resource tracking framework that reports resource consumption for any task running on a cluster, so it covers not only search requests but every kind of request we want to track. We're code complete, currently in the review phase, and trying to get this feature out with the OpenSearch 2.0 release.

First PR for initial frame: https://github.com/opensearch-project/OpenSearch/pull/2089

tushar-kharbanda72 commented 2 years ago

This PR completes the initial task resource tracking framework. Users can get insights into the resource consumption of tasks running on the cluster by using the List Tasks API. The List Tasks API refreshes the resource consumption info before returning the response so that it is accurate.

Users can further use X-Opaque-Id to get insights into how many resources their queries are using on different nodes.

curl --location --request GET 'http://127.0.0.1:9200/_tasks?actions=*'

{
    "nodes": {
        "JsXVdDkXRAOg3m3v6NNhrA": {
            "tasks": {
                "JsXVdDkXRAOg3m3v6NNhrA:74": {
                    "node": "JsXVdDkXRAOg3m3v6NNhrA",
                    "id": 74,
                    "type": "direct",
                    "action": "indices:data/read/search[phase/query]",
                    "description": "shardId[[test-index][0]]",
                    "start_time_in_millis": 1648563530779,
                    "running_time_in_nanos": 3777544037,
                    "cancellable": true,
                    "parent_task_id": "JsXVdDkXRAOg3m3v6NNhrA:73",
                    "headers": {},
                    "resource_stats": {
                        "total": {
                            "cpu_time_in_nanos": 6429000,
                            "memory_in_bytes": 307424
                        }
                    }
                }
            }
        }
    }
}
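For readers who want to try the X-Opaque-Id idea from a Java client, here is a small sketch using the JDK's `java.net.http` API. It only builds the requests and does not send them. The helper class, the opaque id value, and the `match_all` query body are illustrative assumptions; the endpoint and index name are taken from the example above. As far as I understand, tasks spawned by a request tagged this way should show the id under `headers` in the List Tasks output, which makes it possible to pick out their `resource_stats`.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) the two requests involved.
final class TaskRequests {
    // A search request tagged with X-Opaque-Id so its tasks can be identified later.
    static HttpRequest taggedSearch(String baseUrl, String index, String opaqueId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search"))
            .header("Content-Type", "application/json")
            .header("X-Opaque-Id", opaqueId)
            .POST(HttpRequest.BodyPublishers.ofString("{\"query\":{\"match_all\":{}}}"))
            .build();
    }

    // Same call as the curl example: list tasks for all actions.
    static HttpRequest listTasks(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_tasks?actions=*"))
            .GET()
            .build();
    }
}
```

Both requests can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running node, for example `TaskRequests.taggedSearch("http://127.0.0.1:9200", "test-index", "my-query-1")`.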

tushar-kharbanda72 commented 2 years ago

This issue now covers only the Task resource tracking framework.

dblock commented 2 years ago

The PR in https://github.com/opensearch-project/OpenSearch/pull/3046 was reverted.

alexahorgan commented 2 years ago

Demo feedback (8/3/22):

Outcome: Approved, ship it.

Action Items/Follow up: