opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Search Memory Tracking - track memory used during a shard search #1009

Open malpani opened 3 years ago

malpani commented 3 years ago

Is your feature request related to a problem? Please describe. There is limited visibility into how much memory is consumed by a query. In an ideal world, resource consumption details would be abstracted away from users and everything would auto-tune/auto-reject. But we are not there (yet!), and with every query treated equally, certain memory-heavy queries can end up tripping the memory breakers for all requests. It will be helpful to track and surface the memory consumed by a query. This visibility can help users tune their queries better.

Describe the solution you'd like The plan is to make this generic and expose these stats via the tasks framework. The tasks framework already tracks latency and has some context about the query/work being done. The idea is to enhance it to track additional stats for memory and CPU consumed per task. As tasks have a nice parent → child hierarchy, this mechanism will allow tracking the cluster-wide resource consumption of a query. So the plan is to update Task to track additional context + stats. When a task completes, its task info will be pushed to a sink; the sink can be logs or a system index to enable additional insights.
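The parent → child rollup described above can be sketched in plain Java. This is an illustrative model only — the class and field names below are hypothetical and not the actual OpenSearch task API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each completed task records its own memory usage plus
// its parent task id, so per-shard child-task usage can be rolled up into the
// root (coordinator) task for a cluster-wide view of a query's footprint.
public class TaskResourceStats {

    // Illustrative stand-in for per-task resource info (not the real TaskInfo).
    record TaskUsage(String taskId, String parentId, long memoryBytes, long cpuNanos) {}

    // Sums memory of every task under each root task by walking parent links.
    static Map<String, Long> rollUpMemoryByRoot(List<TaskUsage> completed) {
        Map<String, String> parent = new HashMap<>();
        for (TaskUsage t : completed) {
            parent.put(t.taskId(), t.parentId()); // null parentId marks a root task
        }
        Map<String, Long> byRoot = new HashMap<>();
        for (TaskUsage t : completed) {
            String root = t.taskId();
            while (parent.get(root) != null) {
                root = parent.get(root); // climb to the root of the hierarchy
            }
            byRoot.merge(root, t.memoryBytes(), Long::sum);
        }
        return byRoot;
    }

    public static void main(String[] args) {
        List<TaskUsage> done = List.of(
                new TaskUsage("search-1", null, 1_000, 5),
                new TaskUsage("shard-1a", "search-1", 4_000, 9),
                new TaskUsage("shard-1b", "search-1", 2_500, 7));
        System.out.println(rollUpMemoryByRoot(done)); // search-1 accumulates 7500
    }
}
```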

For search-side tracking of stats, the proposed solution is to leverage the single-threaded nature of searching within a shard. I plan to use [ThreadMXBean.getCurrentThreadAllocatedBytes](<https://docs.oracle.com/en/java/javase/14/docs/api/jdk.management/com/sun/management/ThreadMXBean.html#getCurrentThreadAllocatedBytes()>) for tracking the memory consumption and exposing this in 2 forms
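Because a shard search runs on a single thread, the measurement can be a simple before/after diff of that thread's allocated bytes. A minimal sketch (the wrapper class is illustrative, not the proposed OpenSearch change; it assumes a HotSpot JVM where `com.sun.management.ThreadMXBean` is available, JDK 14+ for `getCurrentThreadAllocatedBytes`):

```java
import java.lang.management.ManagementFactory;

// Sketch: bound a shard search's memory footprint by diffing the bytes
// allocated by the current (single search) thread before and after the work.
public class ShardSearchMemoryTracker {

    // HotSpot's ThreadMXBean implements the com.sun.management extension.
    private static final com.sun.management.ThreadMXBean THREAD_MX_BEAN =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Runs the search body on the current thread; returns bytes it allocated. */
    public static long measureAllocatedBytes(Runnable shardSearchBody) {
        long before = THREAD_MX_BEAN.getCurrentThreadAllocatedBytes();
        shardSearchBody.run();
        return THREAD_MX_BEAN.getCurrentThreadAllocatedBytes() - before;
    }

    public static void main(String[] args) {
        long[] witness = new long[1];
        long used = measureAllocatedBytes(() -> {
            byte[] scratch = new byte[1 << 20]; // stand-in for per-shard search work
            witness[0] = scratch.length;        // keep the allocation observable
        });
        System.out.println("allocated " + used + " bytes for a "
                + witness[0] + "-byte working set");
    }
}
```

Note the diff also counts the tracker's own small allocations, so it is an upper bound rather than an exact figure.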

Based on some initial Rally benchmarks on a POC, the overhead does not look high. Having said that, my plan is to gate this under a cluster setting `search.track_resources` that defaults to false (disabled)
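The gating could look roughly like the following setting registration. This is a non-runnable sketch against OpenSearch's `Setting` API; the exact property set chosen here (dynamic, node-scoped) is an assumption, not part of the proposal:

```java
// Sketch (assumes org.opensearch.common.settings.Setting):
// an opt-in cluster setting, disabled by default as proposed.
public static final Setting<Boolean> SEARCH_TRACK_RESOURCES =
        Setting.boolSetting(
                "search.track_resources",   // setting key from the proposal
                false,                      // defaults to disabled
                Setting.Property.NodeScope,
                Setting.Property.Dynamic);  // assumed: toggleable at runtime
```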

Describe alternatives you've considered

Planning

AmiStrn commented 3 years ago

How about having a way to stop/deprioritise memory-heavy queries, kind of like the way a query timeout works?

This is different from the observability issue, but it makes sense to prevent these really intensive queries to begin with. (In addition to tracking, not instead of it...)

Bukhtawar commented 3 years ago

Nice proposal. Maybe we need an extension for aggregation reduce phases on the coordinator as well (major contributors to memory), while also being cautious about deserialisation overhead.

@AmiStrn maybe we need special handling for query prioritization; for instance, async searches should have a different priority than a usual search #1017. Also, we might need to track/estimate memory prior to the allocation in order for a query to be terminated early. I guess both of the above can be tracked separately. Thoughts?

malpani commented 3 years ago

@AmiStrn Today a query execution can be stopped in scenarios like hitting the bucket limit or the parent breakers. There is value in adding some notion of a memory sandbox and preempting the query on hitting a 'per query memory limit' as the next phase, and eventually improving the memory estimation (prior to executing).

@Bukhtawar good point. This approach will not capture the reduce-phase overhead; I will explore that as a follow-up.

malpani commented 2 years ago

Finally got some time to explore this more; here are some thoughts:

  1. The utility of exposing a top N via a new search_stats section in the /_nodes/stats API (returning the N most expensive queries) is limited: it may not help answer questions like "What queries between October 4 and 5 were most expensive in terms of their memory footprint?", as the N most expensive queries might have run 60 days ago.
  2. Implementing this via the tasks framework provides a hook to track on parent task ids rather than restricting to isolated shard-level memory tracking (thanks @sohami for the idea). It also allows other actions (not just search, if they choose to) to track memory usage. The existing tasks API already tracks latency, and adding memory consumption could be useful.
  3. On completion of a task, the task info, which will include memory used (for search tasks), can be dumped into a sink. The sink could be configurable: a simple log file or a system index for further analysis.
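The configurable-sink idea in point 3 could be sketched as a small interface. All names here are hypothetical, not the eventual OpenSearch API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: on task completion, the task framework hands the
// finished task's resource info to whichever sink is configured.
public class TaskStatsSinkDemo {

    interface TaskStatsSink {
        void onTaskCompleted(String taskId, long memoryBytes);
    }

    // Log-style sink: emits one line per completed task.
    static class LogSink implements TaskStatsSink {
        public void onTaskCompleted(String taskId, long memoryBytes) {
            System.out.println("task=" + taskId + " memory_bytes=" + memoryBytes);
        }
    }

    // Stand-in for a system-index sink: buffers JSON records for later analysis.
    static class IndexSink implements TaskStatsSink {
        final List<String> indexed = new ArrayList<>();
        public void onTaskCompleted(String taskId, long memoryBytes) {
            indexed.add("{\"task\":\"" + taskId + "\",\"memory_bytes\":" + memoryBytes + "}");
        }
    }

    public static void main(String[] args) {
        IndexSink sink = new IndexSink();
        sink.onTaskCompleted("search-task-1", 7_500);
        System.out.println(sink.indexed.get(0));
    }
}
```

Swapping the sink is then a configuration decision rather than a code change, which matches the "log file or system index" framing above.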