Open jainankitk opened 1 year ago
Thanks @jainankitk sounds exciting. Some items like Query prioritisation and scheduling #1017 are themselves pending prioritisation 😎
Thanks @Bukhtawar for sharing that issue. Linking the issue in overview.
Some items like Query prioritisation and scheduling are themselves pending prioritisation 😎
Created this issue for holistic view of ideas in search space (any existing or new issue) and prioritize accordingly
This is quite interesting. I think we may be missing Query Planning, which may change how we do, for example, sandboxing, prioritization and scheduling.
Thanks @jainankitk! What about cost as a theme? A common problem users run into is that it can be quite expensive to have a large dataset that is searchable even if it is not actively being searched at a high rate. I know this bleeds more into separating compute and storage but it might be worth a mention in this holistic view as there might be multiple ways to attack the problem.
Thanks for putting this together.
I think we may be missing Query Planning, which may change how we do, for example, sandboxing, prioritization and scheduling.
+1
We have been discussing sandboxing for restricting the resource foot print in context of single query, but sandboxing makes more sense at bucket level. We can define per bucket thresholds configurable as a cluster setting to preemptively kill a query from the bucket for which threshold is hit.
To achieve bucket level sandboxing I think we will still need to have query level sandboxing in place such that each query in the bucket plays well with the resources given to it. With reactive cancellation mechanism we will still run into similar problems as today where oversubscription of the resources happen in the concurrent environment and triggering cancellation may be too late. One can think of the CircuitBreakers present today as a catch all bucket
I created an RFC for Disk based caching - https://github.com/opensearch-project/OpenSearch/issues/9001
I think we may be missing Query Planning, which may change how we do, for example, sandboxing, prioritization and scheduling.
Good catch! The planner, in addition to providing much needed visibility, helps us make important decisions across resiliency and performance
What about cost as a theme?
As you rightfully mentioned, it makes more sense for compute/storage separation.
there might be multiple ways to attack the problem.
Do you have any specific problem in mind from search perspective? Maybe, improving the ratio of searchable data per compute/memory unit?
To achieve bucket level sandboxing I think we will still need to have query level sandboxing in place such that each query in the bucket plays well with the resources given to it.
IMO bucket level sandboxing is more generic version allowing us not only tackle query level sandboxing (every query is unique bucket), it also allows for managing resources across multiple tenants on single cluster.
With reactive cancellation mechanism we will still run into similar problems as today where oversubscription of the resources happen in the concurrent environment and triggering cancellation may be too late. One can think of the CircuitBreakers present today as a catch all bucket
The reactive mechanism today has tricky decision making component. For query sandboxing, it is simpler as what to cancel is quite clear. If we can harden the cancellation mechanism, reactive should work pretty well. Although, there are interesting scenarios where you want to serialize the query execution instead of parallelization to stay within the resource constraint bounds. For example - instead of picking 2 shards in parallel on node, do them one after the other to stay within resource bounds.
To achieve bucket level sandboxing I think we will still need to have query level sandboxing in place such that each query in the bucket plays well with the resources given to it.
IMO bucket level sandboxing is more generic version allowing us not only tackle query level sandboxing (every query is unique bucket), it also allows for managing resources across multiple tenants on single cluster.
I am visualizing buckets as a group where multiple queries can execute. If you are considering 1 bucket per query than we are talking about same thing to enforce the sandboxing essentially at query level. What I was calling out, we need to have mechanism to sandbox per query within a bucket where a bucket is a group (which I think how buckets will be used ?) instead of 1:1 mapping with a query . Like Bucket_1 (IT dept), Bucket_2 (Security dept). I want to split the buckets such that I allow resource split of 40/60. So each bucket need to ensure that when multiple queries are running in each bucket they remain under the resource constraint of each bucket.
The reactive mechanism today has tricky decision making component. For query sandboxing, it is simpler as what to cancel is quite clear..
My thought was we will basically use similar mechanism but instead of node level it will be done at bucket level. But may be will wait to see the details on this to understand better :)
If we can harden the cancellation mechanism, reactive should work pretty well
Yes, but should we always fail the request as part of sandboxing ? Can be a first step towards it, but we should also think around how we can let the query run with constrained resource and complete may be taking more time but not necessarily failing it. Reactive is one approach other can be based on cost based planning.
I am visualizing buckets as a group where multiple queries can execute. If you are considering 1 bucket per query than we are talking about same thing to enforce the sandboxing essentially at query level. What I was calling out, we need to have mechanism to sandbox per query within a bucket where a bucket is a group (which I think how buckets will be used ?) instead of 1:1 mapping with a query . Like Bucket_1 (IT dept), Bucket_2 (Security dept). I want to split the buckets such that I allow resource split of 40/60. So each bucket need to ensure that when multiple queries are running in each bucket they remain under the resource constraint of each bucket.
Your understanding of the bucket is correct, I was just saying that bucketing is more generic and can be used for limiting individual queries as well. The bucket level decision is slightly simpler due to two reasons 1/ The value is configured by the customer, so we don't need to determine whether node is under duress 2/ Individual bucket should have more uniformity (due to which they should be bucketed together in the first place)
Yes, but should we always fail the request as part of sandboxing ? Can be a first step towards it, but we should also think around how we can let the query run with constrained resource and complete may be taking more time but not necessarily failing it. Reactive is one approach other can be based on cost based planning.
That's a great point and I also kind of talked about this in previous comment - There are interesting scenarios where you want to serialize the query execution instead of parallelization to stay within the resource constraint bounds. For example - instead of picking 2 shards in parallel on node, do them one after the other to stay within resource bounds. That being said, we will iterate in following manner 1/ Reactive cancellation 2/ Proactive cancellation based on historic latency/cancellation metrics and query cost estimation using planner 3/ Serialize across shards to not exceed resource bounds 4/ Limit the resources within individual shard search execution
Thanks @jainankitk for the high level vision! Wanted to add a few points on the visibility front. We are working on a set of Query Insights features to improve the visibility: https://github.com/opensearch-project/OpenSearch/issues/11522
We are working on query categorization, which will provide users OTel metrics on summaries of search workload types. Also as shown in the meta issue, we will also work on the Query Insights plugin with Top N Queries (based on resource usages) and in the future implement the Query Insights Dashboard (Admin Dashboard) to provide even more visibility!
Can we rename this RFC? It's focused on visibility and resiliency.
We have a lot more feature-oriented RFCs that are on the roadmap for search, and the title suggests that this is the singular vision for search.
Agree with Froh. How about renaming it to Performance and Resiliency: Overview?
Introduction
This issue proposes high level vision for core search within Opensearch. While the initial common themes are identified from customer feedback for Amazon Opensearch Service, any comments about the overall direction are welcome for making it applicable to more diverse use cases
Background
Search component forms the core of applications based on Opensearch. The ability to search data efficiently is critical enough to influence the data storage format during indexing keeping search first. Compared to traditional search engines, Opensearch supports insightful aggregations in addition to filtering allowing customers to do much more and visualize their data better
Common Themes
Proposal
I am capturing below some of the projects that can be prioritized or are being worked upon in other RFCs. Feel free to link any other ideas / projects already in flight for Core Search
Resiliency - The cluster can go down even with single query today, which should not be allowed. Recovery from node failures in OpenSearch is an expensive process and can lead to cascading failure. Hence, it makes sense to cancel/kill aggressively and what not to ensure the node stays up. Search back pressure is good first step in direction of making it more robust. The key component is the lightweight resource tracking that can be leveraged to do:
Performance - We need to look at ways of inherently improving the query performance. This becomes even more critical with the decoupling of storage and compute, remote storage performance is nowhere close to the hot counterpart
Visibility - Customers have very limited to no visibility into query execution today. Root causing issues related to query latencies are very difficult and time consuming
New Features
Query Prioritization - Different search queries have different priorities and implications. Search query triggered from customer facing interface is much more critical than the one triggered by an operator sitting at the backend. Currently the queries are processed FIFO, which is not ideal when the cluster is under load and queues are backing up. Opensearch has notion of priority for cluster state update tasks. We can use something similar at coordinator and node level to treat query as high/medium/low priority and process accordingly. Customer can set cluster default and override the default for specific set of queries. https://github.com/opensearch-project/OpenSearch/issues/1017
Query scheduling - Not all queries are same. Some queries are expensive but don’t need result immediately. Customers might want to schedule some of their expensive queries during low traffic time and look at the results later. We need to build some logic for scheduling the queries and probably can leverage mechanism similar to async search for saving the results
Next Steps We will incorporate the feedback from this RFC into more detailed proposal / high level design for different themes. We will then create meta issues to go in more depth for the detailed design