kiranprakash154 opened 9 months ago
@msfroh @dblock @reta @sohami @Bukhtawar @nknize would love your thoughts !
Could we piggy-back on the idea of "views" to enforce sandboxes? A view could be associated with a sandbox.
Alternatively, if we wanted to be more flexible, a view could enforce a search pipeline, then the search pipeline could (conditionally) modify the request to associate it with a sandbox.
My next few messages contain notes from Search Backlog and Triage meeting: https://www.meetup.com/opensearch/events/298411954/. The @ references refer to the people who said the respective lines.
Example from the Google doc that helps clarify things more than the example above:
Two Sandboxes defined for memory and CPU
[
  {
    "name": "analysts",
    "attributes": {
      "role": "ba-*"
    },
    "resources": {
      "jvm": {
        "allocation": "0.5"
      },
      "cpu": {
        "allocation": "0.3"
      }
    },
    "enforcement": "soft"
  },
  {
    "name": "operations",
    "attributes": {
      "role": "tellers",
      "indices_name": "transactions-*"
    },
    "resources": {
      "jvm": {
        "allocation": "0.25"
      }
    },
    "enforcement": "hard"
  }
]
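The "attributes" blocks above drive how a request gets mapped to a sandbox. As a hedged sketch of how attribute-based resolution might work (function and field names are illustrative, not the actual implementation), a request's metadata could be pattern-matched against each sandbox's attribute patterns:

```python
import fnmatch

# Illustrative sandbox definitions mirroring the JSON example above.
SANDBOXES = [
    {"name": "analysts", "attributes": {"role": "ba-*"}},
    {"name": "operations", "attributes": {"role": "tellers", "indices_name": "transactions-*"}},
]

def resolve_sandbox(request_attrs):
    """Return the first sandbox whose attribute patterns all match the request."""
    for sandbox in SANDBOXES:
        if all(
            fnmatch.fnmatch(request_attrs.get(key, ""), pattern)
            for key, pattern in sandbox["attributes"].items()
        ):
            return sandbox["name"]
    return None  # no match: fall through to a default / untracked group

print(resolve_sandbox({"role": "ba-reporting"}))  # analysts
print(resolve_sandbox({"role": "tellers", "indices_name": "transactions-2024"}))  # operations
```

Note that with first-match semantics the ordering of sandbox definitions matters when attribute patterns overlap.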
So I think the usage of the sandbox concept is confusing here: we are not sandboxing anything (we cannot limit resources), but rather track the resource usage and react to it. Better names would be "resource limits group", "query prioritization group", or something along these lines.
Linking #1017 as it seems somewhat related
Thanks for the RFC @kiranprakash154 and notes @msfroh . Regarding the term "Sandbox" used in the proposal, I'd like to offer another perspective. The term might not fully capture the essence of our approach, as we are not actually pausing or decelerating query execution to manage system load, which is often associated with a sandbox environment. Instead, we're implementing a more granular strategy on the lines of query-level circuit breakers for various resources, such as CPU and memory. This method ensures more precise control over resource allocation.
Additionally, incorporating a priority system for query execution, along with the capability to cancel queries when resource usage exceeds certain thresholds, seems to align with the principles outlined in this proposal.
Thanks for writing this all up and taking the time to present the document today. The level of detail and consideration for low level scenarios shows through. In the context of advancing this proposal - I would recommend creating a proof of concept and publishing it as a draft pull request. This approach will not only help us visualize the proposed mechanisms but also enable a hands-on exploration of scenarios that significantly impact users.
As we move forward, I would suggest probing into the following areas:
@peternied - Thank you for reviewing the proposal and providing detailed feedback. Please find responses to some of the questions below:
Clusters evolve over time by scaling horizontally (additional nodes) or vertically (larger instances); how does changing these constraints impact sandboxing? What needs to be built so that sandboxing works without manual intervention after a scaling event?
Scaling the cluster horizontally or vertically will not impact sandboxing, as the constraints are agnostic to such events.
There is overhead with additional monitoring systems and keeping 'head room' in thresholds that directly impacts billing. How can we make sure that the cluster is well tuned to be responsive without underutilization of resources?
Underutilization is an interesting aspect and something we have talked about at length while coming up with the proposal. Every sandbox has the option of a soft enforcement mode that allows it to exceed its allocated quota as long as the node is not under duress. If the node is under duress, the sandbox will be hard-enforced to its pre-allocated quota, since we want to prevent the node from going down at any cost.
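The soft/hard distinction described above can be sketched as a small decision function. This is a hypothetical illustration assuming per-node usage tracking; the names and thresholds are not from the actual implementation:

```python
def should_reject(sandbox_usage, sandbox_quota, enforcement, node_under_duress):
    """Decide whether new work for this sandbox should be rejected on this node."""
    over_quota = sandbox_usage >= sandbox_quota
    if not over_quota:
        return False
    if enforcement == "hard":
        return True  # hard sandboxes are never allowed past their quota
    # Soft sandboxes may exceed their quota only while the node has spare capacity.
    return node_under_duress

# A soft sandbox over quota is tolerated until the node itself is under duress.
assert should_reject(0.6, 0.5, "soft", node_under_duress=False) is False
assert should_reject(0.6, 0.5, "soft", node_under_duress=True) is True
assert should_reject(0.6, 0.5, "hard", node_under_duress=False) is True
```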
When an enforcement action is performed (queries are rejected or canceled), the feedback loop for the query author is crucial. How are users informed of these actions, and what guidance are they provided to help them adjust their queries?
While we will not provide any immediate guidance around adjusting the queries, the error message should clearly inform users of the reason for rejection. In later phases, we can leverage query insights to provide actionable feedback on making queries more efficient.
Thanks for the presentation today.
I would like to understand how a sys admin would use sandboxes for the following scenarios:
Classic search on the web, e.g., e-commerce (see the Atlassian comment):
Fairness in updating dashboards:
Mix of batch jobs:
Let me admit that – except for the first case – I have no practical experience with these scenarios. Are they realistic? Can we handle them?
-s
I echo @reta's thoughts and had similar comments on labelling this as "sandbox", since this is more of a tenant-specific resource limit. While this is a good starting point, we need to see how we tie the bigger picture with other multi-tenant capabilities like
Thanks @macrakis for providing these scenarios
- I have a steady stream of small live queries (Web users) where I want to bound latency (mean < 100ms, P99 < 1000ms, Pmax < 2000ms).
The 1st use case is more about end-to-end latency, which is affected by multiple factors and involves load characteristics across multiple nodes. Since this feature introduces node-level resiliency, that case doesn't come under the purview of this RFC.
- I mix in some longer-running, lower-priority queries which may use a lot of resources, but which can be re-run if they fail.
This one hints more towards async completion of cancellable queries at the node level, which is something we can support in a 2nd iteration. Piggybacking too many things on this RFC will only clutter and complicate its delivery.
- I am also performing updates and I don’t want my search catalog to lag more than 10 sec behind the data feed.
If I am right, this is already controlled by the index.refresh_interval setting, so this capability is already present.
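For reference, the refresh interval can be set per index via the index settings API (the index name below is illustrative):

```
PUT /transactions-2024/_settings
{
  "index": {
    "refresh_interval": "10s"
  }
}
```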
Thanks @Bukhtawar for taking the time to provide your insightful thoughts on this. The naming convention is something we can all come to an agreement on and use throughout.
Regarding tying this feature with other multi-tenant capabilities
Index level encryption
I am not sure how this one correlates with node-level resiliency; can you be more specific?
Tenant specific shard placement for index patterns where users can choose to allocate indices on certain node groups for better query isolation
Given that this is a node-level resiliency feature and the sharding mechanism already distributes data fairly well, tenant-specific placement could potentially introduce skew in the data distribution. Since we will track and limit resource usage at the node level for search workloads, this will work fine as long as those query groups do not oversubscribe the resources. We can only isolate search queries when indexing traffic is not shared with search traffic.
Prioritised queue to run some queries in the background based on constraints.
I suppose you are hinting towards async completion of cancellable queries, or do you mean something else?
Since we are not introducing the priority concept for these query groups in the first iteration, we can take this up later.
Regarding a naming convention to use instead of Sandbox for this feature, what should we go with? (I can make a few initial suggestions.)
Hi @kaushalmahi12, @jainankitk, @kiranprakash154 I just wanted to check in over here.
I know we've chatted a bit about this and I want to share that I am neither for/nor against making this or any of the related changes from a code perspective at the moment. I went ahead and left some basic comments on @kaushalmahi12's draft and don't have too many suggestions at this point.
It seems like there is some confusion around the use case for this feature; while I appreciate how detailed this RFC is (wow!), I don't know if you can point to some other issues or comments which ask for something like this?
Looking this over again, several of us have pointed out that "sandbox" is probably the wrong name. Besides implying incorrectly that it has something to do with testing or with security, it doesn't explicitly mention that it is about resources or groups of users. I like @reta's suggestion of "resource limits group".
Based on the latest discussion on this, we have come to the conclusion that this problem consists of the following independent subproblems
Based on the discussion we had with folks, we decided to move the APIs to a plugin for the following reasons
We will make it a core plugin for the following reasons
QueryGroup is also residing in core because of the tracking and cancellation.

@kaushalmahi12 Should it be a module? If the reasons to separate it from the core code are for modularity/architectural reasons, but the majority of the feature actually sits in the core, then it sounds like a module might be a better fit. There are two major differences between a module and a plugin:
This plugin appears to have no new dependencies, so I'm not sure the classloader difference is important. The other point is an important difference though. Are there cases where you would not want this feature to be installed?
@andrross These are important facts to consider when deciding whether to move the feature into a plugin or a module. It might not make sense for all users, since this feature will need enablement, so the code might sit dead in the artifact. If that is fine with the community, then it should be fine to move it to a module. What are your thoughts on this?
code might sit dead in the artifact
Core plugins are in the min distribution artifact. They aren't loaded into the JVM unless installed, but the overhead of loading these classes is likely negligible.
feature will need enablement
If the act of installing the plugin is the enablement mechanism, then a plugin is probably the right place, because otherwise you'd have to build some other enabling mechanism.
@msfroh What do you think regarding plugin vs module here?
Co-Author @kkhatua
8879 - [RFC] High Level Vision for Core Search in OpenSearch
11061 - [RFC] Query Sandboxing for search requests
A common challenge with managing resources on OpenSearch clusters has been keeping runaway queries in check. With Search Backpressure, the ability to avoid running out of resources is now available, but there is no capability to protect tenants who might be unfairly penalized. The goal of this RFC is to propose a mechanism by which admin users can organize tenants into different groups (aka Sandboxes) and limit the cumulative resource utilization of these groups. We will mention an idea of how the sandbox is enforced, but that will likely be a separate RFC due to its complexity.
What is a Sandbox?
A sandbox is a logical construct designed for running search requests within virtual limits. The sandboxing service will track and aggregate a node's resource usage for the different sandboxes and, based on the mode of containment, limit or terminate tasks of a sandbox that exceeds its limits. A sandbox's definition is stored in cluster state, hence the limits to be enforced are available to all nodes across the cluster.
Tenets
Schemas
Sandbox Definition
The following is an abstract example of a sandbox’s definition; it is broadly broken into 4 essential elements within the document
Resource Definition
For each resource, a cluster level schema is also required, and the following is an abstract example
High Level Flow
A request landing on a coordinator node will first need to be mapped to a sandbox, as per the tenets. Once a sandbox has been mapped, all child tasks spawned from the request will inherit the sandbox allocation, irrespective of which node they run on.
Sandbox Resolution
The sandbox resolution happens on the coordinator node and will persist with all the tasks (and child tasks) of that request for the entirety of its lifecycle.
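A minimal sketch of that inheritance, with illustrative class and field names (not OpenSearch's actual Task API): the sandbox is resolved once on the root task, and every descendant simply copies the label from its parent.

```python
class Task:
    """Toy task model: a child carries its parent's sandbox label."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # Inherit the sandbox resolved on the coordinator, if any.
        self.sandbox = parent.sandbox if parent else None

# The coordinator resolves the sandbox once, on the root task...
root = Task("search-request")
root.sandbox = "analysts"

# ...and every child (e.g. shard-level) task carries the same label,
# regardless of which node it eventually runs on.
shard_task = Task("shard-query", parent=root)
fetch_task = Task("fetch-phase", parent=shard_task)
assert fetch_task.sandbox == "analysts"
```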
Thresholds Enforcement
As sandboxes are enforced at the node level, for each resource there are 2 thresholds defined cluster-wide:
Sandbox-level thresholds are always proportional to the node-level thresholds
The following is an example where
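As a hedged illustration of the proportionality rule with assumed numbers (a 70% node-level JVM threshold and the allocations from the earlier sandbox example; none of these values are prescribed by the RFC):

```python
# Assumed node-level limit for the JVM resource (70% of heap).
NODE_JVM_THRESHOLD = 0.70

# Per-sandbox allocations, as fractions of the node-level threshold.
sandboxes = {"analysts": 0.50, "operations": 0.25}

# Each sandbox's effective per-node threshold is proportional to the
# node-level threshold: allocation * node threshold.
effective = {name: alloc * NODE_JVM_THRESHOLD for name, alloc in sandboxes.items()}
print(effective)  # {'analysts': 0.35, 'operations': 0.175}
```

Under this scheme, raising or lowering the node-level threshold rescales every sandbox's effective limit without editing the sandbox definitions themselves.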
Additional Context
This RFC aims to discuss ideas at a high level. More details are provided in this google doc. Anyone with the link has comment access and we would love to gather feedback.