opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[QSB Meta Issue] Association and Accounting of Requests in QSB #11900

Open kaushalmahi12 opened 8 months ago

kaushalmahi12 commented 8 months ago

Is your feature request related to a problem? Please describe

This is a sub-part of the original QSB feature (see the main RFC and the proposal doc).

Sandbox and its types

Sandbox

This is the main entity that will help us divide the traffic into groups and enforce system resource limits per group. The classification into such groups depends on request attributes (the reserved type described below uses index type, user, and index pattern).

Although these attributes are a good starting point for segmenting the traffic, it is still very hard to divide the traffic into truly user-specific groups, because incoming requests cannot always be partitioned accurately into user-specific sandboxes. For example, an incoming request from userA for indexB could resolve into two different sandboxes. Such cases therefore need to be handled a little differently.

We will use a special type of sandbox, which is user-specific only, to address such cases.

Sandbox Types

  1. Reserved Type - These sandboxes will have multiple attributes and will be responsible for shard-level request association and accounting. Each one will have fixed low and high limits for each system resource, starting with CPU and JVM allocations. The sum of a system resource across all such sandboxes should not exceed 100. All of the attributes (index type, user, and index pattern) are mandatory. Breaching the low limit of a system resource will start causing rejections in the sandbox, while breaching the high limit will start cancelling the requests in the sandbox. This may not cancel the parent request because of the allow_partial_results flag.

  2. Constrained Type - These sandboxes will be created to enforce user-level resource consumption limits and to handle co-ordinator-level request accounting. Since a co-ordinator request can span multiple indices, it is highly likely that a co-ordinator-level request will have conflicting sandbox resolutions. The accounting for this type of sandbox is derived from the reserved type sandboxes, and user is its only selection attribute. The sum of a system resource across all such sandboxes can exceed 100. However, because this type is an abstraction over the reserved type sandboxes (a user-level sandbox on a non-co-ordinator node sums up the shard-level task resource usage for the user), at any point in time the sum of a resource across all the co-ordinator-level sandboxes will not exceed 100. The low and high limits for a resource will be exactly the same as those of the reserved type sandbox. The only distinction of this sandbox is that it tracks co-ordinator-level and user-level traffic.

  3. Default Type - This will be the default sandbox, acting as a catch-all for requests that do not resolve into any other sandbox. It will have the lowest priority. We will keep one for co-ordinator-level tasks and one for shard-level tasks.
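
The limit behaviour described for the sandbox types can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from OpenSearch: the names `ResourceLimit`, `decide`, `within_budget`, and `user_usage` are hypothetical, limits and usage are percentages, a "breach" is taken to mean strictly greater than the limit, and the cap of 100 is assumed to apply to the high limits.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    REJECT = "reject"  # low limit breached: reject new requests
    CANCEL = "cancel"  # high limit breached: cancel requests in the sandbox


@dataclass(frozen=True)
class ResourceLimit:
    low: float   # percent of one system resource (assumed unit)
    high: float  # percent of one system resource (assumed unit)


def decide(usage: float, limit: ResourceLimit) -> Action:
    """Map a sandbox's current usage of one resource to an enforcement action."""
    if usage > limit.high:
        return Action.CANCEL
    if usage > limit.low:
        return Action.REJECT
    return Action.ALLOW


def within_budget(limits: list[ResourceLimit]) -> bool:
    """Reserved type: limits for one resource should not sum past 100.

    Assumption: the cap is checked against the high limits.
    """
    return sum(limit.high for limit in limits) <= 100


def user_usage(shard_usage: dict[tuple[str, str], float], user: str) -> float:
    """Constrained type: derive a user's usage by summing shard-level usage.

    Keys are hypothetical (user, shard-level sandbox) pairs.
    """
    return sum(v for (u, _), v in shard_usage.items() if u == user)
```

For example, with `ResourceLimit(low=20, high=40)`, a usage of 30 would lead to rejections and a usage of 50 to cancellations under these assumptions.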

[Figure: Sandbox workflow]

Tracking and Cancellation flow diagram

[Figure: Tracking and cancellation]

Related component

Search:Resiliency

Describe alternatives you've considered

No response

Additional context

No response

jainankitk commented 8 months ago

@kaushalmahi12 - Thank you for taking a stab at documenting this. While this captures some of the aspects we talked about offline, it is missing a few things, and some things are still unclear:

Sum of a system resource for all such sandboxes should not exceed the value 100. It will have all the attributes such as, index type, user and index pattern as mandatory attributes.

Why are all the attributes mandatory for reserved type sandboxes?

This sandbox will have user as the only selection attribute. Sum of a system resource for all such sandboxes can exceed the value 100.

Why do we have these conditions for constrained sandboxes?

kaushalmahi12 commented 8 months ago

@jainankitk Thanks for going through this! If we don't make the attributes mandatory for reserved type sandboxes, we may have multiple sandboxes resolving for a single request. These three attributes uniquely identify a request. If we don't mandate the attributes, then we can have 2^n different configurations for n attributes.
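
A small sketch of the ambiguity, under stated assumptions: the attribute names, the sandbox names, and the `matching` helper are hypothetical, and index patterns are compared by plain equality rather than real pattern matching.

```python
from itertools import combinations

# Hypothetical attribute names for the three attributes mentioned in the issue.
ATTRIBUTES = ("index_type", "user", "index_pattern")


def attribute_subsets(attrs):
    """All attribute subsets a sandbox could select on if none were mandatory."""
    return [c for r in range(len(attrs) + 1) for c in combinations(attrs, r)]


def matching(sandboxes, request):
    """Names of sandboxes whose specified attributes all equal the request's."""
    return [
        name
        for name, selector in sandboxes.items()
        if all(request.get(key) == value for key, value in selector.items())
    ]


request = {"index_type": "logs", "user": "userA", "index_pattern": "indexB"}

# Optional attributes: two partially specified sandboxes both match.
optional = {
    "by_user": {"user": "userA"},
    "by_index": {"index_pattern": "indexB"},
}

# Mandatory attributes: every sandbox specifies all three, so at most one
# distinct selector can match a given request.
mandatory = {
    "s1": {"index_type": "logs", "user": "userA", "index_pattern": "indexB"},
    "s2": {"index_type": "logs", "user": "userB", "index_pattern": "indexB"},
}

print(len(attribute_subsets(ATTRIBUTES)))  # 8, i.e. 2^3
print(matching(optional, request))         # ['by_user', 'by_index']
print(matching(mandatory, request))        # ['s1']
```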

peternied commented 8 months ago

[Triage - attendees 1 2 3] Thanks for filing