openobserve / openobserve

https://openobserve.ai
GNU Affero General Public License v3.0

Search Queue resource management #2506

Closed: hengfeiyang closed this issue 4 months ago

hengfeiyang commented 7 months ago

Which OpenObserve functionalities are relevant/related to the feature request?

log search

Description

We currently use a global queue to control search resources: only one request can be processed at a time, and all other requests wait in the queue, because all resources are allocated to the executing request so that it finishes as fast as possible. With this design, a request that only needs to search the last 5 minutes of data still has to wait in the queue until the currently executing search request is done. We need a better solution for managing the search queue.

Proposed solution

Search Queue resource management

What resources are consumed by a search request?

  1. CPU
  2. Memory
  3. IO

What problems can occur with concurrent execution of search requests?

  1. Since CPU resources are limited, spawning additional threads to execute new searches increases the burden on system scheduling, meaning that all search processing slows down.
  2. Memory resources are also finite, but unlike CPU they cannot be time-sliced: if memory is fully occupied, executing new searches can overload it and trigger OOM (Out of Memory) errors.
  3. IO resources are limited as well, but as with CPU over-subscription, exhausting them only leads to slower performance without crashing the system.

So what should we manage in search resource management?

From the analysis above, we know that not restricting CPU or IO won't cause serious issues; it would just make multiple searches run slower. Moreover, our historical data analysis shows that in most cases the usage of CPUs is not high because they're idle due to IO throughput limitations. The only unrestricted resource that could potentially crash the system is memory.

How should we manage memory allocation?

The simplest principle is allocation on demand: allocate less memory for short queries and more for long queries.

To ensure concurrency control as well, we need to limit the total amount of memory used per query so every query has access to some resources.

Designing Search Resource Management

  1. Allocate resources based on whether it's a short or long query.
  2. Define two resource groups, each with 3 environment variables to limit the resources on each querier node:
    • O2_SEARCH_GROUP_x_MAX_MEMORY, should be a percentage of the total Datafusion memory.
    • O2_SEARCH_GROUP_x_MAX_CPU, should be a percentage of the total CPU cores.
    • O2_SEARCH_GROUP_x_MAX_CONCURRENCY, should be a fixed number, minimal 1.
      1. Long query group
        • O2_SEARCH_GROUP_LONG_MAX_MEMORY = 80%
        • O2_SEARCH_GROUP_LONG_MAX_CPU = 80%
        • O2_SEARCH_GROUP_LONG_MAX_CONCURRENCY = 2
      2. Short query group
        • O2_SEARCH_GROUP_SHORT_MAX_MEMORY = 20%
        • O2_SEARCH_GROUP_SHORT_MAX_CPU = 20%
        • O2_SEARCH_GROUP_SHORT_MAX_CONCURRENCY = 4
  3. The amount of memory available per query request equals O2_SEARCH_GROUP_x_MAX_MEMORY / O2_SEARCH_GROUP_x_MAX_CONCURRENCY. For example, if total system memory is 10GB, Datafusion has access to 50% of it, i.e. 5GB; the long-query group then has access to 80% of that, i.e. 4GB, and with 2 concurrent requests each search request can use up to 2GB of RAM.
  4. The search request will always use all of the CPU cores in its group.
  5. Search requests exceeding concurrency limits will be queued and executed in FIFO order.
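The per-request allocation in step 3 can be sketched as follows. This is a minimal Rust sketch; the struct and function names are illustrative, not OpenObserve's actual implementation.

```rust
// Sketch of the per-request resource calculation from step 3.
// All names are illustrative; percentages mirror the O2_SEARCH_GROUP_* settings.

struct SearchGroup {
    max_memory_pct: f64,  // O2_SEARCH_GROUP_x_MAX_MEMORY
    max_cpu_pct: f64,     // O2_SEARCH_GROUP_x_MAX_CPU
    max_concurrency: u64, // O2_SEARCH_GROUP_x_MAX_CONCURRENCY, minimum 1
}

impl SearchGroup {
    /// Memory (in bytes) available to a single request in this group.
    fn memory_per_request(&self, datafusion_memory: u64) -> u64 {
        ((datafusion_memory as f64 * self.max_memory_pct) as u64) / self.max_concurrency
    }

    /// CPU cores assigned to the whole group (each request uses all of them).
    fn group_cores(&self, total_cores: u64) -> u64 {
        ((total_cores as f64 * self.max_cpu_pct) as u64).max(1)
    }
}

fn main() {
    let long = SearchGroup { max_memory_pct: 0.80, max_cpu_pct: 0.80, max_concurrency: 2 };
    // 10GB system memory, Datafusion gets 50% => 5GB.
    let datafusion_memory: u64 = 5 * 1024 * 1024 * 1024;
    // 5GB * 80% = 4GB for the long group, / 2 concurrency = 2GB per request.
    let per_request = long.memory_per_request(datafusion_memory);
    println!("{} GB per request", per_request / (1024 * 1024 * 1024)); // 2 GB per request
}
```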

User Quota-Based Resource Management

On top of global resource management settings, we can add user quota-based design elements.

For example, suppose we limit a particular user's quota to 2 CPUs and 1GB RAM with a max_concurrency of 2. When calculating resources for a search request, we first determine the global per-request allowance; using the earlier example, a long search could use up to 2GB RAM under the global settings. The user's quota is then divided by the user's concurrency: 1GB / 2 = 512MB of RAM and 2 / 2 = 1 CPU core per request. Taking the smaller of the two allowances, this particular search request would use 512MB of RAM and 1 CPU core.

Of course, user requests are still subject to global concurrency limits; if the number of current processing requests exceeds the global limit, the user’s request will also be queued.
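Combining the global per-request allowance with a user quota could look like this. A Rust sketch, assuming both the user's RAM and CPU quotas are divided by the user's max_concurrency as in the example above; the names are illustrative.

```rust
// Sketch: combine the global per-request allowance with a user quota.
// Names are illustrative, not OpenObserve's actual types.

struct UserQuota {
    max_memory: u64,      // total RAM quota for the user, in bytes
    max_cpu: u64,         // CPU cores the user may use
    max_concurrency: u64, // concurrent searches allowed for the user
}

/// Effective (memory, cpu) limits for one search request: the smaller of the
/// global per-request allowance and the user's per-request share of their quota.
fn effective_limits(global_mem: u64, global_cpu: u64, q: &UserQuota) -> (u64, u64) {
    let user_mem = q.max_memory / q.max_concurrency;
    let user_cpu = (q.max_cpu / q.max_concurrency).max(1);
    (global_mem.min(user_mem), global_cpu.min(user_cpu))
}

fn main() {
    // Global long-query allowance from the earlier example: 2GB RAM.
    let global_mem = 2u64 * 1024 * 1024 * 1024;
    let global_cpu = 8u64; // cores in the long group; illustrative number
    // User quota: 1GB RAM, 2 CPUs, max_concurrency 2 => 512MB and 1 core per request.
    let quota = UserQuota { max_memory: 1024 * 1024 * 1024, max_cpu: 2, max_concurrency: 2 };
    let (mem, cpu) = effective_limits(global_mem, global_cpu, &quota);
    println!("{} MB, {} core(s)", mem / (1024 * 1024), cpu); // 512 MB, 1 core(s)
}
```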

How to calculate whether the search request is a long query or a short query?

We assume the search speed is 1GB per core per second, O2_SEARCH_GROUP_BASE_SPEED=1024; this is configurable. We assume a query predicted to take more than 10s is a long query, O2_SEARCH_GROUP_BASE_SECS=10; this is also configurable. We also know the total CPU cores of the queriers in the cluster.

We also know the scan_size of a search request, so we can calculate the predicted seconds:

let cpu_cores = min(USER_CPU_QUOTA, (TOTAL_CPU_CORES * O2_SEARCH_GROUP_SHORT_MAX_CPU));
let predict_secs = scan_size / O2_SEARCH_GROUP_BASE_SPEED / cpu_cores;
if predict_secs > O2_SEARCH_GROUP_BASE_SECS {
    // this is a long query
} else {
    // this is a short query
}

How to decide Long or Short query

Short query

We fire a query and know that the scan_size is 100GB: 100GB / 1GB (base_speed, set using O2_SEARCH_GROUP_BASE_SPEED) / 32 CPU cores ≈ 3s. The base_secs (set using O2_SEARCH_GROUP_BASE_SECS) is 10s, so we decide this is a short query.

Long query

We fire a query and know that the scan_size is 1TB: 1024GB / 1GB (base_speed) / 32 CPU cores = 32s. The base_secs is 10s, so we decide this is a long query.
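The two worked examples above can be reproduced with a small runnable sketch of the classification rule (function and constant names are illustrative):

```rust
// Runnable sketch of the long/short classification from the snippet above.

const BASE_SPEED_GB: f64 = 1.0; // O2_SEARCH_GROUP_BASE_SPEED, 1GB/core/sec
const BASE_SECS: f64 = 10.0;    // O2_SEARCH_GROUP_BASE_SECS

/// Predicted execution time (seconds) for scanning `scan_size_gb` on `cpu_cores` cores.
fn predict_secs(scan_size_gb: f64, cpu_cores: f64) -> f64 {
    scan_size_gb / BASE_SPEED_GB / cpu_cores
}

fn is_long_query(scan_size_gb: f64, cpu_cores: f64) -> bool {
    predict_secs(scan_size_gb, cpu_cores) > BASE_SECS
}

fn main() {
    // 100GB on 32 cores: 100 / 1 / 32 = 3.125s <= 10s => short query.
    assert!(!is_long_query(100.0, 32.0));
    // 1TB on 32 cores: 1024 / 1 / 32 = 32s > 10s => long query.
    assert!(is_long_query(1024.0, 32.0));
}
```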

How to decide the MAX_CONCURRENCY

This is based on your resources and the response time you expect.

For example, we have 160 CPU cores and assume a search speed of 1GB/core/sec. To search 1TB of data, a single request needs 1024GB / 1GB / 160 CPU = 6.4s.

If you want to support two requests processing in parallel, each request can use 50% of the resources, i.e. only 80 CPU cores; with 2 concurrent requests each searching 1TB of data, each request will respond in 1024GB / 1GB / 80 CPU = 12.8s.

If you set O2_SEARCH_GROUP_LONG_MAX_CONCURRENCY = 4, each request gets 160 / 4 = 40 CPU cores, so each 1TB search will respond in 1024GB / 1GB / 40 CPU = 25.6s.

If you set O2_SEARCH_GROUP_LONG_MAX_CONCURRENCY = 10, each request gets 160 / 10 = 16 CPU cores, so each 1TB search will respond in 1024GB / 1GB / 16 CPU = 64s.
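The concurrency/response-time trade-off above can be tabulated with a small sketch, assuming the 1GB/core/sec base speed; the names are illustrative:

```rust
// Sketch of the response-time trade-off used to pick MAX_CONCURRENCY.

const BASE_SPEED_GB: f64 = 1.0; // assumed 1GB/core/sec search speed

/// Expected response time (seconds) when `concurrency` requests of
/// `scan_size_gb` each share `total_cores` cores equally.
fn response_secs(scan_size_gb: f64, total_cores: f64, concurrency: f64) -> f64 {
    let cores_per_request = total_cores / concurrency;
    scan_size_gb / BASE_SPEED_GB / cores_per_request
}

fn main() {
    // 160 cores, 1TB scans:
    for c in [1.0, 2.0, 4.0, 10.0] {
        // concurrency 1 => 6.4s, 2 => 12.8s, 4 => 25.6s, 10 => 64s
        println!("concurrency {}: {:.1}s", c, response_secs(1024.0, 160.0, c));
    }
}
```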

gaby commented 7 months ago

I noticed this the other day, normally I have one tab with a dashboard, as soon as I open another tab with the logs search it all slows down.

Sometimes queries never come back and I have to restart the tab.

You can also see this happening in a dashboard running multiple queries, they get results sequentially.

hengfeiyang commented 7 months ago

> I noticed this the other day, normally I have one tab with a dashboard, as soon as I open another tab with the logs search it all slows down.
>
> Sometimes queries never come back and I have to restart the tab.
>
> You can also see this happening in a dashboard running multiple queries, they get results sequentially.

Yep, that is because of the search queue.

gaby commented 7 months ago

> I noticed this the other day, normally I have one tab with a dashboard, as soon as I open another tab with the logs search it all slows down. Sometimes queries never come back and I have to restart the tab. You can also see this happening in a dashboard running multiple queries, they get results sequentially.
>
> Yep, that is because of the search queue.

Another thing I noticed: once more than 1 user is running queries, performance is really bad.

This defeats the purpose of using O2; having 2+ users on a dashboard basically locks the system until each query is done.

hengfeiyang commented 7 months ago

@gaby Yep, that is because of the search queue. This feature is planned to improve it.

Yadumathur commented 7 months ago

A couple of additions I see in quota management: in an ideal environment we would expect the global settings to be further split into group-level settings tied to RBAC. Example (consider an AD group with 10 users): all 10 users can run at most 1 query each at a time if group concurrency is set to 10, which adds up to 10. Further adding a user-level quota drill-down setting won't let any user run more than X searches.

  1. If a user hits their quota, their next search gets queued.
  2. If the group hits its quota, any user's search in that group gets queued.
  3. Global limits should be hit last.

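The hierarchical check proposed in these three points could be sketched like this (illustrative Rust, not OpenObserve's implementation):

```rust
// Sketch of hierarchical admission: user quota first, then group (RBAC)
// quota, then global limits last. Names are illustrative.

struct Limits {
    max_concurrency: usize,
    running: usize,
}

impl Limits {
    fn has_capacity(&self) -> bool {
        self.running < self.max_concurrency
    }
}

/// True if the request may run now; otherwise it should be queued.
fn admit(user: &Limits, group: &Limits, global: &Limits) -> bool {
    // Checked in order: user quota, group quota, global limit.
    user.has_capacity() && group.has_capacity() && global.has_capacity()
}

fn main() {
    let user = Limits { max_concurrency: 1, running: 0 };
    let group = Limits { max_concurrency: 10, running: 10 }; // group saturated
    let global = Limits { max_concurrency: 6, running: 2 };
    // The group quota is exhausted, so this user's search gets queued.
    assert!(!admit(&user, &group, &global));
}
```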
Yadumathur commented 7 months ago

Users can write bad queries which can still take over the CPU and memory (the search-speed and CPU-seconds predictions may not work every time).

  1. Set guardrail mechanisms over the search criteria that allow or deny bad searches. (This can be configurable: allowlist/denylist.)
  2. Are we planning to implement cgroups to safeguard the underlying resources?