calls between microservices triggered by a user's request form a graph
online services are end-to-end (e2e) latency sensitive
microservices are overprovisioned to ensure performance
challenges:
it is difficult to do fine-grained scaling while satisfying SLA requirements: the microservice call graph is complicated, and e2e latency depends on all microservices within the graph.
microservice multiplexing: 40% of microservices are shared by >=100 online services, and these services have diverse workload patterns and different SLA requirements.
resource interference is non-uniform: performance is imbalanced across containers of the same microservice, and a microservice usually has hundreds to thousands of containers in the cluster.
a microservice's latency can be formulated as a piecewise linear function; the slope and cut-off point depend on workload and resource interference.
For each microservice, Erms performs offline profiling:
collect historical data including tail latency, workload (number of calls processed), CPU utilization, and memory utilization
apply an ML model (a decision tree) to learn the parameters of the piecewise linear function, determining the cut-off point (σ_i) between the two linear pieces
This model captures how the microservice's latency changes with workload, and how this relationship is affected by resource utilization (CPU and memory) [sec5.2]
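A minimal sketch of fitting the two-segment model from profiled data. The brute-force cut-off search and the name fit_piecewise_linear are illustrative only; Erms actually learns these parameters with a decision tree that also conditions on CPU/memory utilization [sec5.2].

```python
import numpy as np

def fit_piecewise_linear(workload, latency):
    # sort samples by workload
    order = np.argsort(workload)
    x = np.asarray(workload, dtype=float)[order]
    y = np.asarray(latency, dtype=float)[order]
    best_err, best_fit = np.inf, None
    # brute-force scan of candidate cut-off points, keeping a few samples
    # on each side so both least-squares line fits are well-posed
    for i in range(3, len(x) - 3):
        left = np.polyfit(x[:i], y[:i], 1)    # (slope, intercept) below sigma
        right = np.polyfit(x[i:], y[i:], 1)   # (slope, intercept) above sigma
        err = (np.sum((np.polyval(left, x[:i]) - y[:i]) ** 2)
               + np.sum((np.polyval(right, x[i:]) - y[i:]) ** 2))
        if err < best_err:
            best_err, best_fit = err, (x[i], left, right)  # sigma_i = x[i]
    return best_fit  # (sigma_i, left segment, right segment)
```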
Latency targets in Erms represent the maximum time allocated to each microservice to process a request while still meeting the overall end-to-end latency SLA [sec2.2]
It first simplifies the complex dependency graph into a graph with only sequential dependencies between virtual microservices.
use DFS to simplify the microservice graph: merge microservices with sequential or parallel calls into virtual microservices [sec4.2, Algorithm 1]
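A toy sketch of the DFS flattening idea, assuming each node carries a single scalar latency estimate. It keeps only the slowest parallel branch, which is cruder than Algorithm 1 (Erms merges the piecewise linear coefficients of all branches); Node and simplify are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    latency: float                 # scalar stand-in for the node's latency model
    children: list = field(default_factory=list)
    parallel: bool = True          # children invoked in parallel vs. sequentially

def simplify(node):
    # DFS: flatten the subtree rooted at `node` into one sequential chain
    # of virtual microservices
    chain = [node]
    if not node.children:
        return chain
    child_chains = [simplify(c) for c in node.children]
    if node.parallel:
        # toy merge rule: keep only the slowest parallel branch
        chain += max(child_chains, key=lambda ch: sum(n.latency for n in ch))
    else:
        # sequential calls: concatenate the child chains
        for ch in child_chains:
            chain += ch
    return chain
```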
It then computes latency targets for all microservices by solving a convex optimization problem that minimizes total resource usage while meeting SLA requirements.
Objective Function: Minimize the total resource usage: min Σ(n_i * R_i) [equation 2]
Constraint: Ensure the end-to-end latency meets the SLA: Σ(a_i * γ_i / n_i + b_i) ≤ SLA [equation 2,4]
calculate initial latency targets based on equation 5: Eq. (5) states that the optimal latency target of each microservice is proportional to the square root of the product of a_i, workload γ_i, and resource demand R_i.
a_i is the slope with respect to workload in the latency function of microservice i. It quantifies how sensitive the microservice's latency is to workload: a larger a_i means latency grows more rapidly as workload increases.
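The closed form behind Eq. (5) can be recovered with a Lagrange multiplier: substituting n_i = a_i·γ_i / (T_i − b_i) into the objective and fixing Σ T_i = SLA gives T_i = b_i + sqrt(a_i·γ_i·R_i) · (SLA − Σ_j b_j) / Σ_j sqrt(a_j·γ_j·R_j). A sketch under that derivation (the normalization constant is our reading, not quoted from the paper):

```python
import math

def latency_targets(a, b, gamma, R, sla):
    # weight_i = sqrt(a_i * gamma_i * R_i): the slack T_i - b_i is
    # allocated in proportion to this weight, per Eq. (5)
    weights = [math.sqrt(ai * gi * ri) for ai, gi, ri in zip(a, gamma, R)]
    slack = sla - sum(b)            # e2e budget left after fixed latencies b_i
    assert slack > 0, "SLA must exceed the sum of the b_i intercepts"
    total = sum(weights)
    return [bi + wi * slack / total for bi, wi in zip(b, weights)]
```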
3.1. Priority Level Assignment: based on the initial latency targets computed in step2, Erms assigns priority levels to the services sharing each microservice; services with lower latency targets at a shared microservice are given higher priority.
3.2. Priority-based Scheduling: whenever a thread is available in a deployed container and there are requests waiting to be processed, a request from a higher-priority service is assigned to this thread with higher probability. [sec5.3.2]
3.3. Latency Target Recomputation: for each service, Erms recomputes the latency targets of all its microservices, taking into account the priority assigned to each service at shared microservices (e.g., for a service with priority k, its new workload at a shared microservice is the sum of the workloads of all services with priorities 1 to k; see the sketch below).
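A sketch of steps 3.1-3.3 as stated above; the 1/priority weighting in pick_request is an assumption (the paper does not give the exact probabilities here), and all function names are hypothetical.

```python
import random

def assign_priorities(targets):
    # 3.1: targets maps service -> initial latency target at this shared
    # microservice; lower target => higher priority (priority 1 = highest)
    ranked = sorted(targets, key=targets.get)
    return {svc: rank + 1 for rank, svc in enumerate(ranked)}

def pick_request(queues, priorities):
    # 3.2: weighted random pick so higher-priority services win more often;
    # the 1/priority weighting is an assumption, not from the paper
    waiting = [s for s, q in queues.items() if q]
    weights = [1.0 / priorities[s] for s in waiting]
    winner = random.choices(waiting, weights=weights, k=1)[0]
    return queues[winner].pop(0)

def effective_workloads(workloads, priorities):
    # 3.3: a priority-k service sees the summed workload of all services
    # with priority 1..k; this inflated workload feeds the latency-target
    # recomputation
    return {svc: sum(w for s, w in workloads.items() if priorities[s] <= k)
            for svc, k in priorities.items()}

# e.g. assign_priorities({"A": 20.0, "B": 50.0}) -> {"A": 1, "B": 2}
#      effective_workloads({"A": 100, "B": 300}, {"A": 1, "B": 2})
#      -> {"A": 100, "B": 400}
```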
Resource Allocation: decide how many containers to allocate to each microservice, based on the initial latency targets and priorities. Each microservice container is configured with 0.1 CPU core and 200 MB memory. [sec4.3]
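A sketch of the container-count computation, assuming the per-microservice model T = a·γ/n + b from Eq. (4) is inverted for n; containers_needed is an illustrative name.

```python
import math

def containers_needed(a_i, gamma_i, b_i, target_i):
    # invert T = a * gamma / n + b  =>  n = a * gamma / (T - b),
    # rounded up to whole fixed-size (0.1 core, 200 MB) containers
    assert target_i > b_i, "latency target must exceed the intercept b_i"
    return math.ceil(a_i * gamma_i / (target_i - b_i))
```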
step4: Dynamic Resource Provisioning [sec5.4]
decides where to place the allocated containers across physical hosts to minimize resource interference and balance utilization.
For each host, resource unbalance is defined as the difference between that host's utilization and the cluster-wide average utilization.
use POP (Partitioned Optimization Problems) to solve this placement problem. The main idea is to statically divide the hosts of the cluster into multiple groups of the same size and solve a much smaller-scale optimization problem per group (see the sketch below).
https://dl.acm.org/doi/pdf/10.1145/3567955.3567964
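A sketch of the POP-style decomposition; the per-group "solver" here is just a greedy least-utilized-host heuristic standing in for the smaller optimization problem that POP/Erms actually solve per group. All names are illustrative.

```python
import random

def pop_place(hosts, containers, k):
    # hosts: host -> current utilization (fraction of capacity)
    # containers: list of container demands (same utilization units)
    host_ids = list(hosts)
    random.shuffle(host_ids)
    random.shuffle(containers)
    # statically split hosts (and the pending containers) into k equal groups
    groups = [host_ids[i::k] for i in range(k)]
    placement = {}
    for i, group in enumerate(groups):
        util = {h: hosts[h] for h in group}
        for c in containers[i::k]:
            # greedy stand-in for the per-group optimization: always place
            # on the least-utilized host, pushing hosts toward the mean
            h = min(util, key=util.get)
            util[h] += c
            placement.setdefault(h, []).append(c)
    return placement
```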