occlum / ngo

Next-Gen Occlum, a work-in-progress fork of Occlum that is optimized for the next-generation of Intel SGX (on Xeon SP processors)
Other
33 stars 17 forks source link

[RFC] Add scheduler to the Rust async runtime #18

Open ShuochengWang opened 3 years ago

ShuochengWang commented 3 years ago

The Problem

NGO uses Rust async / await to reconstruct the code and realize asynchronous. Because Rust async / await needs to be executed through Rust async runtime, NGO has implemented a basic version of Rust async runtime, namely async-rt.

With async-rt, NGO implements M : N : P in-enclave scheduling. Specifically, there are M user threads, N LibOS coroutines, and P host threads. The executor of Rust async runtime runs on P host threads (that is, P Linux threads). The executor schedules N LibOS coroutines (that is, N Rust async tasks). N LibOS coroutines contain M user threads and N - M LibOS system tasks.

However, there is no scheduler for async-rt at present. The executor of async-rt can only run tasks in its local queue. This will lead to the following problems for NGO:

  1. Unbalanced load: some host threads are busy, while others are idle. This may result in increased program latency.
  2. Waste of computing power: if the host thread is idle, it will continuously poll the local queue, which will waste CPU computing power.
  3. No priority: in NGO, some tasks have lower priority, such as some LibOS background tasks, but async-rt can't distinguish priorities at present.

In addition, in async-rt, if the task is in the pending state, the corresponding waker will be set. Invoking waker.wake() will rejoin the task to the local queue. If wake() is invoked repeatedly (It is feasible at the API level), the queue will contain multiple identical tasks, which will affect performance and scheduling.

Design Goals

Add scheduler to the Rust async runtime async-rt and make async-rt a production-level Rust async runtime with no_std. The scheduler should meet the following requirements:

The priorities of these requirements decrease in turn. We will implement these requirements in order of priority.

ShuochengWang commented 3 years ago

The results of the discussion with @tatetian :

  1. idle vcpu (the thead of the async runtime) should sleep and wakeup later. When the vcpu's run_queue is empty, the vcpu need sleep some time, eg, sleep 50ms. We can use eventfd (better) or futex in occlum to implement this feature. However, when the run_queue is empty, there may be some pending tasks on this vcpu, other vcpu need to invoke wake(), insert the task to the run_queue, and wakeup the corresbonding vcpu.
  2. performance problem of io_uring callback. In current ngo implementation, we set a sched_callback (io_uring callback) before init async runtime, then each vcpu will invoke sched_callback (eg, poll io_uring completion queue) in each event loop. Actually, It is not necessary for all vcpu to poll io_uring queue. We can use only one vcpu, or several vcpu to poll the io_uring queue. We can deprecated sched_callback, and treat polling io_uring queue as a async task.
  3. [RFC] Config the number vCPUs #16 . Implement this RFC
  4. sleep function for user task. User task maybe want to sleep some time, eg, timer. Since async-rt is a no_std runtime, there is no time function in no_std env. We can implement sleep function in ngo libos.
ShuochengWang commented 3 years ago

Related Work

ShuochengWang commented 3 years ago

Our Design Goals

A Rust async runtime with no_std, for LibOS, In particular, for SGX LibOS.

Take ngo as an example, ngo need rust async runtime with following features:

Idle thread sleep (Already supported)

If one thread is idle, this thread will sleep. When a new task is ready, the sleeping thread will be waked up.

Load balancing

  1. Assign tasks to each thread in a balanced manner
  2. Threads with fewer workload should execute some workload on other threads
  3. Try to avoid starvation
  4. Can not violate the cpu affinity.

Support priority

There are different kinds of tasks in LibOS, e.g. user tasks and LibOS tasks. User tasks generally have higher priority, while LibOS tasks maybe running in the background and with lower priority.

  1. Distinguish different kinds of tasks.
  2. Support priority in the scheduler
  3. Shutdown background tasks correctly. (background tasks may be infinite loop pattern, we need shutdown it correctly before exits)
ShuochengWang commented 3 years ago

Design Overview

Firstly, thanks to tokio's blog https://tokio.rs/blog/2019-10-scheduler, those ideas are very usefull.

Basic Idea

Task

Task priority

Task priority affects how to select tasks in the thread.

Task type

Task type affects how to assign task to threads.

CPU affinity

In our async runtime, the thead is called vcpu. Since user can bind threads to specific cpu set in OS, we allow user to bind tasks to specific vcpu set.

If one task is bound to a vcpu set, then the task can only run in those vcpus. vcpu outside the set can not steal the task.

Run queue

Each thread has three local queues. All threads share one global queue.

ShuochengWang commented 3 years ago

Assign tasks to threads

When accept one task, we need to decide which thread to run the task.

In the assign algorithm, we consider following factors:

ShuochengWang commented 3 years ago

Select one task to run

  1. high probability: select from high_priority queue.
  2. medium probability: select form mid_pri queue.
  3. low probability: select from low_pri queue.
  4. very low probability: select from global queue.

Steal

ShuochengWang commented 3 years ago

Optimization: Temporarily increase priority

Optimization: Throttle stealing (tokio's opt)