gramineproject / graphene

Graphene / Graphene-SGX - a library OS for Linux multi-process applications, with Intel SGX support
https://grapheneproject.io
GNU Lesser General Public License v3.0

Improving Fork Performance with Zombie Pools #1993

Open vahldiek opened 3 years ago

vahldiek commented 3 years ago

Description of the Problem

In the Linux-SGX PAL, fork is implemented by forking a new host process, creating a new SGX enclave, and restoring a memory checkpoint from the parent process/SGX enclave. As a result, applications that use the fork system call suffer from high process-creation overhead (for both processes and SGX enclaves) compared to their non-SGX counterparts. The main overhead stems from creating an SGX enclave, possibly with GBs of enclave memory, on every fork. Frequent forking is common among server applications such as Apache HTTP Server, nginx, and Redis.

Proposed Solution: Zombie Pools

We suggest amortizing the cost of process creation over several fork invocations. We recognize that a forked process could be reused, after it has called the exit system call, by a later fork issued from another process. This would allow a new Graphene process to be instantiated without creating a new host process or reinitializing the SGX enclave; only cleanup and restoring a new checkpoint are required.

While this idea can be implemented in a general way with a global zombie pool, we think it should initially be implemented as a per-process zombie pool. This simplifies the implementation, requires no global coordination, and still covers the most affected workloads, such as server applications.

When Graphene starts, it creates an enclave as usual. The first time this process forks, fork works as it does today (creating a new process, creating a new enclave, and restoring a checkpoint). Once the child has finished and called exit(), instead of exiting the process, the child notifies the parent about the exit and waits for a response from the parent. On the parent side, the exit message from the child results in the zombie child being stored in a free list. This free list is consulted when a new fork occurs in the parent: Graphene reuses the zombie child by sending it a new checkpoint, skipping both the creation of a new process via fork and the creation of a new SGX enclave.
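The parent-side bookkeeping described above can be sketched roughly as follows. This is only an illustration of the free-list idea, not Graphene code: `struct child`, `struct zombie_pool`, `zombie_pool_put()`, `zombie_pool_take()`, and `fork_child()` are hypothetical names, and the `fresh` argument merely stands in for the expensive "create process + create enclave" step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a child process together with its enclave. */
struct child {
    int id;
    struct child* next_free; /* link used only while parked in the pool */
};

/* Per-process free list of children that called exit() and now wait. */
struct zombie_pool {
    struct child* head;
    size_t count;
};

/* Parent side: called when a child's exit notification arrives. */
void zombie_pool_put(struct zombie_pool* pool, struct child* c) {
    assert(c->next_free == NULL); /* a child is parked at most once */
    c->next_free = pool->head;
    pool->head = c;
    pool->count++;
}

/* Parent side: returns a parked child to reuse, or NULL if none exists. */
struct child* zombie_pool_take(struct zombie_pool* pool) {
    struct child* c = pool->head;
    if (c == NULL)
        return NULL;
    pool->head = c->next_free;
    c->next_free = NULL;
    pool->count--;
    return c;
}

enum fork_path { FORK_SLOW_PATH, FORK_FAST_PATH };

/* Fork: prefer a parked zombie (only a new checkpoint is needed);
 * otherwise fall back to `fresh`, which models creating a new process
 * and enclave as is done today. */
enum fork_path fork_child(struct zombie_pool* pool, struct child* fresh,
                          struct child** out) {
    struct child* reused = zombie_pool_take(pool);
    if (reused != NULL) {
        *out = reused;
        return FORK_FAST_PATH;
    }
    *out = fresh;
    return FORK_SLOW_PATH;
}
```

In this sketch the first `fork_child()` call on an empty pool takes the slow path; once that child has been handed to `zombie_pool_put()`, the next fork takes the fast path and returns the same child.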

We assume that the child exited with a successful exit code. In addition, this only works for fork; we do not consider exec, since exec loads a different manifest with a different layout and MRENCLAVE. Using it for exec is possible but requires additional considerations, such as one zombie pool per manifest. Also, once a parent exits, it informs all of its children to exit. This limits the length of zombie pool chains to a single child. While this limits applicability, we think it is important not to leave excessive amounts of resources unused. We therefore suggest the following lifecycle for processes:

Normal mode: Process started as before

Zombie mode: Process has exited and waits for a message from its parent
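Seen from the child, the two modes form a small state machine. The sketch below is one possible reading of the lifecycle, with hypothetical names throughout (`proc_state`, `proc_event`, `next_state()`, `run_events()`). It assumes that an unsuccessful exit terminates the process for real, and it leaves a process in normal mode untouched by a parent exit, since the proposal only states that case for waiting children.

```c
#include <assert.h>
#include <stddef.h>

enum proc_state { PROC_NORMAL, PROC_ZOMBIE, PROC_TERMINATED };

enum proc_event {
    EV_EXIT_SUCCESS,  /* process called exit() with a successful code */
    EV_EXIT_FAILURE,  /* assumption: failed exits are never pooled */
    EV_CHECKPOINT,    /* parent sent a new checkpoint (a fork reuses us) */
    EV_PARENT_EXITED  /* parent exited and told its children to exit */
};

/* Returns the state that follows `s` after event `e`. */
enum proc_state next_state(enum proc_state s, enum proc_event e) {
    switch (s) {
        case PROC_NORMAL:
            if (e == EV_EXIT_SUCCESS)
                return PROC_ZOMBIE;
            if (e == EV_EXIT_FAILURE)
                return PROC_TERMINATED;
            return PROC_NORMAL; /* other events: not specified, ignored */
        case PROC_ZOMBIE:
            if (e == EV_CHECKPOINT)
                return PROC_NORMAL; /* restored as a freshly forked child */
            if (e == EV_PARENT_EXITED)
                return PROC_TERMINATED;
            return PROC_ZOMBIE; /* keep waiting for the parent */
        case PROC_TERMINATED:
            break;
    }
    return PROC_TERMINATED; /* terminal state */
}

/* Applies a sequence of events and returns the final state. */
enum proc_state run_events(enum proc_state s, const enum proc_event* events,
                           size_t n) {
    assert(events != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s = next_state(s, events[i]);
    return s;
}
```

A process can cycle between the two modes any number of times; it only leaves the cycle on an unsuccessful exit or when its parent exits while it is waiting.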

Implementation Details

We suggest an implementation in the library OS layer, so that the optimization is available to all PAL layers to improve their fork performance. We briefly structure the work into 4 main tasks and describe their possible implementation.

What does this not solve?

The described approach and its suggested implementation are limited in two respects. First, it does not support exec, which is common in applications that rely on the libc system() function to spawn shell commands. Second, it does not allow chains of zombie pools to exist. As a result, the particular case where an application executes sh -c ldconfig in a new process is not sped up (only the first invocation of sh may use a zombie from a pool; the subsequent fork into ldconfig has no zombie). While we think these cases are common, they typically appear a few times at the beginning of an application, whereas forking can occur throughout the application's lifetime. In addition, the approach could eventually be extended to cover these cases and improve the performance of more use cases.

We would like to solicit your feedback on the proposal.

dimakuv commented 3 years ago

Thanks, Anjo.

This helps greatly for applications that frequently fork children at runtime, e.g., web/database applications that fork a child for every client connection (PostgreSQL). In one such application, we observe that a typical run (with ~500 forks) takes 3 hours instead of 5 minutes (36x runtime overhead) due to enclave creation on every fork.

yamahata commented 3 years ago

When recycling a zombie, its state needs to be re-initialized before it receives a checkpoint, i.e., brought into a known (initial) state. There are several ways to do this.

For memory, one simple way is to stash the original image of the PAL and LibOS (and the app binary image) in a reserved read-only area and copy it into the actual area. If we can trust the PAL and LibOS files (e.g., by checking their hash values), re-reading them into memory is another option. This implies that some small executable is needed in addition to the PAL and LibOS to handle it. Another approach is to make LibOS release all unused memory on shim_do_exit() (or on reinitialization). I'm not sure how hard that would be without auditing the code in that context.
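The first option (stash a pristine image and copy it back) might look roughly like this. It is a toy model under stated assumptions: `struct image_region`, `image_init()`, `image_reset()`, and `image_is_pristine()` are hypothetical names, the 64-byte `IMAGE_SIZE` is arbitrary, and the `pristine` buffer is only read-only by convention here, whereas a real enclave would have to enforce that through page permissions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IMAGE_SIZE 64 /* arbitrary toy size, not a real image size */

struct image_region {
    unsigned char pristine[IMAGE_SIZE]; /* stashed copy, never written after init */
    unsigned char live[IMAGE_SIZE];     /* area the running process modifies */
};

/* Records the initial image once, at first start, and populates `live`. */
void image_init(struct image_region* r, const unsigned char* initial, size_t len) {
    assert(len <= IMAGE_SIZE);
    memset(r->pristine, 0, IMAGE_SIZE);
    memcpy(r->pristine, initial, len);
    memcpy(r->live, r->pristine, IMAGE_SIZE);
}

/* Zombie recycling: bring `live` back to the known initial state
 * before a new checkpoint is applied on top of it. */
void image_reset(struct image_region* r) {
    memcpy(r->live, r->pristine, IMAGE_SIZE);
}

/* Returns 1 if `live` currently matches the stashed image, 0 otherwise. */
int image_is_pristine(const struct image_region* r) {
    return memcmp(r->live, r->pristine, IMAGE_SIZE) == 0;
}
```

The trade-off in this variant is memory: every recyclable process keeps a second copy of the image, in exchange for a reset that is a plain copy and needs no file access.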

For other resources, e.g. open files, they need to be released correctly on exit. In any case, LibOS already tracks them to some extent.

Once re-initialization is implemented and the hash value of the executable is known, the zombie approach could be applied to the exec case as well.

some random thoughts:

mkow commented 3 years ago

I think we'll have to wait with implementing this until we rewrite IPC (#2107).