Open vahldiek opened 4 years ago
Thanks, Anjo.
This helps greatly for applications that frequently fork children at runtime, e.g., web/database applications that fork a child for every client connection (PostgreSQL). On one such application, we observe that a typical run (with ~500 forks) takes 3 hours instead of 5 minutes (36x runtime overhead) due to enclave creation on every fork.
When recycling a zombie, its state needs to be re-initialized before it receives a checkpoint, i.e., brought into a known (initial) state. There are several ways to do this.
For memory, one simple way is to stash the original image of the PAL and LibOS (and the app binary image) in a reserved read-only area and copy it into the actual area. If we can trust the PAL and LibOS files (e.g., by checking their hash values), re-reading them into memory is another option. This implies that some small executable is needed in addition to the PAL and LibOS to handle it. Another approach is to make LibOS release all unused memory on shim_do_exit() (or on re-initialization). I'm not sure how hard that would be without auditing the code in this context.
For other resources, e.g., open files, they need to be released correctly on exit. LibOS is already tracking them to some extent anyway.
Once re-initialization is implemented and the hash value of the executable is known, the zombie approach could be applied to the exec case.
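To make the stash-and-copy option for memory concrete, here is a rough C sketch. It is only an illustration under assumptions: `struct image_region` and `reset_regions` are hypothetical names that do not exist in Graphene, and a real implementation would also have to deal with page permissions and with memory allocated after startup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: one loaded image (PAL, LibOS, or app binary). */
struct image_region {
    const unsigned char* pristine; /* stashed read-only copy taken at startup */
    unsigned char*       live;     /* the area the process actually runs from */
    size_t               size;     /* number of bytes to restore */
};

/* Bring every region back to its initial contents before the zombie
 * receives a new checkpoint. */
static void reset_regions(const struct image_region* regions, size_t count) {
    for (size_t i = 0; i < count; i++) {
        assert(regions[i].pristine != NULL && regions[i].live != NULL);
        memcpy(regions[i].live, regions[i].pristine, regions[i].size);
    }
}
```

The trade-off of this option is memory: every image is kept twice inside the enclave, in exchange for a reset that needs no file access and no hash check.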
some random thoughts:
I think we'll have to wait with implementing this until we rewrite IPC (#2107).
Description of the Problem
In the Linux-SGX PAL, fork is implemented by forking a new process, creating a new SGX enclave, and restoring a memory checkpoint from the parent process/SGX enclave. As a result, applications using the fork system call suffer from high overheads to create new processes (and SGX enclaves) compared to their non-SGX alternatives. The main overhead stems from creating an SGX enclave, possibly with GBs of enclave memory, for every fork. This pattern is common among server applications such as Apache HTTP, nginx, or redis.
Proposed Solution: Zombie Pools
We suggest amortizing the time to create processes over several fork invocations. We recognize that a forked process could be reused, after its exit system call, by another fork in a different process. This would allow instantiating a new Graphene process without reinitializing the SGX enclave or creating a new process; it only requires cleaning up and restoring a new checkpoint.
While this idea can be implemented in a general way with a global zombie pool, we think that initially it should be implemented as a per-process zombie pool. This has several advantages: it simplifies the implementation, does not require global coordination, and applies to the most heavily impacted workloads, such as server applications.
Initially, when Graphene starts, it creates an enclave as usual. When this process forks for the first time, it works as it does today (creating a new process, creating a new enclave, and restoring a checkpoint). Once the child has finished and called exit(), instead of exiting the process, the child notifies the parent about the exit and waits for a response from the parent. On the parent side, the exit message from the child results in storing the zombie child in a free list. This free list is used when a new fork occurs within the parent: Graphene reuses the zombie child by issuing a new checkpoint, skipping both the creation of a new process via fork and the creation of a new SGX enclave.
We're assuming that the child exited with a successful exit code. In addition, this only works for fork and we do not consider exec, since exec loads a different manifest with a different layout and MRENCLAVE. Using it for exec is possible but requires additional considerations such as zombie pools per manifest. Also, once a parent exits, it informs all children to exit. This limits the length of zombie pool chains to a single child. While this limits the applicability, we think it is important not to leave excessive amounts of resources unused. We therefore suggest the following lifecycle for processes:
Normal mode: Process started as before
exit
Zombie mode: Process exited and will wait on message from parent
exit, if parent doesn't exit
Implementation Details
We suggest an implementation in the library OS layer, so that the optimization is available to all PAL layers to improve their fork performance. We briefly structure the work into 4 main tasks and describe their possible implementation.
shim_process.h in struct shim_process
shim_ipc_child.c in function ipc_cld_exit_callback)
shim_exit.c (libos_exit and libos_clean_and_exit)
del_all_ipc_ports implementation into parent and all other IPC
libos_clean_and_exit wait for parent message to either terminate or restart process
shim_checkpoint.c in create_process_and_send_checkpoint)
exec is not set)
libos.fork_pooling = 0/1
libos.fork_pooling = 1
shim_init.c and set it in shim_init.c (~ line 500)
toml_int_in to extract the integer value of libos.fork_pooling
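The parent-side free list described in the proposal could be sketched in C roughly as follows. This is a minimal illustration, not Graphene code: `struct zombie_pool`, `zombie_pool_put`, `zombie_pool_get`, `ZOMBIE_POOL_MAX`, and the use of a plain integer as the child identifier are all assumptions made for the example. The sketch only covers the bookkeeping; the IPC messages and the checkpoint transfer are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound, so parked children cannot pile up unbounded. */
#define ZOMBIE_POOL_MAX 8

/* Hypothetical per-process pool of children waiting to be reused. */
struct zombie_pool {
    int    child_ids[ZOMBIE_POOL_MAX]; /* identifiers of parked children */
    size_t count;                      /* number of valid entries */
};

/* Parent side, on a child's exit message: park the child for reuse.
 * Returns false if the child exited with a non-zero code or the pool is
 * full, in which case the caller lets the child terminate as today. */
static bool zombie_pool_put(struct zombie_pool* pool, int child_id, int exit_code) {
    if (exit_code != 0 || pool->count == ZOMBIE_POOL_MAX)
        return false;
    pool->child_ids[pool->count++] = child_id;
    return true;
}

/* Parent side, on fork: take a parked child to send the checkpoint to.
 * Returns -1 if the pool is empty, meaning the caller falls back to the
 * existing path (new process, new enclave). */
static int zombie_pool_get(struct zombie_pool* pool) {
    if (pool->count == 0)
        return -1;
    return pool->child_ids[--pool->count];
}
```

In this sketch, a fork would first call `zombie_pool_get` and only create a new process and enclave when it returns -1. The fixed capacity is one possible way to avoid leaving too many idle enclaves around.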
What does this not solve?
The described approach and its suggested implementation are limited in two respects. First, it does not support exec, which is common in applications that rely on the system libc function to spawn new shell executions. Second, it does not allow chains of zombie pools to exist. As a result, the particular case where an application executes sh -c ldconfig in a new process is not sped up (only the first invocation of sh may use a zombie from a pool; the subsequent fork into ldconfig has no zombie). While we think that these cases are common, they typically appear several times at the beginning of an application, while forking can occur throughout the lifetime of the application. In addition, the approach could eventually be altered to allow for these cases and further improve performance for more use cases.
We would like to solicit your feedback on the proposal.