vhive-serverless / vHive

vHive: Open-source framework for serverless experimentation

Improve function cold-start latency #960

Closed. huasiy closed this issue 2 months ago

huasiy commented 4 months ago

Describe the enhancement

We are using vHive as the cloud-function platform in our work. This great project provides a white-box, tunable serverless computing infrastructure for our research. However, in our experiments we ran into problems with the cold-start latency of cloud functions. We define cold-start latency as the time elapsed from when the user sends a function invocation request to when a new function instance (a k8s pod) is created and starts processing the request.

When we invoke many function instances in parallel to process a query, the query may suffer a cold-start latency of more than 10 seconds. We would much appreciate any suggestions on how to reduce it. We notice that the cold-start latency reported in vHive's ASPLOS'21 paper is much lower (< 1 second when REAP is used, as shown in Figure 9 of the paper) and is what we expect for query processing. In our experiments, however, the cold-start latency is much higher, especially when creating more than four function instances (k8s pods) on each node (k8s worker). The cumulative distribution of cold-start latency is shown below.

[Figure: cumulative distribution of measured cold-start latency]

We have tried different function images (including vHive's 'helloworld'), different memory sizes and CPU allocations per function instance, different virtualization solutions (Docker, Firecracker, Firecracker+snapshot, Firecracker+REAP), and different physical environments (CloudLab, AWS, and on-prem servers, all with SSDs and >=10G networking). None of these significantly reduced the cold-start latency. Is there anything we should take care of to get low cold-start latency (~1 second) under high invocation concurrency, as in Figure 9 of the paper?
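For concreteness, here is a minimal sketch of the measurement methodology described above. It assumes an HTTP-triggered function behind a Knative route; `funcURL` and `concurrency` are illustrative placeholders, not values from the actual setup. Each goroutine times one invocation from request send to first response, and the sorted latencies approximate the CDF in the figure:

```go
// Hedged sketch: measure per-invocation cold-start latency as defined above
// (time from sending the request until the first response arrives), for N
// concurrent invocations against a scale-from-zero function.
package main

import (
	"fmt"
	"net/http"
	"sort"
	"sync"
	"time"
)

func main() {
	const (
		funcURL     = "http://helloworld.default.example.com" // hypothetical Knative route
		concurrency = 16                                      // parallel cold invocations
	)

	latencies := make([]time.Duration, concurrency)
	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			start := time.Now()
			resp, err := http.Get(funcURL) // triggers pod creation if no warm instance exists
			if err != nil {
				fmt.Println("invocation failed:", err)
				return
			}
			resp.Body.Close()
			latencies[i] = time.Since(start)
		}(i)
	}
	wg.Wait()

	// Sorted latencies give rough percentiles of the cold-start distribution.
	sort.Slice(latencies, func(a, b int) bool { return latencies[a] < latencies[b] })
	for i, l := range latencies {
		fmt.Printf("p%03d: %v\n", (i+1)*100/concurrency, l)
	}
}
```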

lrq619 commented 2 months ago

Hello huasiy, sorry for the late reply. The cold-start latency in the vHive paper does not include the system overhead of the serverless backend (e.g., Knative); only the local worker delay is measured in that figure. From your description, what you measured is the end-to-end delay of the first invocation of a serverless function, which also includes overhead on the control-plane side. If you want to reproduce the results, please refer to our artifact on Zenodo: https://zenodo.org/records/4545584. Thanks for your attention! Feel free to reach out to us if you have further questions.
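To illustrate the distinction drawn here, one rough way to isolate the scale-from-zero overhead (scheduling, activator, image pull, VM boot) from steady-state request serving is to compare a cold invocation against an immediately repeated warm one. This is a hedged sketch only, reusing the same hypothetical `funcURL` as above; it does not reproduce the paper's worker-local measurement, which is taken inside the worker rather than end to end:

```go
// Hedged sketch: cold - warm roughly bounds the combined control-plane and
// boot overhead of the first invocation; Figure 9 of the paper reports only
// the worker-local portion of that cold-start path.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func timeGet(url string) time.Duration {
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		return -1 // sentinel for a failed invocation
	}
	resp.Body.Close()
	return time.Since(start)
}

func main() {
	const funcURL = "http://helloworld.default.example.com" // hypothetical route

	cold := timeGet(funcURL) // first request: a new pod must be created
	warm := timeGet(funcURL) // second request: served by the now-running pod
	fmt.Printf("cold=%v warm=%v ~cold-start overhead=%v\n", cold, warm, cold-warm)
}
```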