Motivating use case:
Latency-sensitive applications (e.g., memcached) see their load vary over time. Peak load requires significantly more cores than average load, so many cores sit idle when the load is low.
To improve CPU utilization, datacenter operators multiplex: they run multiple applications on the same server.
Motivating experiment:
Run memcached (about 1 µs to respond to a client request) and a batch processing job (which uses any CPU cycles memcached leaves unused) on the same server. The peak processing capacity of memcached is 6M req/s.
Goal (best case in theory): when memcached's input rate is below 6M req/s, memcached achieves low latency. The lower the input rate for memcached, the higher the throughput of the batch processing job.
Linux: high latency for memcached even at a low input rate, and low throughput for the batch job.
ZygOS: cannot multiplex. It dedicates all CPU cores to memcached, so the batch processing job gets no throughput.
No existing approach provides high network performance and high CPU efficiency simultaneously.
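The best case in the experiment above can be written as a tiny model. This is only a sketch: it assumes memcached's CPU demand scales linearly with its input rate (an assumption of these notes, not a measured result), and `ideal_batch_share` is a hypothetical helper name. The 6M req/s peak is the figure from the experiment.

```python
PEAK_RPS = 6_000_000  # peak memcached capacity from the experiment above


def ideal_batch_share(memcached_rps: float, peak_rps: float = PEAK_RPS) -> float:
    """Fraction of the server's CPU cycles left for the batch job in the ideal case.

    Assumes (hypothetically) that memcached's CPU demand grows linearly
    with its input rate and that every unused cycle goes to the batch job.
    """
    if memcached_rps < 0:
        raise ValueError("load must be non-negative")
    used = min(memcached_rps / peak_rps, 1.0)  # saturates at peak load
    return 1.0 - used


for load in (0, 1_500_000, 3_000_000, 6_000_000):
    print(f"{load:>9} req/s -> batch share {ideal_batch_share(load):.2f}")
```

Under this model the batch job gets the whole machine when memcached is idle and nothing at 6M req/s. Linux and ZygOS both fall well short of that line, each in a different way.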
Goal
Reallocate cores across applications at microsecond granularity
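To make "reallocate at fine granularity" concrete, here is a toy simulation of a core allocator. It is a sketch under stated assumptions, not the paper's algorithm: the queue-length signal, the one-core-per-tick step, and the names `reallocate`, `queue_lens`, and `min_cores` are all hypothetical. Each tick stands for one very short decision interval.

```python
def reallocate(queue_lens, total_cores, min_cores=1):
    """Return the cores granted to the latency-sensitive app after each tick.

    queue_lens[i] is the number of requests still waiting at tick i
    (a hypothetical congestion signal). A non-empty queue grants one
    more core; an empty queue hands one core back to the batch job.
    """
    cores = min_cores
    history = []
    for waiting in queue_lens:
        if waiting > 0 and cores < total_cores:
            cores += 1  # congestion: take a core from the batch job
        elif waiting == 0 and cores > min_cores:
            cores -= 1  # idle: return a core to the batch job
        history.append(cores)
    return history


# A short burst followed by idle time on a 4-core server.
print(reallocate([0, 3, 5, 2, 0, 0], total_cores=4))  # [1, 2, 3, 4, 3, 2]
```

The shorter the tick, the less time a burst waits for cores and the sooner idle cores return to the batch job, which is why the decision interval matters.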
https://www.usenix.org/conference/nsdi19/presentation/ousterhout