Shenango: Achieving High CPU Efficiency for Latency-sensitive Datacenter Workloads

motivating use case:

Latency-sensitive applications (e.g., memcached) see load that varies over time. Peak load requires significantly more cores than average load, so if cores are provisioned for the peak, many of them are wasted when the load is low.

To improve CPU utilization, datacenters use multiplexing: run multiple applications on the same server.

latency-sensitive: load varies over time. needs low latency
batch processing: doesn't require low latency. needs high throughput

motivating experiment:

Run memcached (1us to respond to a client) and a batch processing job (which uses any CPU cycles not used by memcached) on one server. The peak processing capacity of memcached is 6M req/s.

Goal: the best case in theory. When the input rate of memcached is below 6M req/s, memcached achieves low latency. The lower the input rate for the memcached job, the higher the throughput we can get for the batch processing job.
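
The ideal case above can be written as a small back-of-the-envelope model. This is only a sketch under a stated assumption that is not from the paper: memcached's CPU use grows linearly with its input rate and fills the whole server at 6M req/s. The function name `ideal_batch_share` is hypothetical.

```python
MEMCACHED_CAPACITY = 6_000_000  # req/s, peak memcached capacity from the experiment


def ideal_batch_share(memcached_rate, capacity=MEMCACHED_CAPACITY):
    """Fraction of CPU cycles left for the batch job in the ideal case.

    Assumption (hypothetical, for illustration only): memcached's CPU use
    is linear in its input rate and reaches the whole server at `capacity`.
    """
    if memcached_rate < 0:
        raise ValueError("input rate must be non-negative")
    used = min(memcached_rate / capacity, 1.0)  # cap at a fully busy server
    return 1.0 - used


if __name__ == "__main__":
    for rate in (0, 1_500_000, 3_000_000, 6_000_000):
        share = ideal_batch_share(rate)
        print(f"{rate:>9} req/s -> batch share {share:.2f}")
```

Under this assumption, a memcached job at half of its peak rate would leave half of the cycles to the batch job, and a job at peak rate would leave none.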

Linux: high latency even at a low input rate for the memcached job. Also low throughput for the batch job

ZygOS: cannot multiplex. It dedicates all CPU cores to the memcached job -> no throughput for the batch processing job

No existing approach provides high network performance and high CPU efficiency simultaneously.

Goal

Reallocate cores across applications at microsecond granularity
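
To make the goal concrete, here is a toy sketch of per-interval core reallocation. It is not Shenango's actual algorithm: the function `allocate_cores`, the 12-core server, and the 500K req/s per-core capacity are all hypothetical values chosen for illustration (12 x 500K matches the 6M req/s peak above).

```python
def allocate_cores(total_cores, memcached_rate, per_core_capacity):
    """Split cores for one interval.

    The latency-sensitive job gets just enough cores for its current rate
    (rounded up); the batch job gets whatever is left.
    All parameters are hypothetical inputs of this toy model.
    """
    if per_core_capacity <= 0:
        raise ValueError("per-core capacity must be positive")
    # Integer ceiling division avoids floating-point rounding.
    needed = (memcached_rate + per_core_capacity - 1) // per_core_capacity
    memcached_cores = min(total_cores, needed)
    return memcached_cores, total_cores - memcached_cores


if __name__ == "__main__":
    # Hypothetical load trace, one entry per reallocation interval (req/s).
    trace = [400_000, 2_200_000, 5_900_000, 1_000_000]
    for rate in trace:
        mc, batch = allocate_cores(12, rate, 500_000)
        print(f"rate={rate:>9} -> memcached cores={mc:>2}, batch cores={batch:>2}")
```

The shorter the interval between reallocations, the less time cores sit idle while assigned to memcached, which is why the granularity matters.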
