sampsyo / cs6120

advanced compilers
https://www.cs.cornell.edu/courses/cs6120/2023fa/
MIT License

Proposal: JITNIC #393

Closed zachary-kent closed 7 months ago

zachary-kent commented 9 months ago

eBPF is an API and language that allows code to be installed dynamically into the kernel and run at various “hooks.” A verifier ensures that this code terminates and does not exhibit any unsafe behavior, such as writes to privileged memory locations. Installed eBPF code can communicate with user-space code through “maps” and a message-passing model. Historically, eBPF was developed for packet processing, but thanks to its versatility it can be used for more general applications as well (such as the amazing bpftrace tool). The eBPF language itself resembles a restricted subset of x86 and is executed by a VM that prevents runtime security violations.
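As a rough illustration of the termination guarantee (hypothetical code, not the kernel's actual verifier, which tracks far more state): classic eBPF simply rejected any program whose control-flow graph contains a back edge, i.e., any loop.

```python
# Hypothetical sketch of the classic eBPF termination rule: reject
# any program whose control-flow graph has a back edge (a loop).
# Real verifiers do much more (bounded loops, pointer and range
# tracking); this only illustrates the idea.

def has_back_edge(cfg, entry):
    """cfg: dict mapping block -> list of successor blocks."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in cfg}

    def dfs(node):
        color[node] = GRAY
        for succ in cfg.get(node, []):
            if color[succ] == GRAY:  # edge back into the DFS stack: a loop
                return True
            if color[succ] == WHITE and dfs(succ):
                return True
        color[node] = BLACK
        return False

    return dfs(entry)

def verifier_accepts(cfg, entry):
    return not has_back_edge(cfg, entry)
```

A straight-line (acyclic) program like `{"entry": ["exit"], "exit": []}` passes, while a self-looping `{"entry": ["entry"]}` is rejected.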

With the emergence of software-defined networking within the network core, switches have become more programmable, allowing on-the-fly changes to switch behavior without having to fabricate a new chip. This paradigm shift has led to new DSLs for specifying switch behavior and an entire ecosystem of tools and compilers for testing and running programs on switch hardware. A recent advance in this direction is bringing programmability to the NIC, historically a specialized hardware unit for receiving and transmitting packets. A programmable NIC allows application- and network-level processing to be offloaded from the CPU to the NIC, where specialized hardware units can accelerate the computations.

Despite the specialized packet-processing hardware in the NIC, eBPF programs cannot take advantage of it: they currently execute only on CPU cores. (Insert some overused cliché about the end of Moore's law.) We aim to accelerate packet processing and other eBPF computations for networked applications by building a tracing JIT that offloads straight-line program traces to a pipelined NIC, which contains “read-modify-write” memory stages and other specialized functional units. When a packet arrives, it first traverses the “slow path” through the CPU, where we compute the path conditions (predicates over memory locations and packet data) that determine when a packet will traverse that trace. Doing so “flattens” a branching program into a set of straight-line traces. In a sense, we compute the “equivalence class” of packets that will cause any given trace to be executed. Then, we compile the trace to bytecode executable by the NIC and install it there. Finally, we process the packet by executing the newly installed trace on the NIC; future packets falling into the same equivalence class will also be processed by executing this trace.
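A minimal sketch of the slow-path trace recording described above, with the program representation and all names invented for illustration: running a branching program once records the branch outcomes as path conditions and flattens the taken path into a straight-line trace.

```python
# Hypothetical sketch of slow-path trace recording. A branching
# packet program is run once; each branch taken contributes a path
# condition, and the straight-line body becomes the trace. Future
# packets satisfying every condition (the "equivalence class")
# reuse the trace, skipping all branches.

def record_trace(program, packet):
    """program: list of steps, each either
         ('op', fn)                          -- straight-line action
         ('branch', pred, then_steps, else_steps)
    Returns (conditions, trace): the predicates that must hold for
    the trace to apply, and the flattened list of ops."""
    conditions, trace = [], []

    def walk(steps):
        for step in steps:
            if step[0] == 'op':
                trace.append(step[1])
            else:
                _, pred, then_steps, else_steps = step
                if pred(packet):
                    conditions.append(pred)  # predicate must hold
                    walk(then_steps)
                else:
                    # negated predicate must hold
                    conditions.append(lambda p, q=pred: not q(p))
                    walk(else_steps)

    walk(program)
    return conditions, trace

def trace_applies(conditions, packet):
    return all(cond(packet) for cond in conditions)

def run_trace(trace, packet):
    for op in trace:
        packet = op(packet)
    return packet

# Toy program: forward packets destined for port 80, drop the rest.
toy_program = [
    ('branch', lambda p: p['port'] == 80,
     [('op', lambda p: {**p, 'action': 'fwd'})],
     [('op', lambda p: {**p, 'action': 'drop'})]),
]
```

Recording `toy_program` on a port-80 packet yields a branch-free trace that applies exactly to port-80 packets; a port-22 packet falls outside the equivalence class and would trigger a fresh slow-path run.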

All of this sounds super easy, right? Unfortunately, we anticipate some especially challenging problems, notably dealing with memory and consistency between the NIC and the CPU. For example, a trace executed on the pipelined NIC can perform only as many memory operations as there are memory stages in the pipeline; any more would require falling back to a “slow path” on the CPU. While such a packet is being processed on the CPU, we would have to stall the NIC pipeline to avoid potential read-after-write dependencies between the packet on the slow path and those next in the sequence. As a simplifying assumption, we will deal only with traces with a bounded number of memory operations and assume all packets can be processed on the NIC. To help keep this project tractable, we have created a timeline of milestones we intend to hit.
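The memory-stage constraint can be sketched as a simple admission check; the pipeline depth and op names here are assumptions for illustration, not properties of any real NIC.

```python
# Hypothetical admission check for offloading a trace to the NIC:
# a pipelined NIC with N read-modify-write stages can service at
# most N memory operations per trace; anything over that bound
# must fall back to the CPU slow path.

NIC_MEMORY_STAGES = 4  # assumed pipeline depth, for illustration

def offload_decision(trace_ops, stages=NIC_MEMORY_STAGES):
    """trace_ops: list of op kinds, e.g. 'load', 'store', 'alu'."""
    mem_ops = sum(1 for op in trace_ops if op in ('load', 'store'))
    return 'nic' if mem_ops <= stages else 'cpu-slow-path'
```

Under the simplifying assumption above, every trace passes this check and the CPU fallback (with its pipeline-stall machinery) is never exercised.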

How will you empirically measure success? We're hoping to have something that at least kind of works. Jokes aside, there is a wealth of existing eBPF programs in the Linux kernel, which we will use to benchmark our implementation. As a baseline, we will execute these programs normally; then we will execute them accelerated on the NIC and compare their performance. We also plan to do at least one case study of a “realistic” program, showing how it is compiled to the NIC to improve performance.

Team members: @bennyrubin @zachary-kent

sampsyo commented 9 months ago

Sounds awesome; I'm interested to see where this goes. The two big questions I have are:

Since this pair of questions seems so critical to the success of the vision, what do you think about trying to answer them soon—like, within 1-2 weeks—so we can have a clearer idea of what the compiler will entail?