oxidecomputer / opte

packets go in, packets go out, you can't explain that
Mozilla Public License 2.0

better separation between API vs. engine, better code/crate structure #108

Open rzezeski opened 2 years ago

rzezeski commented 2 years ago

Shortly after writing up #107 (based on experience in #106) I realized I didn't like the solution it presented (typical). I was conflating various concerns around generic OPTE vs. the specific implementation of VPC, user vs. kernel context, and cargo feature flags. After a bunch of thinking, this new issue supplants #107 with a new plan, though it still builds on some of the work already done in #106. But first, I think it helps to state some high-level aspects of OPTE and the Oxide VPC in order to better understand the environment in which I'm trying to make good decisions.

First and foremost, OPTE is a packet transformation engine. You hook it up to some entity that speaks TCP/IP, set the engine's policy via the API, and let the traffic flow. While this engine (currently in opte-core) is built with the illumos kernel in mind, it is largely agnostic to both the illumos kernel and to running in kernel context in general. That is, it would probably take little work to have OPTE run in userland as well. In fact, its unit/integration tests run via cargo test, which isn't far removed from a persistent userland process. That said, OPTE must be able to run in kernel context, as that is its primary use case; therefore it must have no_std support. This constraint limits the crates OPTE may use, not only because everything must build no_std, but also because kernel and interrupt contexts come with different considerations than userland (see #105).
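As a rough illustration of that constraint, a crate that has to build in both contexts typically gates its std usage behind a cargo feature. The feature name and layout below are assumptions for illustration, not the actual opte-core setup:

```rust
// Hypothetical crate root (lib.rs) showing the usual no_std pattern.
// The "std" feature name is an assumption, not opte-core's actual flag.
#![cfg_attr(not(feature = "std"), no_std)]

// Without std, heap collections come from the alloc crate instead.
#[cfg(not(feature = "std"))]
extern crate alloc;
#[cfg(not(feature = "std"))]
use alloc::vec::Vec;

/// A trivial helper that compiles in both kernel (no_std + alloc) and
/// userland (std) builds.
pub fn frame_sizes(frames: &[&[u8]]) -> Vec<usize> {
    frames.iter().map(|f| f.len()).collect()
}
```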

OPTE is made up of ports, which sit on a virtual switch (this is not currently true, but it's the direction we are heading). Each port's policy is configured in terms of a stack of layers, where each layer contains a set of inbound and outbound rules. Attached to those rules are actions, which act on a packet when it matches the rule. A rule matches only when its predicates are true. As flows are established these actions are cached, and packet processing becomes a matter of matching the flow to the cached action.
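To make that vocabulary concrete, here is a minimal sketch of how the abstractions relate to each other. The type names are hypothetical, chosen only to mirror the description above; they are not the actual opte-core types:

```rust
use std::net::Ipv4Addr;

/// A predicate is a single match condition on a packet.
enum Predicate {
    InnerDstIp4(Ipv4Addr),
    InnerDstTcpPort(u16),
}

/// An action transforms (or drops) a packet once its rule matches.
enum Action {
    Allow,
    Deny,
    RewriteInnerSrcIp4(Ipv4Addr),
    RewriteInnerDstIp4(Ipv4Addr),
}

/// A rule matches only when all of its predicates are true.
struct Rule {
    predicates: Vec<Predicate>,
    action: Action,
}

/// A layer holds separate inbound and outbound rule sets.
struct Layer {
    name: &'static str,
    inbound: Vec<Rule>,
    outbound: Vec<Rule>,
}

/// A port's policy is a stack of layers, evaluated in order; matched
/// actions are cached per flow so established flows skip rule matching.
struct Port {
    layers: Vec<Layer>,
}
```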

These ports have clients. The client is the entity that sits on the other side of the "link" attached to the port. In the case of the Oxide VPC the client is the guest VM: its virtual NIC has a virtual link which attaches to OPTE's virtual Port. This part of OPTE is also referred to as the "data plane" or the "engine".

We need some way to effect policy on these ports: the configuration of the layers, rules, and actions. This programming of policy is part of the "control plane". Once again, using Oxide VPC as an example, this comes in the form of Omicron. The Omicron system has a Sled Agent running on each host, and that agent makes calls to the OPTE control plane in order to program the individual ports in a manner consistent with the data stored in Nexus. We refer to Sled Agent as a consumer of OPTE.

The consumer needs some way to call into OPTE's control plane in order to program the ports: this is done via the API. However, the API also serves as a sort of "firewall" (in the automotive sense) between the consumer and the engine itself. It provides the means to set policy in terms of high-level semantics without concern for how the engine enforces said policy via its private, unstable interfaces. The consumer and the engine enter into a contract in the form of the API; as long as both sides honor that contract there should be no problems (outside of bugs, which this developer will surely introduce).
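As a sketch of what that contract looks like from the consumer's side, the request type below is illustrative only: the names and fields are assumptions, not the actual opte API, but they show the idea of programming a port in high-level terms while the engine's internals stay hidden behind the ioctl boundary.

```rust
/// Direction of traffic a rule applies to, as seen from the guest.
pub enum Direction {
    Inbound,
    Outbound,
}

/// A hypothetical high-level request a consumer (e.g. Sled Agent) hands
/// across the ioctl boundary. Nothing here exposes how the engine
/// stores or evaluates policy internally.
pub struct AddFirewallRuleRequest {
    pub port_name: String,
    pub direction: Direction,
    pub priority: u16,
    /// Human-meaningful filter, e.g. "tcp dst port 22 from 10.0.0.0/24".
    pub filter: String,
    pub allow: bool,
}

/// The engine replies in equally high-level terms.
pub enum AddFirewallRuleResponse {
    Ok,
    PortNotFound(String),
    InvalidFilter(String),
}
```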

In #106 and #107 I was overly focused on this separation between the consumer and the engine, so much so that it felt sensible to have a separate crate for the API (opte-api). While it's not the worst choice in the world, it's no longer a choice I like. The thing is, the separation really comes from the ioctl layer and from good module and type hygiene (to provide data encapsulation and proper API visibility). Using a dedicated crate makes this more visceral for sure, but it doesn't actually enforce anything. However, for Rust/Cargo reasons the userland side of the ioctl code does need its own crate. That code relies on libnet, which pulls in many deps and assumes a std environment, causing the xde build to fail even when the libnet dep is eliminated via features. It also turns out that keeping that code in a separate crate keeps things cleaner anyway (fewer deps and fewer feature flag games for opte-core to play).

So what am I thinking now?

First, there is some good work in #106. I'm going to use that as a base and start by putting the API back into opte-core. Along with that I want to make some organizational changes to help separate things. Finally, I want to rename opte-core to just opte.

The organizational changes are centered around the idea of exposing various parts of the code based on feature flags.
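A rough sketch of what that gating could look like is below; the feature and module names are placeholders, not the actual flags the crate will end up with.

```rust
// Hypothetical lib.rs layout for the renamed `opte` crate, exposing
// parts of the code via cargo features (names are placeholders).

// The API types are always available to any consumer.
pub mod api {
    /// Version number the consumer and engine agree on.
    pub const API_VERSION: u64 = 1;
}

// The engine itself (layers, rules, flow tables) only compiles in when
// the "engine" feature is enabled, e.g. for the xde kernel driver.
#[cfg(feature = "engine")]
pub mod engine {
    // Engine internals would live here.
}

// Userland-only conveniences (e.g. pretty-printing for opteadm) can be
// gated behind a "std"-style feature.
#[cfg(feature = "std")]
pub mod print {
    // Userland helpers would live here.
}
```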

This is just the start of the changes. Now it's time to bring up the real elephant in the room: the oxide-specific APIs vs. the generic API.

OPTE is a generic packet transformation engine. Yes, it has Oxide in the name. Yes, our first main use case is as a means to provide the Oxide VPC network to guests. However, none of that is inherent to the core design of OPTE, to its engine. The engine only cares about layers, rules, and actions (there are some other abstractions, but those are the main ones). You implement policy by creating specific values of these abstractions and combining them in various ways. For example, the Oxide VPC NAT creates rules which predicate on the destination IP and potentially rewrite the inner source and destination IP addresses. This is high-level policy enacted through specific construction of generic types. If you look at opte-core you'll see the oxide_net namespace; everything in it is specific to the Oxide VPC and should not live in that crate.
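For instance, a consumer-side construction of such a rule might look roughly like the following, reusing the hypothetical Predicate/Action/Rule types from the sketch earlier in this issue (again, not the real opte-core API):

```rust
use std::net::Ipv4Addr;

/// Build an outbound NAT rule for one guest: predicate on an inner
/// destination IP outside the VPC (represented here by a single
/// illustrative address) and rewrite the inner source IP to the guest's
/// external IP. Types come from the hypothetical sketch above.
fn outbound_nat_rule(external_dst: Ipv4Addr, external_ip: Ipv4Addr) -> Rule {
    Rule {
        // Predicate on the (inner) destination IP...
        predicates: vec![Predicate::InnerDstIp4(external_dst)],
        // ...and rewrite the inner source address when it matches.
        action: Action::RewriteInnerSrcIp4(external_ip),
    }
}
```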

This also leaks into opteadm, which has commands for both the VPC-specific abstractions (set-v2p) and the generic OPTE abstractions (list-layers). The problem I face is how to provide consumer-specific APIs and commands from a generic engine. For a while I thought maybe some sort of callback mechanism to register consumer-specific abstractions; but that gets complicated and feels very framework-ish.

There are various ways to think about this problem and how best to solve it. In fact I spent entirely too long turning over many different ideas in my head. The more I think about it, the more it feels like a problem to delay a while longer. However, a few ideas fell out of that process that seem worthwhile to do now.

I'm going to do this work in a few phases just so it's easy to trace the source code movement when looking at the commit history. The phase 1 work will start with #106 as its base.

rzezeski commented 2 years ago

Ping to self: I need to come back and write separate issues for phase 6 and phase 7.