Shortly after writing up #107 (based on experience in #106) I realized I didn't like the solution it presented (typical). I was conflating various concerns around generic OPTE vs. the specific implementation of VPC, user vs. kernel context, and cargo feature flags. After a bunch of thinking, this new issue supplants #107 with a new plan, built on some of the work already done in #106. But first, I think it helps to state some high-level aspects of OPTE and the Oxide VPC in order to better understand the environment in which I'm trying to make good decisions.
First and foremost, OPTE is a packet transformation engine. You hook it up to some entity that speaks TCP/IP, set the engine's policy via the API, and let the traffic flow. While this engine (currently in opte-core) is built with the illumos kernel in mind, it is largely agnostic to both the illumos kernel and to running in kernel context in general. That is, it would probably take little work to have OPTE run in userland as well. In fact, its unit/integration tests are run via cargo test, which isn't far removed from a persistent userland process. That said, OPTE must be able to run in kernel context, as that is its primary use case; therefore it must have no_std support. This constraint limits the crates OPTE may use: not only must it build no_std, it must also run in kernel and interrupt contexts, which come with different considerations than userland (see #105).
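As a concrete illustration of that dual-target arrangement, here is a minimal sketch of the common cfg_attr pattern for a crate that builds no_std for the kernel while still compiling (and testing) as ordinary userland code. This is a sketch of the general technique with a placeholder type, not the actual opte-core source, and it assumes a Cargo std feature like the one discussed later in this issue.

```rust
// A minimal sketch (not the real opte-core lib.rs): build no_std unless the
// crate's `std` feature is enabled or we're running `cargo test`.
#![cfg_attr(not(any(feature = "std", test)), no_std)]

// Heap-allocated types come from `alloc`, which works in both worlds.
extern crate alloc;
use alloc::vec::Vec;

/// Placeholder type showing that the core logic depends only on core/alloc.
pub struct PacketBuf {
    bytes: Vec<u8>,
}

impl PacketBuf {
    pub fn new(bytes: Vec<u8>) -> Self {
        Self { bytes }
    }

    pub fn len(&self) -> usize {
        self.bytes.len()
    }
}
```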
OPTE is made up of ports, which sit on a virtual switch (this is not currently true, but it's the direction we are heading). Each port's policy is configured in terms of a stack of layers, where each layer contains a set of inbound and outbound rules. Attached to those rules are actions, which act on a packet when it matches the rule. A rule matches only when its predicates are true. As flows are established these actions are cached, and packet processing becomes a matter of matching the flow to its cached action.
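To make the port/layer/rule/action vocabulary concrete, here is a rough sketch of that shape in Rust. Every type and field name here is hypothetical and chosen for illustration; the real opte-core definitions differ.

```rust
// Illustrative only: a port is a stack of layers, each layer holds inbound
// and outbound rules, a rule is predicates plus an action, and established
// flows map straight to a cached action. Names are hypothetical.
use std::collections::HashMap;

/// Something a rule can test against a packet.
enum Predicate {
    InnerDstIp4([u8; 4]),
    InnerDstPort(u16),
}

/// What to do once a rule's predicates all match.
#[derive(Clone)]
enum Action {
    Allow,
    Deny,
    RewriteDstIp4([u8; 4]),
}

struct Rule {
    predicates: Vec<Predicate>,
    action: Action,
}

struct Layer {
    name: &'static str,
    inbound: Vec<Rule>,
    outbound: Vec<Rule>,
}

/// Identifies an established flow (5-tuple).
#[derive(PartialEq, Eq, Hash)]
struct FlowId {
    src_ip: [u8; 4],
    dst_ip: [u8; 4],
    src_port: u16,
    dst_port: u16,
    proto: u8,
}

/// Once a flow is established, later packets skip rule evaluation and go
/// straight from the flow cache to the action.
struct Port {
    layers: Vec<Layer>,
    flow_cache: HashMap<FlowId, Action>,
}
```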
These ports have clients. The client is the entity that sits on the other side of the "link" attached to the port. In the case of the Oxide VPC the client is the guest VM: its virtual NIC has a virtual link which attaches to OPTE's virtual Port. This part of OPTE is also referred to as the "data plane" or the "engine".
We need some way to set policy on these ports: the configuration of the layers, rules, and actions. This programming of policy is part of the "control plane". Once again, using Oxide VPC as an example, this comes in the form of Omicron. The Omicron system has a Sled Agent running on each host, and that agent makes calls to the OPTE control plane in order to program the individual ports in a manner consistent with the data stored in Nexus. We refer to Sled Agent as a consumer of OPTE.
The consumer needs some way to call into OPTE's control plane in order to program the ports: this is done via the API. However, the API also serves as a sort of "firewall" (in the automotive sense) between the consumer and the engine itself. It provides the means to set policy based on high-level semantics without concern for how the engine enforces said policy via its private, unstable interfaces. The consumer and the engine enter a contract in the form of the API; as long as both sides honor that contract there should be no problems (outside of bugs which this developer will surely introduce).
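One way to picture that contract: the consumer only ever constructs a small set of request values and hands them across the ioctl boundary, and everything behind them stays private to the engine. The sketch below is hypothetical, with made-up names, and elides serialization, ioctl plumbing, and error detail.

```rust
// Hypothetical shape of the API contract; not the actual opte API.
pub enum Direction {
    In,
    Out,
}

/// Requests a consumer (e.g. Sled Agent) might make of the control plane.
/// In practice these would be serialized and passed via ioctl.
pub enum ApiRequest {
    AddFirewallRule {
        port_name: String,
        direction: Direction,
        /// High-level rule description; nothing about engine internals.
        spec: String,
    },
    ListLayers {
        port_name: String,
    },
}

/// Responses the engine commits to returning.
pub enum ApiResponse {
    Ok,
    Layers(Vec<String>),
    Err(String),
}
```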
In #106 and #107 I was overly focused on this separation between the consumer and the engine, so much so that it felt sensible to have a separate crate for the API (opte-api). While it's not the worst choice in the world, it's no longer a choice I like. The thing is, the separation really comes from the ioctl layer and from good module and type hygiene (to provide data encapsulation and proper API visibility). Using a dedicated crate makes this more visceral, for sure, but it doesn't actually enforce anything. However, for Rust/Cargo reasons the userland side of the ioctl code does need its own crate. That code relies on libnet, which pulls in many deps and assumes a std environment, causing the xde build to fail even when the libnet dep is eliminated using features. It also turns out that keeping that code in a separate crate keeps things cleaner anyway (fewer deps and feature-flag games for opte-core to play).
So what am I thinking now?
First, there is some good work in #106. I'm going to use that as a base and start by putting the API back into opte-core. Along with that I want to make some organizational changes to help separate things. Finally, I want to rename opte-core to just opte.
The organizational changes are centered around the idea of exposing various parts of the code based on feature flags.
By default opte would expose only the types and APIs needed by the control plane (consumer), and this feature would be called api.
The rest of opte, the engine bits, would sit behind an engine feature.
There would still exist a std feature which adds methods to the API types that are useful in std environments.
A new OpteHdl interface will be created and live in a new opte-ioctl crate. It will begin life with just the APIs needed by Sled Agent. The sled-agent crate will depend on opte (currently opte-core) and opte-ioctl.
The types and methods used for packet processing will sit behind the engine feature of opte.
The opteadm crate will also make use of opte-ioctl, but just for run_cmd_ioctl() for the moment. It still needs its own OpteAdm in order to support all commands. This also means opteadm needs to set the engine feature of opte.
In order to keep the code cleaner these flags would apply to modules. The current flat structure of opte-core would be replaced with top-level modules like api and engine.
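A minimal sketch of what that module/feature gating could look like, assuming the api and engine feature names proposed above; the module bodies are placeholders, not the actual opte code.

```rust
// Hypothetical top-level layout for the opte crate (placeholders only).

// Control-plane types and APIs: on by default, this is all a consumer
// like Sled Agent should need.
#[cfg(feature = "api")]
pub mod api {
    pub struct PortInfo {
        pub name: String,
    }
}

// Packet-processing internals: only data-plane users (xde, opteadm) turn
// this on.
#[cfg(feature = "engine")]
pub mod engine {
    pub struct LayerStats {
        pub rules_evaluated: u64,
    }
}
```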
This is just the start of the changes. Now it's time to bring up the real elephant in the room: the oxide-specific APIs vs. the generic API.
OPTE is a generic packet transformation engine. Yes, it has Oxide in the name. Yes, our first main use case is as a means to provide the Oxide VPC network to guests. However, none of that is inherent to the core design of OPTE, to its engine. The engine only cares about layers, rules, and actions (there are some other abstractions, but those are the main ones). You implement policy by creating specific values of these abstractions and combining them in various ways. For example, the Oxide VPC NAT creates rules which predicate on destination IP and potentially rewrite the inner source and destination IP addresses. This is high-level policy enacted by specific construction of generic types. If you look at opte-core you'll see the oxide_net namespace; everything in it is specific to the Oxide VPC and should not live in that crate.
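To spell out "specific construction of generic types", here is a hedged sketch of what a VPC-flavored NAT rule could look like when built purely from generic building blocks; the names and signatures are invented for illustration and are not the opte-core API.

```rust
// Illustrative only: the "VPC NAT" is just a particular value of a generic
// rule type. The engine never needs to know the word "VPC".
#[derive(Clone, Copy)]
struct Ipv4(pub [u8; 4]);

enum Predicate {
    InnerDstIp4(Ipv4),
}

enum Action {
    RewriteInnerIp4 { new_src: Ipv4, new_dst: Ipv4 },
}

struct Rule {
    predicates: Vec<Predicate>,
    action: Action,
}

/// Something a VPC-specific module could provide: VPC knowledge goes in,
/// a perfectly generic Rule comes out.
fn vpc_outbound_nat(external_src: Ipv4, dst: Ipv4) -> Rule {
    Rule {
        // Match traffic headed to this destination...
        predicates: vec![Predicate::InnerDstIp4(dst)],
        // ...and rewrite the inner source to the externally visible address.
        action: Action::RewriteInnerIp4 { new_src: external_src, new_dst: dst },
    }
}
```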
The oxide_net problem also leaks into opteadm, which has commands for both the specific VPC abstractions (set-v2p) and the generic OPTE abstractions (list-layers). The problem I face is how to provide consumer-specific APIs and commands from a generic engine. For a while I thought maybe some sort of callback mechanism to register consumer-specific abstractions, but that gets complicated and feels very framework-ish.
There are various ways to think about this problem and how best to solve it. In fact I spent entirely too long writing up many different ideas in my head. The more I think about it the more it feels like a problem to delay a while longer. However, there are some ideas that fell out that seem worthwhile to do now.
Replace VPC-specific Actions with generic ones where possible. For example, the DHCP and ICMP actions can be generic, because there could be other consumers besides the VPC that want them (see the sketch after this list).
Add Action, Rule, and Layer creation to the API.
Rename oxide_net to oxide_vpc. Put it behind a vpc feature flag.
Remove the VPC commands from opteadm, move them to a new vpcadm crate.
Add VPC-centric commands to vpcadm. For example, add a show-firewall command that displays the firewall in a VPC-centric manner (as opposed to the OPTE-centric manner it does now).
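As promised above, here is a hedged sketch of the "generic action, consumer-specific configuration" idea behind the first item in that list: the engine ships a DHCP-style action that knows nothing about the Oxide VPC, and the VPC layer merely supplies values. Every name here is invented for illustration.

```rust
// Illustrative only: a generic DHCP reply action configured by whoever
// consumes the engine; the VPC module just picks VPC-appropriate values.

/// Configuration any consumer can fill in.
pub struct DhcpReplyCfg {
    pub server_ip: [u8; 4],
    pub client_ip: [u8; 4],
    pub gateway_ip: [u8; 4],
    pub lease_secs: u32,
}

/// The generic action; its implementation would live behind the engine
/// feature and never mention the VPC.
pub struct DhcpReplyAction {
    pub cfg: DhcpReplyCfg,
}

/// What a VPC-specific module would contribute: nothing more than choosing
/// values for the generic action.
pub fn vpc_dhcp_action(guest_ip: [u8; 4], gateway_ip: [u8; 4]) -> DhcpReplyAction {
    DhcpReplyAction {
        cfg: DhcpReplyCfg {
            server_ip: gateway_ip,
            client_ip: guest_ip,
            gateway_ip,
            lease_secs: 86_400,
        },
    }
}
```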
I'm going to do this work in a few phases just so it's easy to trace the source code movement when looking at the commit history. The phase 1 work will start with #106 as its base.
[x] Phase 1: Move opte-api back into opte-core.
[x] Add an api feature to opte-core. Restructure opte code so that api code is in its own top-level namespace.
[x] Add engine feature to opte-core. The engine code will remain in its current flat namespace to start so that it can be moved in its own commit.
[x] Move OpteHdl to a new opte-ioctl crate. Delete the opte-api crate.
[x] Change xde to depend on opte-core with engine feature.
[x] Verify that using opte-core (api + std) and opte-ioctl works with #88 test, as that's the canary for working with sled-agent.
[x] Since opteadm requires all commands (some of which will still be in engine), it will import opte-core with the engine feature set until the API is fully moved into opte-ioctl.
[x] Phase 2: Rename opte-core to opte.
[x] Phase 3: Move engine code into engine namespace.
[x] Phase 4: Rename oxide_net to oxide_vpc. Put it behind vpc feature.
[x] Phase 5: Replace some VPC-specific actions with generic ones.
[x] Replace VPC DHCP action with a generic DHCP action.
[x] Replace VPC Icmp4Reply action with generic action.
[ ] Add Rule creation to api.
[ ] Add Layer creation to api.
[ ] Add Port creation to api.
[ ] Add a vpcadm crate.
[ ] Remove the VPC commands from opteadm.
[ ] Add a show-firewall command.
[ ] Add a show-router command.
[ ] Rename dump-v2p to show-v2p.