oxidecomputer / opte

packets go in, packets go out, you can't explain that
Mozilla Public License 2.0

DHCP from illumos zone not working #69

Open rzezeski opened 2 years ago

rzezeski commented 2 years ago

EDIT: This is blocked on illumos#11990

I tested both DHCP and ICMP on a very hacked-up Omicron + OPTE environment. However, after the USDT work and some Rust toolchain updates I was no longer able to build that environment. Rather than spend an inordinate amount of time tracking that down, I went ahead and pushed anyway. I also made a few adjustments to the DHCP/ICMP rule predicates so that they more fully qualify the requests made by the guest, but since I could no longer stand up my Omicron + OPTE environment I couldn't test them.

Either those additional qualifications were too stringent and broke DHCP or the illumos netstack is doing something slightly different than Linux and I need to account for it. Unfortunately, I also couldn't dump the dhcp layer rules because of #68.

The excerpts below show the zone failing to get a DHCP reply because the xde0 device fails to match the packet against the dhcp4 layer. The packet therefore falls through to the router layer, which rejects it because there is no route to the virtual gateway.

This last point has me thinking: we should have a lowest-priority rule (i.e., the highest priority value) in the dhcp4 layer (and really dhcp4 + icmp + arp should probably all be merged into a gateway layer) that predicates only on the destination address of the virtual gateway and performs Drop. That would prevent traffic destined for the virtual gateway from leaking past those first layers. Otherwise, I imagine the router could end up sending it out the default route, aka the "Internet" Gateway. It also just makes sense to stop the traffic from proceeding any further than it should.
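To make the proposal concrete, here is a minimal sketch of a priority-ordered layer with a catch-all Drop for gateway-bound traffic. Everything in it (`Pkt`, `Rule`, `Layer`, `gateway_layer`, the two predicate functions) is hypothetical and invented for illustration; it is not OPTE's actual rule API, and it assumes that a lower priority value is evaluated first.

```rust
use std::net::Ipv4Addr;

/// Hypothetical, stripped-down packet view: only what this sketch needs.
#[derive(Debug, Clone, Copy)]
pub struct Pkt {
    pub dst_ip: Ipv4Addr,
    pub dst_port: u16,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Hairpin,
    Drop,
}

/// Hypothetical rule: a priority, a predicate, and an action.
pub struct Rule {
    /// Assumption: a lower value is evaluated first.
    pub priority: u16,
    pub matches: fn(&Pkt, Ipv4Addr) -> bool,
    pub action: Action,
}

pub struct Layer {
    gateway_ip: Ipv4Addr,
    rules: Vec<Rule>,
}

impl Layer {
    pub fn new(gateway_ip: Ipv4Addr, mut rules: Vec<Rule>) -> Self {
        // Keep rules ordered so the first match is the highest-priority one.
        rules.sort_by_key(|r| r.priority);
        Layer { gateway_ip, rules }
    }

    /// Returns the action of the first matching rule, or `None` if the
    /// packet should continue on to the next layer.
    pub fn process(&self, pkt: &Pkt) -> Option<Action> {
        self.rules
            .iter()
            .find(|r| (r.matches)(pkt, self.gateway_ip))
            .map(|r| r.action)
    }
}

/// A specific service the gateway answers (simplified to "port 67").
fn is_gateway_dhcp(pkt: &Pkt, gw: Ipv4Addr) -> bool {
    pkt.dst_ip == gw && pkt.dst_port == 67
}

/// Catch-all: anything else addressed to the virtual gateway.
fn is_for_gateway(pkt: &Pkt, gw: Ipv4Addr) -> bool {
    pkt.dst_ip == gw
}

pub fn gateway_layer(gateway_ip: Ipv4Addr) -> Layer {
    Layer::new(
        gateway_ip,
        vec![
            // Lowest priority (highest value): stop gateway-bound traffic here.
            Rule { priority: u16::MAX, matches: is_for_gateway, action: Action::Drop },
            Rule { priority: 1, matches: is_gateway_dhcp, action: Action::Hairpin },
        ],
    )
}
```

With this shape, a packet to the gateway that no specific rule claims is dropped inside the layer, while traffic for any other destination returns `None` and proceeds to the router as before.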

root@iz1:~# ipadm create-addr -t -T dhcp vnic0/v4
ipadm: warning: Communication with dhcpagent timed out

root@sled1:~/dtrace# ./opte-trace opte-port-process.d 
DIR NAME         FLOW                                        MBLK               RESULT
Out xde0         UDP,0.0.0.0:68,255.255.255.255:67           0xfffffe03964b2540 Drop { reason: Layer { name: "router" } }
Out xde0         UDP,0.0.0.0:68,255.255.255.255:67           0xfffffe03964b2540 Drop { reason: Layer { name: "router" } }

root@sled1:~/dtrace# ./opte-trace opte-rule-match.d 
MATCH  DIR LAYER        FLOW                                        ACTION
NO     out dhcp4        UDP,0.0.0.0:68,255.255.255.255:67           --
NO     out icmp         UDP,0.0.0.0:68,255.255.255.255:67           --
NO     out arp          UDP,0.0.0.0:68,255.255.255.255:67           --
YES    out router       UDP,0.0.0.0:68,255.255.255.255:67           DENY

rzezeski commented 2 years ago

As part of this issue write a DHCPv4 test and get rid of the obsolete dhcp_req test.

rzezeski commented 2 years ago

Now that I have #68 figured out I can dump the dhcp4 layer and see the problem clear as day.

root@sled1:/opt/cargo-bay# truss -x ioctl -t ioctl ~/opteadm dump-layer -p xde0 dhcp4
ioctl(3, 0xDE00001F, 0xFFFFFC7FFFDF7860)    = 0
Layer dhcp4
======================================================================
Inbound Flows
----------------------------------------------------------------------
PROTO  SRC IP           SPORT  DST IP           DPORT  HITS     ACTION                

Outbound Flows
----------------------------------------------------------------------
PROTO  SRC IP           SPORT  DST IP           DPORT  HITS     ACTION                

Inbound Rules
----------------------------------------------------------------------
ID       PRI    PREDICATES                                       ACTION            

Outbound Rules
----------------------------------------------------------------------
ID       PRI    PREDICATES                                       ACTION            
1        1      inner.ether.dst=FF:FF:FF:FF:FF:FF inner.ether.src=A8:40:25:FF:00:01 inner.ip.src=0.0.0.0 inner.ip.dst=255.255.255.255 inner.ip.proto=UDP inner.ulp.dst=67 inner.ulp.src=68 dhcp4.msg_type=Request "HAIRPIN: DHCPv4 ACK: 10.0.0.1"
0        1      inner.ether.dst=FF:FF:FF:FF:FF:FF inner.ether.src=A8:40:25:FF:00:01 inner.ip.src=0.0.0.0 inner.ip.dst=255.255.255.255 inner.ip.proto=UDP inner.ulp.dst=67 inner.ulp.src=68 dhcp4.msg_type=Discover "HAIRPIN: DHCPv4 OFFER: 10.0.0.1"

root@iz1:~# dladm show-vnic
LINK         OVER         SPEED  MACADDRESS        MACADDRTYPE         VID
vnic0        ?            0      2:8:20:d7:e9:1c   random              0

root@sled1:/opt/cargo-bay# cat sled1/06-create-xde0-sled1.sh 
#!/bin/bash

name=xde0

instance_mac=A8:40:25:ff:00:01 
instance_ip=10.0.0.1

gateway_mac=A8:40:25:00:00:01
gateway_ip=10.0.0.254

boundary_services_addr=fd00:99::1
boundary_services_vni=99

vpc_vni=10
source_underlay_addr=fd00:1::1

./opteadm xde-create \
    $name \
    $instance_mac $instance_ip \
    $gateway_mac $gateway_ip \
    $boundary_services_addr $boundary_services_vni \
    $vpc_vni \
    $source_underlay_addr

The inner.ether.src=A8:40:25:FF:00:01 predicate does not match the MAC address of the VNIC (2:8:20:d7:e9:1c). The topo scripts need to make sure these MAC addresses agree with each other.
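A check like this is easy to get wrong by eye, because dladm prints MAC addresses in lowercase and without leading zeros while the rule dump uses uppercase, zero-padded octets. The following is a small sketch of a normalizing comparison that a topo script or test could use; `parse_mac` and `same_mac` are hypothetical helper names, not part of opteadm or OPTE.

```rust
/// Parse a colon-separated MAC address. Accepts either case and octets
/// written without a leading zero (e.g. "2:8:20:d7:e9:1c").
pub fn parse_mac(s: &str) -> Option<[u8; 6]> {
    let mut out = [0u8; 6];
    let mut parts = s.trim().split(':');
    for byte in out.iter_mut() {
        let part = parts.next()?;
        // One or two hex digits per octet, nothing else.
        if part.is_empty()
            || part.len() > 2
            || !part.chars().all(|c| c.is_ascii_hexdigit())
        {
            return None;
        }
        *byte = u8::from_str_radix(part, 16).ok()?;
    }
    // Reject trailing extra octets.
    if parts.next().is_some() {
        return None;
    }
    Some(out)
}

/// Compare two textual MAC addresses after normalization.
/// Returns `None` if either string fails to parse.
pub fn same_mac(a: &str, b: &str) -> Option<bool> {
    Some(parse_mac(a)? == parse_mac(b)?)
}
```

Comparing the rule's source MAC against the VNIC's random MAC from the output above yields `Some(false)`, which is exactly the mismatch that keeps the dhcp4 rules from matching.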

rzezeski commented 2 years ago

Since the xde device itself claims that MAC address, we can't simply assign it to the VNIC.

root@sled1:/opt/cargo-bay# dladm create-vnic -t -l xde0 -m A8:40:25:ff:00:01 vnic0
dladm: vnic creation over xde0 failed: MAC address reserved for use by underlying data-link

This is one reason why we need to get the VNIC out of the equation: xde is the virtual NIC.

rzezeski commented 2 years ago

Okay so I hacked my way to success by giving the xde device a bogus MAC address just to get it out of the way.

@@ -325,8 +348,17 @@ unsafe extern "C" fn xde_ioc_create(req: &CreateXdeReq) -> c_int {

     mreg.m_callbacks = &mut xde_mac_callbacks;

-    let mut src = req.private_mac.to_bytes();
-    mreg.m_src_addr = src.as_mut_ptr();
+    // let mut src = req.private_mac.to_bytes();
+    //
+    // TODO Total hack to allow the VNIC to have the guest's MAC
+    // address. The VNIC **NEEDS** to have the guest's MAC address or
+    // else none of the rules will match against the source MAC address.
+    //
+    // The real answer is to stop putting VNICs atop xde. The xde
+    // device needs to sit in the place where a VNIC would usually go.
+    mreg.m_src_addr = EtherAddr::from(
+        [0xA8, 0x40, 0x25, 0x77, 0x77, 0x77]
+    ).to_bytes().as_mut_ptr();

     match mac::mac_register(mreg as *mut mac::mac_register_t, &mut xde.mh) {
         0 => {}
@@ -1404,7 +1436,7 @@ fn new_port(
     xde_dev_name: String,
     mh: *mut mac::mac_handle,
     private_ip: Ipv4Addr,
-    _private_mac: EtherAddr,
+    private_mac: EtherAddr,
     gateway_mac: EtherAddr,
     gateway_ip: Ipv4Addr,
     boundary_services_addr: Ipv6Addr,
@@ -1414,9 +1446,9 @@ fn new_port(
     ectx: Arc<ExecCtx>,
     snat: Option<SnatCfg>,
 ) -> Result<Box<Port<opte_core::port::Active>>, ()> {
-    let mut private_mac = [0u8; 6];
-    unsafe { mac::mac_unicast_primary_get(mh, &mut private_mac) };
-    let private_mac = EtherAddr::from(private_mac);
+    // let mut private_mac = [0u8; 6];
+    // unsafe { mac::mac_unicast_primary_get(mh, &mut private_mac) };
+    // let private_mac = EtherAddr::from(private_mac);

With that in place, instead of running ./09-create-vnic.sh I manually created the VNIC over xde0.

root@sled1:/opt/cargo-bay# dladm create-vnic -t -l xde0 -m A8:40:25:ff:00:01 vnic0

root@sled1:~# zlogin iz1
[Connected to zone 'iz1' pts/3]
Last login: Wed Mar 16 11:19:18 on pts/3
The illumos Project     helios-1.0.21050        February 2022
root@iz1:~# dladm show-vnic
LINK         OVER         SPEED  MACADDRESS        MACADDRTYPE         VID
vnic0        ?            0      a8:40:25:ff:0:1   fixed               0

And now DHCP works.

root@iz1:~# ipadm create-addr -t -T dhcp vnic0/v4
root@iz1:~# ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
vnic0/v4          dhcp     ok           10.0.0.1/32
lo0/v6            static   ok           ::1/128
root@sled1:/opt/cargo-bay# ~/opteadm dump-layer -p xde0 dhcp4
Layer dhcp4
======================================================================
Inbound Flows
----------------------------------------------------------------------
PROTO  SRC IP           SPORT  DST IP           DPORT  HITS     ACTION                

Outbound Flows
----------------------------------------------------------------------
PROTO  SRC IP           SPORT  DST IP           DPORT  HITS     ACTION                

Inbound Rules
----------------------------------------------------------------------
ID       PRI    PREDICATES                                       ACTION            

Outbound Rules
----------------------------------------------------------------------
ID       PRI    PREDICATES                                       ACTION            
1        1      inner.ether.dst=FF:FF:FF:FF:FF:FF inner.ether.src=A8:40:25:FF:00:01 inner.ip.src=0.0.0.0 inner.ip.dst=255.255.255.255 inner.ip.proto=UDP inner.ulp.dst=67 inner.ulp.src=68 dhcp4.msg_type=Request "HAIRPIN: DHCPv4 ACK: 10.0.0.1"
0        1      inner.ether.dst=FF:FF:FF:FF:FF:FF inner.ether.src=A8:40:25:FF:00:01 inner.ip.src=0.0.0.0 inner.ip.dst=255.255.255.255 inner.ip.proto=UDP inner.ulp.dst=67 inner.ulp.src=68 dhcp4.msg_type=Discover "HAIRPIN: DHCPv4 OFFER: 10.0.0.1"

root@sled1:~/dtrace# ./opte-trace opte-rule-match.d 
MATCH  DIR LAYER        FLOW                                        ACTION
YES    out dhcp4        UDP,0.0.0.0:68,255.255.255.255:67           HAIRPIN: DHCPv4 OFFER: 10.0.0.1
YES    out dhcp4        UDP,0.0.0.0:68,255.255.255.255:67           HAIRPIN: DHCPv4 ACK: 10.0.0.1

rzezeski commented 2 years ago

Actually, one remaining issue is that the Classless Static Route option didn't seem to take: I see nothing in the routing table about the gateway. (Note that in this case the Falcon topo is using 10.0.0.254 as the gateway, not 10.0.0.1, which is the IP of the zone. This keeps breaking my brain, as my home network is 10.0.0.0/24 with a .1 gateway.)

root@iz1:~# netstat -rn

Routing Table: IPv4
  Destination            Gateway          Flags  Ref     Use     Interface 
-------------------- -------------------- ----- ----- ---------- --------- 
10.0.0.1             10.0.0.1             UH        2          0 vnic0     
127.0.0.1            127.0.0.1            UH        2         36 lo0  

That said I need to start popping my yak stack a bit here so this can probably become a new issue.
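For reference when that new issue gets picked up, here is a sketch of how the Classless Static Route option (DHCP option 121, RFC 3442) is laid out on the wire: each route is a prefix length, then only the significant octets of the destination, then the four-octet router address. The `Route` type and `encode_classless_routes` function are hypothetical names for illustration, and the routes used in the example are assumptions, not necessarily what OPTE sends.

```rust
use std::net::Ipv4Addr;

/// DHCP option code for Classless Static Routes (RFC 3442).
pub const OPT_CLASSLESS_STATIC_ROUTE: u8 = 121;

/// Hypothetical route description for this sketch.
pub struct Route {
    pub dest: Ipv4Addr,
    pub prefix_len: u8,
    pub router: Ipv4Addr,
}

/// Encode routes as a complete option: code, length, then the route list.
/// Returns `None` for an invalid prefix length or an option body that
/// does not fit in the one-octet length field.
pub fn encode_classless_routes(routes: &[Route]) -> Option<Vec<u8>> {
    let mut body = Vec::new();
    for r in routes {
        if r.prefix_len > 32 {
            return None;
        }
        // Only ceil(prefix_len / 8) destination octets are transmitted.
        let significant = usize::from((r.prefix_len + 7) / 8);
        body.push(r.prefix_len);
        body.extend_from_slice(&r.dest.octets()[..significant]);
        body.extend_from_slice(&r.router.octets());
    }
    if body.len() > 255 {
        return None;
    }
    let mut opt = vec![OPT_CLASSLESS_STATIC_ROUTE, body.len() as u8];
    opt.extend(body);
    Some(opt)
}
```

As an illustration, an on-link host route to 10.0.0.254/32 (router 0.0.0.0) followed by a default route via 10.0.0.254 encodes to `121, 14, 32, 10, 0, 0, 254, 0, 0, 0, 0, 0, 10, 0, 0, 254`. A client that honors the option would be expected to show corresponding entries in `netstat -rn`, which is what is missing above.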

rmustacc commented 2 years ago

There's an open illumos issue on that. @jclulow had been looking at that in the past.

rzezeski commented 2 years ago

Yeah, so it appears there is illumos#11990, which has a link to an open code review. This is not pressing just yet as the TGs currently use static assignment, but at some point we'll want to get eyes on that and get it landed.