Open rzezeski opened 2 years ago
As part of this issue write a DHCPv4 test and get rid of the obsolete dhcp_req
test.
Now that I have #68 figured out I can dump the dhcp4 layer and see the problem clear as day.
root@sled1:/opt/cargo-bay# truss -x ioctl -t ioctl ~/opteadm dump-layer -p xde0 dhcp4
ioctl(3, 0xDE00001F, 0xFFFFFC7FFFDF7860) = 0
Layer dhcp4
======================================================================
Inbound Flows
----------------------------------------------------------------------
PROTO SRC IP SPORT DST IP DPORT HITS ACTION
Outbound Flows
----------------------------------------------------------------------
PROTO SRC IP SPORT DST IP DPORT HITS ACTION
Inbound Rules
----------------------------------------------------------------------
ID PRI PREDICATES ACTION
Outbound Rules
----------------------------------------------------------------------
ID PRI PREDICATES ACTION
1 1 inner.ether.dst=FF:FF:FF:FF:FF:FF inner.ether.src=A8:40:25:FF:00:01 inner.ip.src=0.0.0.0 inner.ip.dst=255.255.255.255 inner.ip.proto=UDP inner.ulp.dst=67 inner.ulp.src=68 dhcp4.msg_type=Request "HAIRPIN: DHCPv4 ACK: 10.0.0.1"
0 1 inner.ether.dst=FF:FF:FF:FF:FF:FF inner.ether.src=A8:40:25:FF:00:01 inner.ip.src=0.0.0.0 inner.ip.dst=255.255.255.255 inner.ip.proto=UDP inner.ulp.dst=67 inner.ulp.src=68 dhcp4.msg_type=Discover "HAIRPIN: DHCPv4 OFFER: 10.0.0.1"
root@iz1:~# dladm show-vnic
LINK OVER SPEED MACADDRESS MACADDRTYPE VID
vnic0 ? 0 2:8:20:d7:e9:1c random 0
root@sled1:/opt/cargo-bay# cat sled1/06-create-xde0-sled1.sh
#!/bin/bash
name=xde0
instance_mac=A8:40:25:ff:00:01
instance_ip=10.0.0.1
gateway_mac=A8:40:25:00:00:01
gateway_ip=10.0.0.254
boundary_services_addr=fd00:99::1
boundary_services_vni=99
vpc_vni=10
source_underlay_addr=fd00:1::1
./opteadm xde-create \
$name \
$instance_mac $instance_ip \
$gateway_mac $gateway_ip \
$boundary_services_addr $boundary_services_vni \
$vpc_vni \
$source_underlay_addr
The inner.ether.src=A8:40:25:FF:00:01 predicate does not match the MAC address of the VNIC (2:8:20:d7:e9:1c). The topo scripts need to make sure these MAC addresses agree with each other.
Since the xde device is claiming the MAC address we can't fix the VNIC.
root@sled1:/opt/cargo-bay# dladm create-vnic -t -l xde0 -m A8:40:25:ff:00:01 vnic0
dladm: vnic creation over xde0 failed: MAC address reserved for use by underlying data-link
This is one reason why we need to get the VNIC out of the equation: xde is the virtual NIC.
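The mismatch above can be sketched as a simple comparison. This is a minimal illustration only, not OPTE's predicate code: `parse_mac` and `ether_src_matches` are hypothetical helpers, and the parser merely accounts for dladm printing octets without leading zeros.

```rust
/// Parse a MAC address as printed by dladm, which may omit leading zeros
/// in each octet (for example `a8:40:25:ff:0:1`).
fn parse_mac(s: &str) -> Option<[u8; 6]> {
    let mut out = [0u8; 6];
    let mut parts = s.split(':');
    for byte in out.iter_mut() {
        *byte = u8::from_str_radix(parts.next()?, 16).ok()?;
    }
    // Reject input with more than six octets.
    if parts.next().is_some() {
        return None;
    }
    Some(out)
}

/// Hypothetical stand-in for the `inner.ether.src=...` predicate: the rule
/// only matches when the frame's source MAC equals the MAC in the rule.
fn ether_src_matches(rule_mac: &str, frame_src: &str) -> bool {
    match (parse_mac(rule_mac), parse_mac(frame_src)) {
        (Some(rule), Some(src)) => rule == src,
        _ => false,
    }
}

fn main() {
    let rule = "A8:40:25:FF:00:01";
    // Frames sent from the randomly addressed VNIC never match the rule.
    println!("random VNIC MAC matches: {}", ether_src_matches(rule, "2:8:20:d7:e9:1c"));
    // A VNIC created with the guest's MAC would match.
    println!("guest MAC matches: {}", ether_src_matches(rule, "a8:40:25:ff:0:1"));
}
```

Because every DHCP rule in the dump carries the same source MAC predicate, a single mismatched address is enough to make the whole dhcp4 layer miss.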
Okay so I hacked my way to success by giving the xde device a bogus MAC address just to get it out of the way.
@@ -325,8 +348,17 @@ unsafe extern "C" fn xde_ioc_create(req: &CreateXdeReq) -> c_int {
mreg.m_callbacks = &mut xde_mac_callbacks;
- let mut src = req.private_mac.to_bytes();
- mreg.m_src_addr = src.as_mut_ptr();
+ // let mut src = req.private_mac.to_bytes();
+ //
+ // TODO Total hack to allow the VNIC to have the guest's MAC
+ // address. The VNIC **NEEDS** to have the guest's MAC address or
+ // else none of the rules will match against the source MAC address.
+ //
+ // The real answer is to stop putting VNICs atop xde. The xde
+ // device needs to sit in the place where a VNIC would usually go.
+ mreg.m_src_addr = EtherAddr::from(
+ [0xA8, 0x40, 0x25, 0x77, 0x77, 0x77]
+ ).to_bytes().as_mut_ptr();
match mac::mac_register(mreg as *mut mac::mac_register_t, &mut xde.mh) {
0 => {}
@@ -1404,7 +1436,7 @@ fn new_port(
xde_dev_name: String,
mh: *mut mac::mac_handle,
private_ip: Ipv4Addr,
- _private_mac: EtherAddr,
+ private_mac: EtherAddr,
gateway_mac: EtherAddr,
gateway_ip: Ipv4Addr,
boundary_services_addr: Ipv6Addr,
@@ -1414,9 +1446,9 @@ fn new_port(
ectx: Arc<ExecCtx>,
snat: Option<SnatCfg>,
) -> Result<Box<Port<opte_core::port::Active>>, ()> {
- let mut private_mac = [0u8; 6];
- unsafe { mac::mac_unicast_primary_get(mh, &mut private_mac) };
- let private_mac = EtherAddr::from(private_mac);
+ // let mut private_mac = [0u8; 6];
+ // unsafe { mac::mac_unicast_primary_get(mh, &mut private_mac) };
+ // let private_mac = EtherAddr::from(private_mac);
With that in place, instead of running ./09-create-vnic.sh I manually created the VNIC over xde0.
root@sled1:/opt/cargo-bay# dladm create-vnic -t -l xde0 -m A8:40:25:ff:00:01 vnic0
root@sled1:~# zlogin iz1
[Connected to zone 'iz1' pts/3]
Last login: Wed Mar 16 11:19:18 on pts/3
The illumos Project helios-1.0.21050 February 2022
root@iz1:~# dladm show-vnic
LINK OVER SPEED MACADDRESS MACADDRTYPE VID
vnic0 ? 0 a8:40:25:ff:0:1 fixed 0
And now DHCP works.
root@iz1:~# ipadm create-addr -t -T dhcp vnic0/v4
root@iz1:~# ipadm show-addr
ADDROBJ TYPE STATE ADDR
lo0/v4 static ok 127.0.0.1/8
vnic0/v4 dhcp ok 10.0.0.1/32
lo0/v6 static ok ::1/128
root@sled1:/opt/cargo-bay# ~/opteadm dump-layer -p xde0 dhcp4
Layer dhcp4
======================================================================
Inbound Flows
----------------------------------------------------------------------
PROTO SRC IP SPORT DST IP DPORT HITS ACTION
Outbound Flows
----------------------------------------------------------------------
PROTO SRC IP SPORT DST IP DPORT HITS ACTION
Inbound Rules
----------------------------------------------------------------------
ID PRI PREDICATES ACTION
Outbound Rules
----------------------------------------------------------------------
ID PRI PREDICATES ACTION
1 1 inner.ether.dst=FF:FF:FF:FF:FF:FF inner.ether.src=A8:40:25:FF:00:01 inner.ip.src=0.0.0.0 inner.ip.dst=255.255.255.255 inner.ip.proto=UDP inner.ulp.dst=67 inner.ulp.src=68 dhcp4.msg_type=Request "HAIRPIN: DHCPv4 ACK: 10.0.0.1"
0 1 inner.ether.dst=FF:FF:FF:FF:FF:FF inner.ether.src=A8:40:25:FF:00:01 inner.ip.src=0.0.0.0 inner.ip.dst=255.255.255.255 inner.ip.proto=UDP inner.ulp.dst=67 inner.ulp.src=68 dhcp4.msg_type=Discover "HAIRPIN: DHCPv4 OFFER: 10.0.0.1"
root@sled1:~/dtrace# ./opte-trace opte-rule-match.d
MATCH DIR LAYER FLOW ACTION
YES out dhcp4 UDP,0.0.0.0:68,255.255.255.255:67 HAIRPIN: DHCPv4 OFFER: 10.0.0.1
YES out dhcp4 UDP,0.0.0.0:68,255.255.255.255:67 HAIRPIN: DHCPv4 ACK: 10.0.0.1
Actually, one remaining issue is that the Classless Static Route option didn't seem to take, as I see nothing in the routing table about the gateway (note that in this case the Falcon topo is using 10.0.0.254 as the gateway, not 10.0.0.1, which is the IP of the zone; this keeps breaking my brain as my home network is 10.0.0.0/24 with a .1 gateway):
root@iz1:~# netstat -rn
Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
-------------------- -------------------- ----- ----- ---------- ---------
10.0.0.1 10.0.0.1 UH 2 0 vnic0
127.0.0.1 127.0.0.1 UH 2 36 lo0
That said I need to start popping my yak stack a bit here so this can probably become a new issue.
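For reference, here is a rough sketch of how a Classless Static Route option (DHCP option 121, RFC 3442) is laid out on the wire. The two routes used here (an on-link /32 for the gateway plus a default route through it) are an assumption made for illustration, not a dump of what OPTE actually sends, and `encode_route` and `encode_option` are hypothetical helper names.

```rust
use std::net::Ipv4Addr;

/// DHCP option code for Classless Static Route (RFC 3442).
const OPT_CLASSLESS_STATIC_ROUTE: u8 = 121;

/// Encode one route: the prefix length, only the significant octets of the
/// destination, then the 4-octet router address.
fn encode_route(dest: Ipv4Addr, prefix_len: u8, router: Ipv4Addr) -> Vec<u8> {
    assert!(prefix_len <= 32, "IPv4 prefix length must be at most 32");
    let significant = (prefix_len as usize + 7) / 8;
    let mut out = vec![prefix_len];
    out.extend_from_slice(&dest.octets()[..significant]);
    out.extend_from_slice(&router.octets());
    out
}

/// Wrap the encoded routes in the option header (code, then length).
fn encode_option(routes: &[(Ipv4Addr, u8, Ipv4Addr)]) -> Vec<u8> {
    let body: Vec<u8> = routes
        .iter()
        .flat_map(|&(dest, len, router)| encode_route(dest, len, router))
        .collect();
    assert!(body.len() <= 255, "option body must fit in a one-octet length");
    let mut out = vec![OPT_CLASSLESS_STATIC_ROUTE, body.len() as u8];
    out.extend(body);
    out
}

fn main() {
    let gateway = Ipv4Addr::new(10, 0, 0, 254);
    // Assumed routes: the gateway is on-link (router 0.0.0.0), and the
    // default route goes through the gateway.
    let routes = [
        (gateway, 32, Ipv4Addr::UNSPECIFIED),
        (Ipv4Addr::UNSPECIFIED, 0, gateway),
    ];
    println!("{:?}", encode_option(&routes));
}
```

If the client honored such an option, the routing table would be expected to show an entry for 10.0.0.254, which is what is missing from the netstat output above.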
There's an open illumos issue on that. @jclulow had been looking at that in the past.
Yea, so it appears there is illumos#11990 which has a link to an open code review. This is not pressing just yet as the TGs currently use static assignment, but at some point we'll want to get eyes on that and get it landed.
EDIT: This is blocked on illumos#11990
I tested both DHCP and ICMP on a very hacked up Omicron + OPTE environment. However, after the USDT work and some rust toolchain updates I was no longer able to build that environment. Rather than spend an inordinate amount of time tracking that down I went ahead and pushed anyways. However, I also made a few adjustments to the DHCP/ICMP rule predicates, to make sure they more fully qualify the requests being made by the guest. But since I could no longer stand up my Omicron+OPTE environment I couldn't test it.
Either those additional qualifications were too stringent and broke DHCP, or the illumos netstack is doing something slightly different than Linux and I need to account for it. Unfortunately, I also couldn't dump the dhcp layer rules because of #68.
The excerpts below show the zone failing to get a DHCP reply because the xde0 device is failing to match the packet against the dhcp4 layer. This causes it to make it to the router layer, which rejects it, as there is no route to the virtual gateway.
This last point has me thinking: we should have a lowest-priority rule (i.e., a high priority value) in the dhcp4 layer (and really dhcp4 + icmp + arp should probably all be merged into a gateway layer) that predicates on only the destination address of the virtual gateway and performs Drop. That would prevent traffic destined for the virtual gateway from leaking past those first layers. Otherwise, I imagine the router could end up sending it out the default route, aka the "Internet" Gateway. Also, it just kind of makes sense to constrain the traffic from proceeding any further than it should.
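The catch-all idea could look roughly like the following. This is a sketch under stated assumptions, not OPTE's rule API: `GatewayLayer`, `Rule`, `Packet`, `Action`, and `Verdict` are all hypothetical types, and the only behavior modeled is "lower priority value is evaluated first, first match wins".

```rust
use std::net::Ipv4Addr;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Action {
    Hairpin,
    Drop,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Handled(Action),
    NextLayer,
}

/// Hypothetical, heavily reduced view of a guest packet.
struct Packet {
    dst_ip: Ipv4Addr,
    is_dhcp_discover: bool,
}

struct Rule {
    priority: u16, // lower value is evaluated first
    pred: fn(&Packet, Ipv4Addr) -> bool,
    action: Action,
}

fn dhcp_discover(pkt: &Packet, _gateway: Ipv4Addr) -> bool {
    pkt.is_dhcp_discover
}

fn to_gateway(pkt: &Packet, gateway: Ipv4Addr) -> bool {
    pkt.dst_ip == gateway
}

struct GatewayLayer {
    gateway_ip: Ipv4Addr,
    rules: Vec<Rule>,
}

impl GatewayLayer {
    fn new(gateway_ip: Ipv4Addr) -> Self {
        let mut rules = vec![
            // Catch-all: anything addressed to the virtual gateway that no
            // earlier rule claimed is dropped instead of reaching the router.
            Rule { priority: u16::MAX, pred: to_gateway, action: Action::Drop },
            Rule { priority: 1, pred: dhcp_discover, action: Action::Hairpin },
        ];
        rules.sort_by_key(|r| r.priority);
        GatewayLayer { gateway_ip, rules }
    }

    fn process(&self, pkt: &Packet) -> Verdict {
        for rule in &self.rules {
            if (rule.pred)(pkt, self.gateway_ip) {
                return Verdict::Handled(rule.action);
            }
        }
        Verdict::NextLayer
    }
}

fn main() {
    let layer = GatewayLayer::new(Ipv4Addr::new(10, 0, 0, 254));
    let stray = Packet { dst_ip: Ipv4Addr::new(10, 0, 0, 254), is_dhcp_discover: false };
    println!("{:?}", layer.process(&stray));
}
```

Giving the drop rule the largest priority value means the specific DHCP, ICMP, and ARP rules still get first refusal, and only unclaimed gateway-bound traffic is stopped.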