oxidecomputer / omicron

Duplicate route after switch zone receives underlay address #2750

Open jmpesp opened 1 year ago

jmpesp commented 1 year ago

After https://github.com/oxidecomputer/omicron/pull/2713, the switch zone is assigned an underlay address:

root@oxz_switch:~# ipadm | grep omicron6
oxControlService0/omicron6 static ok    fd00:1122:3344:104::2/64

But the sled that the switch zone is located on advertises the sled prefix through ddmd:

james@frostypaws:~$ ipadm | grep sled6
underlay0/sled6   static   ok           fd00:1122:3344:104::1/64
james@frostypaws:~$ /opt/oxide/mg-ddm/ddmadm get-originated
Prefix
fd00:1122:3344:1::/64
fd00:1122:3344:104::/64    <----
fdb0:8061:5f11:ab31::/64

This causes a duplicate route to exist inside the switch zone for the sled's underlay address (in this scenario, ixgbe4 is connected to the frostypaws sled):

root@oxz_switch:~# netstat -rn -f dst:fd00:1122:3344:104::1

Routing Table: IPv6
  Destination/Mask            Gateway                   Flags Ref   Use    If   
--------------------------- --------------------------- ----- --- ------- ----- 
fd00:1122:3344:104::/64     fd00:1122:3344:104::2       U       6    1523 oxControlService0 
fd00:1122:3344:104::/64     fe80::8261:5fff:fe11:ab31   UG      1       0 ixgbe4 
default                     fd00:1122:3344:104::1       UG      2     138

This causes GENEVE packets destined for the sled to hop to oxControlService0 instead of ixgbe4. ixgbe4 is directly connected to the link used for OPTE, and this causes packets destined for an instance to not get processed by OPTE.
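To make the conflict concrete, here is a minimal Rust sketch of why a /64 address inside the switch zone ends up covering the same network that the sled advertises. It is only an illustration: `network_of` and `same_network` are hypothetical helper names, not anything in omicron or illumos, and the addresses are the ones from the output above.

```rust
use std::net::Ipv6Addr;

/// Mask an IPv6 address down to the network it belongs to.
/// Hypothetical helper for illustration only.
fn network_of(addr: Ipv6Addr, prefix_len: u8) -> Ipv6Addr {
    assert!(prefix_len <= 128, "IPv6 prefix length must be 0..=128");
    let mask: u128 = if prefix_len == 0 {
        0
    } else {
        // Keep the top `prefix_len` bits, clear the rest.
        u128::MAX << (128 - u32::from(prefix_len))
    };
    Ipv6Addr::from(u128::from(addr) & mask)
}

/// True when two (address, prefix length) pairs describe the same network,
/// i.e. the routes implied by them would have an identical destination.
fn same_network(a: (Ipv6Addr, u8), b: (Ipv6Addr, u8)) -> bool {
    a.1 == b.1 && network_of(a.0, a.1) == network_of(b.0, b.1)
}
```

With `fd00:1122:3344:104::2/64` in the switch zone and `fd00:1122:3344:104::1/64` on the sled, `same_network` returns true, which matches the two `fd00:1122:3344:104::/64` entries in the routing table. With a prefix length of 128 on the zone address, the two no longer coincide.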

jmpesp commented 1 year ago

This is possibly related to the route burglary Ry had seen?

jmpesp commented 1 year ago

Though possibly not:

root@oxz_switch:~# ndp -a | grep fe80::8261:5fff:fe11:ab31
ixgbe4 80:61:5f:11:ab:31  dynamic REACHABLE    fe80::8261:5fff:fe11:ab31

rcgoodfellow commented 1 year ago

Part of the problem here is that the transit router is running with the illumos kernel as the underlying routing/forwarding platform and not Dendrite. This means that instead of routes going onto the switch ASIC, which is decoupled from the OS routing tables, they are headed to illumos.

That being said, the routing setup here is a bit awkward in the sense that an underlay prefix is shared between the GZ and the switch zone, and the GZ and the switch zone effectively have a physical communication loop running between them. This results in a logically inconsistent set of routes between the two, even if the OS/Dendrite split makes that inconsistency OK under normal circumstances.

I think this can still be made to work by assigning a more specific underlay address and corresponding routes to the switch zone. Using the addresses above, we could instead have.

Since more specific routes take precedence, this has the effect of going through the zone etherstub specifically to talk to control plane infrastructure such as sled-agent listening in the GZ on fd00:1122:3344:104::1 and then using the advertised route from ddmd going out a switch port for everything else.

Strictly speaking, this does not need to be a /128; we could decide on some leading number of addresses that have this routing behavior.
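The "more specific routes take precedence" behavior is ordinary longest-prefix matching, which can be sketched in Rust as below. This is a simplified model, not how illumos or Dendrite implement route lookup. The `Route` struct, `select_route`, and the two-entry table used in the usage note are assumptions made for illustration: a /128 host route to the GZ address over the zone etherstub plus the ddmd-advertised /64 out a switch port.

```rust
use std::net::Ipv6Addr;

/// A simplified routing table entry (hypothetical, for illustration).
#[derive(Debug, Clone, PartialEq)]
struct Route {
    dest: Ipv6Addr,
    prefix_len: u8,
    interface: &'static str,
}

/// Does `addr` fall inside the destination prefix of `route`?
fn matches(route: &Route, addr: Ipv6Addr) -> bool {
    let len = u32::from(route.prefix_len.min(128));
    let mask: u128 = if len == 0 { 0 } else { u128::MAX << (128 - len) };
    (u128::from(addr) & mask) == (u128::from(route.dest) & mask)
}

/// Longest-prefix match: among all routes covering `addr`, pick the one
/// with the longest prefix. If two routes tie on prefix length (as the two
/// /64 routes in this issue do), the choice is not determined by prefix
/// length alone; this sketch simply returns the last one in table order.
fn select_route<'a>(table: &'a [Route], addr: Ipv6Addr) -> Option<&'a Route> {
    table
        .iter()
        .filter(|r| matches(r, addr))
        .max_by_key(|r| r.prefix_len)
}
```

Given a table holding `fd00:1122:3344:104::1/128` on `oxControlService0` and `fd00:1122:3344:104::/64` on `ixgbe4`, a lookup for `fd00:1122:3344:104::1` selects the /128 entry, while any other address in the /64 selects the `ixgbe4` entry.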

jmpesp commented 1 year ago

Using:

diff --git a/sled-agent/src/services.rs b/sled-agent/src/services.rs
index 94689564..f2cc13e5 100644
--- a/sled-agent/src/services.rs
+++ b/sled-agent/src/services.rs
@@ -1476,9 +1478,14 @@ impl ServiceManager {
                         "Ensuring address {} exists",
                         addr.to_string()
                     );
+
+                    // Routes to the underlay network are taken care of by ddm.
+                    // Create this address with a prefix of 128 so as to not
+                    // create a conflicting route to the sled's /64.
                     let addr_request =
-                        AddressRequest::new_static(IpAddr::V6(*addr), None);
+                        AddressRequest::new_static(IpAddr::V6(*addr), Some(128));
                     zone.ensure_address(addr_request).await?;
+
                     info!(
                         self.inner.log,
                         "Ensuring address {} exists - OK",

Results in:

root@oxz_switch:~# ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
lo0/v6            static   ok           ::1/128
ixgbe0/ll         addrconf ok           fe80::8261:5fff:fe07:41d8/10
ixgbe1/ll         addrconf ok           fe80::8261:5fff:fe07:41d9/10
ixgbe2/ll         addrconf ok           fe80::8261:5fff:fe11:ab30/10
ixgbe4/ll         addrconf ok           fe80::8edc:d4ff:feaf:d2a8/10
ixgbe5/ll         addrconf ok           fe80::8edc:d4ff:feaf:d2a9/10
oxBootstrap0/ll   addrconf ok           fe80::8:20ff:fe0b:54fb/10
oxBootstrap0/bootstrap6 static ok       fdb0:8061:5f11:ab31::2/64
oxControlService0/ll addrconf ok        fe80::8:20ff:fe07:ba47/10
oxControlService0/omicron6 static ok    fd00:1122:3344:104::2/128

root@oxz_switch:~# netstat -rn -f dst:fd00:1122:3344:104::1

Routing Table: IPv6
  Destination/Mask            Gateway                   Flags Ref   Use    If   
--------------------------- --------------------------- ----- --- ------- ----- 
fd00:1122:3344:104::/64     fe80::8:20ff:fe9c:323       UG      2     115 oxBootstrap0 
fd00:1122:3344:104::/64     fe80::8261:5fff:fe11:ab31   UG      1       0 ixgbe4

Strangely, we're seeing this oxBootstrap0 route! Though, now the sled agent is failing to re-enable the running switch zone with the new underlay address:

{
  "msg": "Re-enabling running switch zone (new address)",
  "v": 0,
  "name": "SledAgent",
  "level": 30,
  "time": "2023-04-04T21:08:31.378327626Z",
  "hostname": "frostypaws",
  "pid": 2138,
  "component": "BootstrapAgent",
  "new": "[fd00:1122:3344:104::2, ::1]",
  "old": "[::1]"
}
{
  "msg": "Adding address: Static(V6(Ipv6Network { addr: fd00:1122:3344:104::2, prefix: 128 }))",
  "v": 0,
  "name": "SledAgent",
  "level": 30,
  "time": "2023-04-04T21:08:31.378369114Z",
  "hostname": "frostypaws",
  "pid": 2138,
  "zone": "oxz_switch",
  "component": "BootstrapAgent"
}
{
  "msg": "Failed to activate switch: Failed to do 'Adding Route' by running command in zone: Error running command in zone 'oxz_switch': Command [/usr/sbin/zlogin oxz_switch /usr/sbin/route add -inet6 default -inet6 fd00:1122:3344:104::1] executed and failed with status: exit status: 128  stdout: add net default: gateway fd00:1122:3344:104::1: Network is unreachable\n  stderr: ",
  "v": 0,
  "name": "SledAgent",
  "level": 40,
  "time": "2023-04-04T21:08:33.500631482Z",
  "hostname": "frostypaws",
  "pid": 2138,
  "sled_id": "a58845be-9a1e-4464-af7a-117ae5ab6784",
  "component": "SledAgent"
}

If I comment out the part where that default route is added:

diff --git a/sled-agent/src/services.rs b/sled-agent/src/services.rs
index 94689564..f2cc13e5 100644
--- a/sled-agent/src/services.rs
+++ b/sled-agent/src/services.rs
@@ -1486,14 +1493,14 @@ impl ServiceManager {
                     );
                 }

-                if let Some(info) = self.inner.sled_info.get() {
-                    zone.add_default_route(info.underlay_address)
-                        .await
-                        .map_err(|err| Error::ZoneCommand {
-                            intent: "Adding Route".to_string(),
-                            err,
-                        })?;
-                }
+                //if let Some(info) = self.inner.sled_info.get() {
+                //    zone.add_default_route(info.underlay_address)
+                //        .await
+                //        .map_err(|err| Error::ZoneCommand {
+                //            intent: "Adding Route".to_string(),
+                //            err,
+                //        })?;
+                //}

                 for service in &request.services {
                     let smfh = SmfHelper::new(&zone, service);

I no longer see the "Failed to activate switch" message, but the strange oxBootstrap0 route remains:

root@oxz_switch:~# ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
lo0/v6            static   ok           ::1/128
ixgbe0/ll         addrconf ok           fe80::8261:5fff:fe07:41d8/10
ixgbe1/ll         addrconf ok           fe80::8261:5fff:fe07:41d9/10
ixgbe2/ll         addrconf ok           fe80::8261:5fff:fe11:ab30/10
ixgbe4/ll         addrconf ok           fe80::8edc:d4ff:feaf:d2a8/10
ixgbe5/ll         addrconf ok           fe80::8edc:d4ff:feaf:d2a9/10
oxBootstrap0/ll   addrconf ok           fe80::8:20ff:fe56:ab19/10
oxBootstrap0/bootstrap6 static ok       fdb0:8061:5f11:ab31::2/64
oxControlService0/ll addrconf ok        fe80::8:20ff:fe0a:d7ef/10
oxControlService0/omicron6 static ok    fd00:1122:3344:104::2/128

root@oxz_switch:~# netstat -rn -f dst:fd00:1122:3344:104::1

Routing Table: IPv6
  Destination/Mask            Gateway                   Flags Ref   Use    If   
--------------------------- --------------------------- ----- --- ------- ----- 
fd00:1122:3344:104::/64     fe80::8:20ff:fe20:5537      UG      2    1481 oxBootstrap0 
fd00:1122:3344:104::/64     fe80::8261:5fff:fe11:ab31   UG      1       0 ixgbe4

Addressability to dpd seems ok:

root@oxz_switch:~# curl -I http://[fd00:1122:3344:104::2]:12224/
HTTP/1.1 404 Not Found
content-type: application/json
x-request-id: 45202c52-e397-4a50-b658-3dbf637de6de
content-length: 84
date: Tue, 04 Apr 2023 21:21:12 GMT

root@oxz_switch:~# 
logout

[Connection to zone 'oxz_switch' pts/3 closed]
james@frostypaws:~/omicron$ curl -I http://[fd00:1122:3344:104::2]:12224/
HTTP/1.1 404 Not Found
content-type: application/json
x-request-id: da10af64-3dd7-43f0-bfb7-070065b56af3
content-length: 84
date: Tue, 04 Apr 2023 21:21:17 GMT

Even from other sleds:

james@dinnerbone:~$ ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
igb0/dhcp         dhcp     ok           10.0.0.4/24
lo0/v6            static   ok           ::1/128
ixgbe0/ll         addrconf ok           fe80::21b:21ff:fec1:ffe0/10
net1/ll           addrconf ok           fe80::8:20ff:fe1e:88a3/10
bootstrap0/ll     addrconf ok           fe80::8:20ff:fe91:b7cb/10
bootstrap0/bootstrap6 static ok         fdb0:1b:21c1:ffe0::1/64
underlay0/ll      addrconf ok           fe80::8:20ff:fe0d:f3/10
underlay0/sled6   static   ok           fd00:1122:3344:101::1/64

james@dinnerbone:~$ curl -I http://[fd00:1122:3344:104::2]:12224/
HTTP/1.1 404 Not Found
content-type: application/json
x-request-id: 86f3eaae-2c9d-4c0d-a39a-80fde7de0778
content-length: 84
date: Tue, 04 Apr 2023 21:22:03 GMT

jmpesp commented 1 year ago

Note that with these routes:

root@oxz_switch:~# netstat -rn -f dst:fd00:1122:3344:104::1

Routing Table: IPv6
  Destination/Mask            Gateway                   Flags Ref   Use    If   
--------------------------- --------------------------- ----- --- ------- ----- 
fd00:1122:3344:104::/64     fe80::8:20ff:fe20:5537      UG      2    1770 oxBootstrap0 
fd00:1122:3344:104::/64     fe80::8261:5fff:fe11:ab31   UG      2       1 ixgbe4

if we ping from the switch zone to the global zone's underlay address:

root@oxz_switch:~# ping fd00:1122:3344:104::1
fd00:1122:3344:104::1 is alive

snoop in another window shows the ping going over oxBootstrap0:

root@oxz_switch:~# snoop -d oxBootstrap0 src fd00:1122:3344:104::1 or dst fd00:1122:3344:104::1
Using device oxBootstrap0 (promiscuous mode)
fdb0:8061:5f11:ab31::2 -> fd00:1122:3344:104::1 ICMPv6 Echo request (ID: 9611 Sequence number: 0)
fd00:1122:3344:104::1 -> fdb0:8061:5f11:ab31::2 ICMPv6 Echo reply (ID: 9611 Sequence number: 0)

jmpesp commented 1 year ago

It turns out this was due to ipv6-routing being enabled in the global zone. When it is enabled, illumos sends IPv6 router advertisements, and the switch zone picks these up over the bootstrap interface:

james@frostypaws:~$ ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
igb0/v4           dhcp     ok           10.0.0.7/24
lo0/v6            static   ok           ::1/128
igb0/linklocal    addrconf ok           fe80::6d9:f5ff:fe21:8ac4/10
ixgbe3/ll         addrconf ok           fe80::8261:5fff:fe11:ab31/10
net1/ll           addrconf ok           fe80::8:20ff:fe78:4bfd/10
bootstrap0/ll     addrconf ok           fe80::8:20ff:fe99:d459/10
bootstrap0/bootstrap6 static ok         fdb0:8061:5f11:ab31::1/64

james@frostypaws:~$ pfexec zlogin oxz_switch
[Connected to zone 'oxz_switch' pts/3]
The illumos Project     helios-1.0.21472        February 2023
root@oxz_switch:~# ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
lo0/v6            static   ok           ::1/128
ixgbe0/ll         addrconf ok           fe80::8261:5fff:fe07:41d8/10
ixgbe1/ll         addrconf ok           fe80::8261:5fff:fe07:41d9/10
ixgbe2/ll         addrconf ok           fe80::8261:5fff:fe11:ab30/10
ixgbe4/ll         addrconf ok           fe80::8edc:d4ff:feaf:d2a8/10
ixgbe5/ll         addrconf ok           fe80::8edc:d4ff:feaf:d2a9/10
oxBootstrap3/ll   addrconf ok           fe80::8:20ff:feb7:b1c2/10
oxBootstrap3/bootstrap6 static ok       fdb0:8061:5f11:ab31::2/64
oxControlService3/ll addrconf ok        fe80::8:20ff:fef1:60e7/10
oxControlService3/omicron6 static ok    fd00:1122:3344:104::2/128

root@oxz_switch:~# netstat -rn -f dst:fd00:1122:3344:104::1

Routing Table: IPv6
  Destination/Mask            Gateway                   Flags Ref   Use    If   
--------------------------- --------------------------- ----- --- ------- ----- 
fd00:1122:3344:104::/64     fe80::8:20ff:fe99:d459      UG      2     107 oxBootstrap3 
fd00:1122:3344:104::/64     fe80::8261:5fff:fe11:ab31   UG      1       0 ixgbe4 

root@oxz_switch:~# ndp -i oxBootstrap3 fe80::8:20ff:fe99:d459
fe80::8:20ff:fe99:d459 (fe80::8:20ff:fe99:d459) at 2:8:20:99:d4:59 router temp

If I disable ipv6-routing in the global zone, that oxBootstrap3 route no longer shows up in the switch zone.
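As an aside on reading the output above: the gateway `fe80::8:20ff:fe99:d459` and the MAC `2:8:20:99:d4:59` reported by `ndp` are consistent with the modified EUI-64 scheme for deriving a link-local address from a MAC address, which is how the gateway can be matched to the global zone's `bootstrap0/ll`. A minimal Rust sketch of that derivation follows. `link_local_from_mac` is a hypothetical helper, and it assumes the interface uses the EUI-64-derived address rather than a manually assigned or randomized one.

```rust
use std::net::Ipv6Addr;

/// Derive the modified EUI-64 link-local address for a 48-bit MAC:
/// flip the universal/local bit (0x02) of the first octet and insert
/// ff:fe between the third and fourth octets, under the fe80::/64 prefix.
/// Hypothetical helper for illustration only.
fn link_local_from_mac(mac: [u8; 6]) -> Ipv6Addr {
    Ipv6Addr::new(
        0xfe80,
        0,
        0,
        0,
        u16::from_be_bytes([mac[0] ^ 0x02, mac[1]]),
        u16::from_be_bytes([mac[2], 0xff]),
        u16::from_be_bytes([0xfe, mac[3]]),
        u16::from_be_bytes([mac[4], mac[5]]),
    )
}
```

Applied to `02:08:20:99:d4:59` this yields `fe80::8:20ff:fe99:d459`, and applied to `80:61:5f:11:ab:31` (the ixgbe4 neighbor seen earlier) it yields `fe80::8261:5fff:fe11:ab31`.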

My fake scrimlet's global zone being configured as an ipv6 router is a configuration error on my part - the sled agent ensures that the global zone enables ipv6 forwarding, but I had previously enabled ipv6 routing in a past testing configuration.

I'm closing this. Thanks for all your help @rcgoodfellow!

jmpesp commented 1 year ago

I closed this too early - we (read: Ry) figured out why the oxBootstrap route existed, but the duplicate route issue remains even when that's solved.