oxidecomputer / opte

Post-handshake TCP packets incorrectly sent to gateway #200

Closed by plotnick 2 years ago

plotnick commented 2 years ago

Slightly unclear if this is a real bug or just a side-effect of the current external IP hacks; please feel free to close with "WONTFIX" if the latter. But I wanted to record the issue in case anyone encounters it again. Many thanks to @jmpesp for helping to diagnose the issue; any networking errors or confusion in the description that follows are my own.

The symptom is that a guest instance running under Omicron on Helios responds to ICMP pings from a Linux system on the same (ethernet & IPv4) network, but SSH hangs after the initial TCP handshake. The same thing happens trying to SSH into a guest from the host Helios machine.

The cause seems to be that after (but not during) the initial TCP handshake, OPTE tries to send all packets to the internet gateway (i.e., the destination ethernet address is the gateway's MAC), regardless of whether the destination is on the same subnet as the guest's external IP.

The specific setup is this: my Linux development machine (moneta) is at 192.168.0.42 and the Helios host machine is at 192.168.0.43; they are connected via a Gb ethernet switch. My internet gateway is a Ubiquiti router at 192.168.0.1 with a MAC address of 44:d9:e7:07:12:23; those addresses are configured in the smf/sled-agent/config.toml file. I've allocated an IP pool with the range 192.168.0.100-192.168.0.200 in Nexus; nothing else (including DHCP) is using those addresses.
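To make the distinction concrete, here is a minimal sketch (not OPTE's code) contrasting conventional on-link routing with the "forward everything to the gateway" behavior described above. The /24 prefix is an assumption, since the netmask isn't stated here, and the function names are hypothetical.

```python
import ipaddress

# Addresses from the setup described above.
GATEWAY_IP = ipaddress.ip_address("192.168.0.1")
MONETA_IP = ipaddress.ip_address("192.168.0.42")

# Assumption: the LAN is a /24; the actual netmask is not given in this issue.
LAN = ipaddress.ip_network("192.168.0.0/24")


def next_hop_conventional(dst):
    """Usual host routing: deliver on-link destinations directly,
    and send everything else to the gateway."""
    dst_ip = ipaddress.ip_address(dst)
    return dst_ip if dst_ip in LAN else GATEWAY_IP


def next_hop_forward_all(dst):
    """The behavior observed here: every external destination is handed
    to the gateway, even when it is on the local subnet."""
    return GATEWAY_IP


if __name__ == "__main__":
    for dst in ("192.168.0.42", "1.1.1.1"):
        print(dst, next_hop_conventional(dst), next_hop_forward_all(dst))
```

Under that assumption, the two policies differ only for destinations such as 192.168.0.42, which is exactly the case that fails here.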

I booted a Debian generic-cloud instance, and it was allocated the external IP address 192.168.0.101. It responds to pings from the Linux machine at that address, but trying to SSH into it fails: verbose mode reports Connection established, but then hangs. Likewise nc 192.168.0.101 22 reports no errors, but shows no output. On the Helios host, I ran pfexec snoop -o ssh.snoop -d net0 tcp and tried again; the following is the result of that capture:

ssh.zip

Examining the capture in Wireshark shows that the initial three-way TCP handshake succeeds. Packet 4 tries to start the SSH session, but packet 5 is a TCP ACK from the Helios host with an ethernet destination of the gateway and an IP destination of the Linux machine. The gateway responds with a TCP RST, presumably because it wasn't involved in the handshake; all subsequent traffic follows the same pattern.
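The pattern in packet 5 can also be checked mechanically. The following is a rough sketch, assuming untagged Ethernet II frames carrying IPv4 and, again, an assumed /24 LAN; the function name is hypothetical, and it operates on raw frame bytes rather than parsing the snoop file itself.

```python
import ipaddress
import struct

GATEWAY_MAC = bytes.fromhex("44d9e7071223")  # 44:d9:e7:07:12:23
# Assumption: the LAN is a /24; the netmask is not stated in this issue.
LAN = ipaddress.ip_network("192.168.0.0/24")

ETH_HEADER_LEN = 14      # dst MAC (6) + src MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
IPV4_DST_OFFSET = 16     # destination address offset within the IPv4 header


def sent_to_gateway_for_local_dst(frame):
    """Return True if the frame is addressed to the gateway's MAC while
    its IPv4 destination is on the local subnet."""
    dst_start = ETH_HEADER_LEN + IPV4_DST_OFFSET
    if len(frame) < dst_start + 4:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return False  # VLAN-tagged and non-IPv4 frames are ignored here
    eth_dst = frame[0:6]
    ip_dst = ipaddress.IPv4Address(frame[dst_start:dst_start + 4])
    return eth_dst == GATEWAY_MAC and ip_dst in LAN
```

A frame like packet 5 (ethernet destination of the gateway, IP destination 192.168.0.42) would be flagged, while traffic bound for an off-subnet address such as 1.1.1.1 would not.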

As a workaround, and to confirm that this was indeed the cause of the failure, I used the gateway as an SSH jump host; that works fine, and I can log into the Debian guest:

moneta:~% ssh -o StrictHostKeyChecking=no -A -J ubnt@192.168.0.1 debian@192.168.0.101
Welcome to EdgeOS

By logging in, accessing, or using the Ubiquiti product, you
acknowledge that you have read and understood the Ubiquiti
License Agreement (available in the Web UI at, by default,
http://192.168.1.1) and agree to be bound by its terms.

ubnt@192.168.0.1's password: 
Linux debian 5.10.0-13-cloud-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Jul 25 17:45:27 2022 from 192.168.0.1
debian@debian:~$ 

Outbound networking from the guest also works, e.g., having logged in as above, I can ping 1.1.1.1.

rzezeski commented 2 years ago

> The cause seems to be that after (but not during) the initial TCP handshake, OPTE tries to send all packets to the internet gateway (i.e., the destination ethernet address is the gateway's MAC), regardless of whether the destination is on the same subnet as the guest's external IP.

The design of the "external IP hack" is to forward all external traffic to the gateway configured in the config.toml -- so this is by design. This was done to avoid the extra work of doing routery stuff in OPTE itself when we can just forward to the local router in the home/lab network.

In your specific case the problem seems to be that the Ubiquiti router is doing some sort of middleware TCP inspection itself. It leaves the handshake alone (the handshake segments from your moneta IP have a src MAC that looks to be for a Supermicro board), but once the connection is established, the Ubiquiti router seems to do some of its own inspection and decides to send a reset segment for the first data ACK from the guest (notice segment 6's src MAC is for a Ubiquiti router). For whatever reason, this router doesn't like the fact that it's being asked to forward packets that are destined to the local subnet and could have just been sent directly. On my home network you just get ICMP redirects from the router, which seems like the more reasonable thing to do. Perhaps there is a config you can toggle?

plotnick commented 2 years ago

Ok, that all makes sense. The problem may be that the initial traffic from the Linux box to Helios doesn't hit the router at all, since they're both connected to a local switch (different from the router's); this might be what's confusing the router and making it send the RST instead of redirects. I tried twiddling some hairpin NAT settings on the router, but no luck :disappointed: Thanks for the feedback, though.