Closed plotnick closed 2 years ago
The design of the "external IP hack" is to forward all external traffic to the gateway configured in the config.toml -- so this is by design. This was done to avoid the extra work of doing routery stuff in OPTE itself when we can just forward to the local router in the home/lab network.
In your specific case the problem seems to be that the Ubiquiti router is doing some sort of middleware TCP inspection itself. It leaves the handshake alone (the handshake segments from your moneta IP have a src MAC that looks to be for a Supermicro board), but once the connection is established the Ubiquiti router seems to do some of its own inspection and decides to send a reset segment for the first data ACK from the guest (notice segment 6's src MAC is for a Ubiquiti router). For whatever reason, this router doesn't like the fact that it's being asked to forward packets that are destined for the local subnet and could have just been sent directly. On my home network you just get ICMP redirects from the router, which seems like the more reasonable thing to do. Perhaps there is a config you can toggle?
Ok, that all makes sense. The problem may be that the initial traffic from the Linux box to Helios doesn't hit the router at all, since they're both connected to a local switch (different from the router's); this might be what's confusing the router and making it send the RST instead of redirects. I tried twiddling some hairpin NAT settings on the router, but no luck :disappointed: Thanks for the feedback, though.
Slightly unclear if this is a real bug or just a side-effect of the current external IP hacks; please feel free to close with "WONTFIX" if the latter. But I wanted to record the issue in case anyone encounters it again. Many thanks to @jmpesp for helping to diagnose the issue; any networking errors or confusion in the description that follows are my own.
The symptom is that a guest instance running under Omicron on Helios responds to ICMP pings from a Linux system on the same (ethernet & IPv4) network, but SSH hangs after the initial TCP handshake. The same thing happens trying to SSH into a guest from the host Helios machine.
The cause seems to be that after (but not during) the initial TCP handshake, OPTE tries to send all packets to the internet gateway (i.e., the destination ethernet address is the gateway's MAC), regardless of whether the destination is on the same subnet as the guest's external IP.
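To make the difference concrete, here is a minimal sketch contrasting the next-hop choice a conventional host would make with the "send everything to the gateway" behavior described above. It is not OPTE's implementation; the function names are hypothetical, and the `/24` prefix is an assumption, since the netmask is not stated in this issue.

```python
import ipaddress

# Assumption: the LAN is a /24. The issue does not state the netmask.
SUBNET = ipaddress.ip_network("192.168.0.0/24")
GATEWAY_IP = ipaddress.ip_address("192.168.0.1")


def conventional_next_hop(dst: str):
    """Next hop a typical host picks: on-link peers are reached directly."""
    dst_ip = ipaddress.ip_address(dst)
    return dst_ip if dst_ip in SUBNET else GATEWAY_IP


def always_gateway_next_hop(dst: str):
    """Next hop under the behavior described above: always the gateway."""
    return GATEWAY_IP


for dst in ("192.168.0.42", "1.1.1.1"):
    print(dst, conventional_next_hop(dst), always_gateway_next_hop(dst))
```

The two functions agree for off-subnet destinations such as `1.1.1.1` and differ only for on-link ones such as `192.168.0.42`, which matches the observation that outbound traffic works while same-subnet SSH does not.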
The specific setup is this: my Linux development machine (`moneta`) is at `192.168.0.42` and the Helios host machine is at `192.168.0.43`; they are connected via a Gb ethernet switch. My internet gateway is a Ubiquiti router at `192.168.0.1` with a MAC address of `44:d9:e7:07:12:23`; those addresses are configured in the `smf/sled-agent/config.toml` file. I've allocated an IP pool with the range `192.168.0.100`-`192.168.0.200` in Nexus; nothing else (including DHCP) is using those addresses.

I booted a Debian generic-cloud instance, and it was allocated the external IP address `192.168.0.101`. It responds to pings from the Linux machine at that address, but trying to SSH into it fails: verbose mode reports `Connection established`, but then hangs. Likewise `nc 192.168.0.101 22` reports no errors, but shows no output. On the Helios host, I ran `pfexec snoop -o ssh.snoop -d net0 tcp` and tried again; the following is the result of that capture: ssh.zip
Examining the capture in Wireshark shows that the initial three-way TCP handshake succeeds. Packet 4 tries to start the SSH session, but packet 5 is a TCP `ACK` from the Helios host with an ethernet destination of the gateway and an IP destination of the Linux machine. The gateway responds with a TCP `RST`, presumably because it wasn't involved in the handshake; all subsequent traffic follows the same pattern.

As a work-around and to prove that this was indeed the cause of the failure, using the gateway as an SSH jump host works fine, and I can log into the Debian guest:

Outbound networking from the guest also works, e.g., having logged in as above, I can ping `1.1.1.1`.
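For anyone checking their own capture for the same pattern, the following sketch flags frames whose IP destination is on the local subnet but whose ethernet destination is the gateway. The `Frame` records are hand-written for illustration and are not taken from ssh.zip; the `/24` prefix and the MAC used for the Linux machine are assumptions, and the helper name is hypothetical.

```python
import ipaddress
from dataclasses import dataclass

# Assumption: the LAN is a /24. The issue does not state the netmask.
SUBNET = ipaddress.ip_network("192.168.0.0/24")
GATEWAY_MAC = "44:d9:e7:07:12:23"  # gateway MAC as given in the issue


@dataclass
class Frame:
    number: int
    eth_dst: str
    ip_dst: str


def sent_via_gateway_unnecessarily(frame: Frame) -> bool:
    """True when an on-link IP destination is addressed to the gateway's MAC."""
    on_link = ipaddress.ip_address(frame.ip_dst) in SUBNET
    return on_link and frame.eth_dst.lower() == GATEWAY_MAC


# Illustrative records only; 02:00:00:00:00:42 is a placeholder MAC.
frames = [
    Frame(2, "02:00:00:00:00:42", "192.168.0.42"),  # handshake: sent directly
    Frame(5, GATEWAY_MAC, "192.168.0.42"),          # data ACK: sent via gateway
]
flagged = [f.number for f in frames if sent_via_gateway_unnecessarily(f)]
print(flagged)
```

Frames bound for off-subnet addresses are never flagged, since sending those to the gateway is expected.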