Description
Netstack TCP sockets currently do not validate the route when timeouts occur. This causes a problem when, for example, the route disappears or a NIC is removed from the system. In such cases the sockets keep trying to retransmit even though the route is probably unusable, because either the NIC itself or the address on the underlying NIC has been removed.
WritePacket() correctly fails all writes in this state (see https://cs.opensource.google/gvisor/gvisor/+/master:pkg/tcpip/stack/route.go;drc=1f0f687cbe49c4af272abc47d5d974e86fef6c01;l=206), but today TCP ignores these errors and does not react to them. Also, EINVAL is probably the wrong error to return.
We should either return EHOSTUNREACH here or, on a retransmit, validate the route before using it, similar to what Linux does (see https://github.com/torvalds/linux/blob/9ff9b0d392ea08090cd1780fb196f36dbb586529/net/ipv4/af_inet.c#L1276).
In fact, Linux even attempts to find a new route to the destination and only fails if no valid route remains.
That function is assigned as TCP's rebuild_header handler here: https://github.com/torvalds/linux/blob/9ff9b0d392ea08090cd1780fb196f36dbb586529/net/ipv4/tcp_ipv4.c#L2139
It is then called on the retransmit_skb path here: https://github.com/torvalds/linux/blob/9ff9b0d392ea08090cd1780fb196f36dbb586529/net/ipv4/tcp_output.c#L3163
Is this feature related to a specific bug? TCP sockets can stay alive for a long time after the NIC or route is no longer valid.
Do you have a specific solution in mind? See above.