opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

ICMP fragmentation required not returned to NAT origin #4094

Closed jgoerzen closed 3 years ago

jgoerzen commented 4 years ago

Important notices Before you add a new report, we ask you kindly to acknowledge the following:

[X] I have read the contributing guidelines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md

[X] I have searched the existing issues and I'm convinced that mine is new.

Describe the bug

My LAN and WAN interfaces have a standard MTU of 1500.

In this setup, WireGuard interfaces have an MTU of 1420. Indeed, wg0 is configured with mtu 1420. wg0 is the outbound gateway for all traffic here.

However, when a packet originating from the LAN is destined to go out wg0, if its DF (don't fragment) bit is set and its size is too large for the wg0 MTU -- quite possible -- an ICMP "fragmentation required" message is generated. Unfortunately, this message is not propagated back to the origin on the LAN. Therefore, TCP path MTU discovery (PMTUD) is broken. Some UDP applications that set the DF bit are broken as well.

The only way to get TCP to work here is with MSS clamping. It works, but it's a kludge and doesn't fix the non-TCP protocols.
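For reference, the sizes involved can be worked out with a little arithmetic. The following is only a sketch, assuming IPv4 and TCP headers of 20 bytes each with no options; the function names are made up for illustration and are not part of OPNsense.

```python
# Header sizes in bytes, assuming IPv4 without options and TCP without options.
IPV4_HEADER = 20
TCP_HEADER = 20
UDP_HEADER = 8


def clamped_mss(mtu: int) -> int:
    """Largest TCP segment payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def udp_datagram_fits(payload_len: int, mtu: int) -> bool:
    """True if a UDP payload plus UDP and IPv4 headers fits within the MTU."""
    return payload_len + UDP_HEADER + IPV4_HEADER <= mtu


if __name__ == "__main__":
    # MSS value that would keep TCP segments within the wg0 MTU of 1420.
    print(clamped_mss(1420))
    # A 1470-byte UDP payload becomes a 1498-byte IP packet: fine on the
    # 1500-byte LAN, too large for wg0.
    print(udp_datagram_fits(1470, 1500), udp_datagram_fits(1470, 1420))
```

This also shows why clamping only helps TCP: a UDP sender has no MSS to negotiate, so it relies on the ICMP message to learn the smaller MTU.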

A tcpdump across wg0 shows many packets like this:

(I have redacted my gateway IP to x.x.x.x)

15:48:20.467934 IP (tos 0x0, ttl 64, id 50466, offset 0, flags [none], proto ICMP (1), length 56)
    x.x.x.x > x.x.x.x: ICMP 142.4.200.132 unreachable - need to frag (mtu 1420), length 36
    IP (tos 0x0, ttl 64, id 23520, offset 0, flags [DF], proto UDP (17), length 1498)
    x.x.x.x.12335 > [redacted].5001: UDP, length 1470
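To make the capture above easier to read, here is a rough sketch of how such an ICMP message is laid out (type 3, code 4, with the next-hop MTU in the second half of the header as described in RFC 1191, followed by the offending packet's IP header and its first 8 bytes). The addresses in the example are documentation placeholders, not the redacted ones, and the helper names are hypothetical.

```python
import socket
import struct

ICMP_DEST_UNREACH = 3   # ICMP type: destination unreachable
ICMP_FRAG_NEEDED = 4    # ICMP code: fragmentation needed and DF set


def parse_frag_needed(icmp: bytes):
    """Return (next_hop_mtu, inner_src, inner_dst), or None if not frag-needed.

    Assumes the embedded IPv4 header carries no options (20 bytes).
    """
    if len(icmp) < 8 + 20:
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type != ICMP_DEST_UNREACH or code != ICMP_FRAG_NEEDED:
        return None
    inner = icmp[8:]
    src = socket.inet_ntoa(inner[12:16])
    dst = socket.inet_ntoa(inner[16:20])
    return mtu, src, dst


def build_example() -> bytes:
    """Build a message shaped like the one in the capture (placeholder IPs)."""
    icmp_header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, 0, 0, 1420)
    inner_ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 1498, 23520, 0x4000, 64, 17, 0,   # DF set, proto UDP, length 1498
        socket.inet_aton("192.0.2.10"),
        socket.inet_aton("198.51.100.7"),
    )
    inner_udp = struct.pack("!HHHH", 12335, 5001, 1478, 0)  # first 8 bytes of payload
    return icmp_header + inner_ip + inner_udp


if __name__ == "__main__":
    msg = build_example()
    print(len(msg), parse_frag_needed(msg))
```

The 36-byte ICMP length in the capture matches this layout: 8 bytes of ICMP header, 20 bytes of embedded IP header, and 8 bytes of the original UDP header. As far as I understand it, the embedded header is what a NAT device has to match against its state table to find the original LAN sender.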

The underlying UDP packets coming in on the LAN port are:

15:49:47.954083 IP (tos 0x0, ttl 64, id 28647, offset 0, flags [DF], proto UDP (17), length 1498)
    [redacted].49112 > [redacted2].5001: [udp sum ok] UDP, length 1470

And the firewall log shows:

 00:00:00.011264 rule 65/0(match): pass out on wg0: (tos 0x0, ttl 64, id 6338, offset 0, flags [none], proto ICMP (1), length 56)
    x.x.x.x > x.x.x.x: ICMP [redacted] unreachable - need to frag (mtu 1420), length 36
    (tos 0x0, ttl 64, id 34584, offset 0, flags [DF], proto UDP (17), length 1498)
    x.x.x.x.34601 > [redacted].5001: UDP, length 1470

Those ICMP packets are never transmitted back out the LAN side. Note that with this, as with many common WireGuard configurations, the provisioned IP and netmask are x.x.x.x/32. In this case, I wonder if this is causing confusion since we are seeing a "pass out" rule match?

I have an interface and gateway configured on my Wireguard interface, along with corresponding NAT rules.

To Reproduce

Steps to reproduce the behavior:

  1. Establish a Wireguard connection as a default destination for outbound NAT
  2. Use something to generate large packets (tracepath, iperf -u, ping -Mdo, etc)
  3. Observe ICMP fragmentation needed packets never returning to the LAN
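For step 2, a probe only needs to be one byte larger than what the tunnel can carry. The sketch below works out that size and builds a ping command line for it; it assumes IPv4 and the Linux iputils ping flags (`-M do`, `-c`, `-s`), and the target address is a placeholder.

```python
# Sizes in bytes, assuming an IPv4 header without options.
IPV4_HEADER = 20
ICMP_HEADER = 8


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that still fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def oversized_ping_command(target: str, mtu: int) -> list:
    """Command line for one DF-flagged ping that is a single byte too large."""
    size = max_ping_payload(mtu) + 1
    return ["ping", "-M", "do", "-c", "1", "-s", str(size), target]


if __name__ == "__main__":
    # For the wg0 MTU of 1420 the largest payload is 1392, so probe with 1393.
    print(" ".join(oversized_ping_command("192.0.2.1", 1420)))
```

If PMTUD is working, a LAN client running this should get a "frag needed" error reporting the tunnel MTU rather than silence.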

Expected behavior

The ICMP frag needed packets are expected and correct in this situation. The problem is that they are not being returned to the origin on the LAN.

Additional Notes

Things that I have tried that have not fixed this:

Although Wireguard is involved here, I do not believe the bug lies in the plugin, as it seems to be correctly generating the ICMP packets. Something is amiss at a deeper level.

Environment Software version used and hardware type if relevant. e.g.:

OPNsense 20.1.6-amd64 FreeBSD 11.2-RELEASE-p19-HBSD OpenSSL 1.1.1g 21 Apr 2020

tdilo commented 4 years ago

I have observed the same issue on a PPPoE connection with an MTU of 1492. In my case, ICMP fragmentation required packets appear in the firewall log (pass), but are sent from 127.0.0.1 to the WAN IP address and are subsequently not forwarded to the NAT clients.

 00:00:00.523965 rule 113/0(match): pass out on lo0: (tos 0x0, ttl 64, id 33017, offset 0, flags [none], proto ICMP (1), length 56)
    127.0.0.1 > [WAN IP]: ICMP [TARGET IP] unreachable - need to frag (mtu 1492), length 36
    (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 1493)
    [WAN IP] > [TARGET IP]: ICMP echo request, id 9311, seq 5, length 1473

Sending overly large packets with DF bit set from LAN to local wireguard clients (no NAT involved) does send out ICMP fragmentation required packets to the LAN.

jgoerzen commented 4 years ago

Thanks for sharing that. It's good to know that indeed Wireguard isn't the culprit, but I guess that means unfortunately that there's a deeper problem in there somewhere.

mimugmail commented 4 years ago

You could try Firewall : Advanced : Normalization and add a rule for traffic from LAN to WG to remove the DF bit. Then the firewall can fragment the packets.

jgoerzen commented 4 years ago

Thanks for the suggestion. I had experimented with that workaround. It does allow, e.g., tracepath to egress. However, there are reasons for path MTU discovery (PMTUD) and this breaks it. I found it had a seriously negative impact on performance in some situations.

And it is, after all, a workaround; opnsense should be properly passing back these ICMP reports.

tdilo commented 4 years ago

After setting sysctl net.inet.icmp.reply_from_interface=1 ICMP fragmentation needed packets are sent from the incoming interface instead of 127.0.0.1 and PMTUD now works as expected for me.
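For anyone wanting to confirm the setting from a script, here is a small sketch that checks the value. It assumes the usual FreeBSD `name: value` output format of `sysctl`; the function names are invented for illustration, and `read_live()` only works on a system that actually has this sysctl.

```python
import subprocess

OID = "net.inet.icmp.reply_from_interface"


def reply_from_interface_enabled(sysctl_output: str) -> bool:
    """Return True if the sysctl output reports the OID set to 1."""
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == OID:
            return value.strip() == "1"
    return False


def read_live() -> str:
    """Run `sysctl <OID>` on the local machine and return its output."""
    result = subprocess.run(
        ["sysctl", OID], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Checked against a canned string so the example runs anywhere.
    print(reply_from_interface_enabled("net.inet.icmp.reply_from_interface: 1\n"))
```

On the firewall itself, `reply_from_interface_enabled(read_live())` would report the current state.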

mimugmail commented 4 years ago

Nice catch, I think this should be added to 20.7 as a default

AdSchellevis commented 4 years ago

sounds like a good idea to swap our default, let's put it on the list.

jgoerzen commented 4 years ago

Interesting find, @tdilo . I tried applying that sysctl on my system and it didn't change the behavior in the wireguard case. ifconfig wg0 does show the mtu of 1420 so I'm not sure why it worked for you and not for me. A tcpdump across wg0 didn't show any difference either.

I tried applying the change both at the command line and with the web UI, and also a reboot. I verified it was applied by using sysctl.

Perhaps there is something else going on here as well?

I also saw https://redmine.pfsense.org/issues/3666 from a few years ago

L1ghtn1ng commented 4 years ago

@jgoerzen you can from the cli do opnsense-patch a95f9439656293631408cd186f78ac059eea58b5 and this will apply the change to your system

jgoerzen commented 4 years ago

Hi @L1ghtn1ng ,

Thanks for the tip. Unfortunately, the problem still persisted as before. sysctl net.inet.icmp.reply_from_interface verified that the sysctl is indeed set.

To test this, I deleted the change I had made in the GUI under system -> tunables and applied the patch with the command you gave. (That's a very nifty feature, by the way!) The patch applied fine. I then rebooted.

Patching file etc/inc/system.inc using Plan A...
Hunk #1 succeeded at 85 (offset 1 line).
Hunk #2 succeeded at 134 (offset 1 line).
done
All patches have been applied successfully.  Have a nice day.

jgoerzen commented 4 years ago

Perhaps there is an additional problem here with wireguard, or tun, or the local and remote interfaces having the same IP on wg0? (as is common in these setups)

L1ghtn1ng commented 4 years ago

Try looking at this https://homenetworkguy.com/how-to/configure-wireguard-opnsense/

jgoerzen commented 4 years ago

That was very similar to what I did, with one exception: I was unable to get the opnsense gateway to come up until I saw the tip about putting a "fake" gateway IP in the WireGuard configuration at https://forum.opnsense.org/index.php?topic=15105.15

I used one from unused RFC1918 address space rather than 1.2.3.4 but that fixed it for me.

The system is routing packets appropriately in general, and with MSS clamping TCP is working fine. It's just not returning ICMP needs-frag packets.

John

fichtner commented 4 years ago

@jgoerzen Maybe shared forwarding is also an issue here. Can you disable it from firewall: settings: advanced?

jgoerzen commented 4 years ago

@fichtner Thanks -- in fact, I already had it disabled due to https://github.com/opnsense/src/issues/46 / https://github.com/opnsense/src/issues/52 . Before disabling shared forwarding, the kernel would reliably panic a few minutes after bringing up wireguard.

fichtner commented 4 years ago

@jgoerzen ah ok, if that is the case we may be looking into deep FreeBSD territory. One last testing point might be the 20.7-BETA based on HardenedBSD/FreeBSD 12.1, but I can understand that this is difficult to put into production at this point in time. A first release candidate is still 6-8 weeks away.

In any case have you looked for same error descriptions over at https://bugs.freebsd.org/bugzilla/ ?

Cheers, Franco

jgoerzen commented 4 years ago

Hi,

As this issue can be easily reproduced, it wouldn't be too bad for me to try 20.7-BETA, see what it does, and then go back to the current release version. Are there any particular settings or troubleshooting you'd like me to try, or just do a run-of-the-mill upgrade with my existing config?

jgoerzen commented 4 years ago

Also I have had no luck finding similar bugs in the FreeBSD bugzilla.

jgoerzen commented 4 years ago

One additional comment: I haven't upgraded to 20.7-BETA yet, but thought it would be interesting to compare with OpenVPN. The verdict: OpenVPN handles this case correctly. The ICMP fragmentation needed appears only on my LAN interface, em1, and never on the OpenVPN interface, ovpnc1.

18:35:51.766753 IP (tos 0xc0, ttl 63, id 6625, offset 0, flags [none], proto ICMP (1), length 576)
    x.x.x.x > y.y.y.y: ICMP time exceeded in-transit, length 556
    IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 1442)
    y.y.y.y.39533 > z.z.z.z.44444: UDP, length 1414

In this example:

x.x.x.x was the OpenVPN interface IP address

y.y.y.y was the IP address of the client on the LAN, behind NAT

z.z.z.z was the remote Internet target of the tracepath

In the Wireguard example I posted above, note the complete absence of the IP address of the client on the LAN. Also, the Wireguard ICMP frag-needed packet could only be seen on wg0, never on em1.

Whether this difference is due to OpenVPN vs. Wireguard itself, or the particular /32 configuration that is common in Wireguard, I can't say. Hope this helps, though.

I set up OpenVPN identically to Wireguard - disabled pulling routes, set up an interface, gateway, NAT rules, firewall rules, etc.

AdSchellevis commented 3 years ago

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.

zhenliangliang commented 10 months ago

I have run into the same problem as well: accessing github reports unreachable - need to frag (mtu 1414),