opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

Route-based IPsec: packets between 1373-1472 get lost #3674

Closed jroehler23 closed 4 years ago

jroehler23 commented 5 years ago

Important notices Before you add a new report, we ask you kindly to acknowledge the following:

[X] I have read the contributing guidelines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md

[X] I have searched the existing issues and I'm convinced that mine is new.

Describe the bug A ping over the VTI with a packet size between 1373 and 1472 bytes gets lost. 1372 is the maximum packet size that works with the DF flag set. Packets bigger than 1472 bytes work fine, and pings to the internet work fine with all packet sizes.

To Reproduce Ping over a VTI with packet size 1373-1472.

Environment Software version used and hardware type if relevant. e.g.:

OPNsense 19.7.2-amd64 FreeBSD 11.2-RELEASE-p12-HBSD OpenSSL 1.0.2s 28 May 2019

jroehler23 commented 5 years ago

I have now updated to the latest version, OPNsense 19.7.4_1-amd64, but the behavior is still the same. Packets between 1373 and 1472 bytes are not transmitted to the other end of the tunnel.

I can see in the logs that packets bigger than 1372 bytes are processed differently. First they hit the VTI, then they are cut into two pieces and hit the IPsec interface as two fragments. Then they are lost.

When the packet exceeds 1472 bytes, the two pieces reach the IPsec interface at the other end of the tunnel, are put back together there, and the VTI interface gets the complete packet and finally transfers it to the LAN.

What happens to the packets between 1373 and 1472 bytes?

For information: the IPsec communication uses NAT-T on UDP port 4500.

mimugmail commented 5 years ago

@jroehler23 as nobody has worked on it yet, it's no wonder it hasn't been fixed. :) I've told an apprentice to build a lab to reproduce it.

There can only be 3 reasons:

jroehler23 commented 5 years ago

I also have the same problem when I try to contact an access point in the remote network. The response packets of this device are exactly in this size range. This is an HTTPS communication, so TCP is affected as well.

fichtner commented 5 years ago

reason 2 might be interesting for @bu7cher to look at since he wrote the VTI support in FreeBSD :)

mimugmail commented 5 years ago

Don't wake sleeping dogs :) I'll test the setup here in the lab first ... (next week)

bu7cher commented 5 years ago

What MTU size do the if_ipsec(4) interfaces have? What encryption algorithm is used? Do these packets originate from the tunnel endpoint or are they routed? ICMP with the DF flag will be dropped if the resulting packet exceeds the MTU, so I think that is to be expected.
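
For a quick check, something along these lines from a shell on the firewall should show the if_ipsec(4) MTU and reproduce the DF case (the interface name ipsec0 and the target address are placeholders):

ifconfig ipsec0 | grep mtu           # show the VTI / if_ipsec(4) MTU
ping -D -s 1372 192.168.203.150      # DF set, still fits a 1400-byte MTU
ping -D -s 1373 192.168.203.150      # DF set, expected to be rejected (needs fragmentation)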

jroehler23 commented 5 years ago

The MTU of all tunnel interfaces is the OPNsense default; ifconfig shows 1400.
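
For reference, assuming a standard 1500-byte Ethernet MTU on the client side, the reported size boundaries line up with the ICMP and IP header overhead:

1372 + 8 (ICMP) + 20 (IP) = 1400         -> exactly fits the 1400-byte VTI MTU
1373..1472 + 28           = 1401..1500   -> sent unfragmented by the client, but has to be fragmented at the VTI (the lost range)
1473 and up + 28          = 1501 and up  -> exceeds the client's own MTU, so the client fragments before sending (works)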

I have tried different encryptions:

  1. AES (256 bits) + SHA512 + DH group 16 in phase 1 and AES (256 bits) | SHA512 | group 16 (4096 bits) in phase 2.
  2. AES (128 bits) + SHA256 + DH group 14 in phase 1 and 3DES | SHA256 | no PFS in phase 2.
  3. AES (128 bits) + SHA256 + DH group 14 in phase 1 and AES (128 bits) | SHA256 | no PFS in phase 2.

My installation:

local LAN --- OPNsense (with CARP on internet and LAN side) --- INTERNET --- LANcom router --- transfer network --- OPNsense --- remote LAN. All with fixed IP addresses.

bu7cher commented 5 years ago

The MTU of all tunnel interfaces is the OPNsense default; ifconfig shows 1400. local LAN --- OPNsense (with CARP on internet and LAN side) --- INTERNET --- LANcom router --- transfer network --- OPNsense --- remote LAN. All with fixed IP addresses.

And are you trying to ping from the local LAN to the remote LAN? What exact command do you use? ping -s 1373 -D ?

jroehler23 commented 5 years ago

I try to ping from my laptop in the local LAN to a device in the remote LAN, e.g. an access point.

My command was:

P:>ping 192.168.203.150 -l 1373

Pinging 192.168.203.150 with 1373 bytes of data: Request timed out. Request timed out. Request timed out. Request timed out.

Ping statistics for 192.168.203.150: Packets: Sent = 4, Received = 0, Lost = 4 (100% loss)

bu7cher commented 5 years ago

Ok, thanks for the report, I'll try to take a look at the related code at the end of this week.

mimugmail commented 5 years ago

My lab is ready. The clients are Linux, and when I do a ping -s 1380 <ip> I get a "Need to frag but DF bit set". More results when I have more time .. this was just a quick one.
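
For reference, with the iputils ping on Linux the DF behaviour can be selected explicitly (the target address 10.0.0.10 is a placeholder):

ping -s 1380 -M do   10.0.0.10    # set DF: fails fast with "need to frag" when the path MTU is smaller
ping -s 1380 -M dont 10.0.0.10    # do not set DF: the packet may be fragmented along the path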

jroehler23 commented 5 years ago

Same here: when I set the DF bit with 1380 bytes, the ping is rejected.

mimugmail commented 5 years ago

Which is expected behavior, since you tell the gateway not to fragment but the packets are too big?

jroehler23 commented 5 years ago

That's right! But when I do not set the DF bit with 1380 bytes, the packet is lost.

jroehler23 commented 5 years ago

This is a ping with 32 bytes from my IP to a remote IP via IPsec. Normal ping:

vti_CR  Sep 26 15:00:32  192.168.100.100  192.168.203.150  icmp  let out anything from firewall host itself

This is a ping with 1380 bytes (no response):

IPsec   Sep 26 15:00:40  192.168.100.100  192.168.203.150  icmp  IPsec internal host to host
IPsec   Sep 26 15:00:40  192.168.100.100  192.168.203.150  icmp  IPsec internal host to host

This is a ping with 1500 bytes (response):

IPsec   Sep 26 15:07:59  192.168.203.150  192.168.100.100  icmp  IPSec allow all IN
IPsec   Sep 26 15:07:59  192.168.203.150  192.168.100.100  icmp  IPSec allow all IN
IPsec   Sep 26 15:07:59  192.168.100.100  192.168.203.150  icmp  IPsec internal host to host
IPsec   Sep 26 15:07:59  192.168.100.100  192.168.203.150  icmp  IPsec internal host to host

mimugmail commented 5 years ago

This looks funny: when I ping from Machine A, this is the output of tcpdump on the LAN interface of FW-A:

16:45:04.474290 IP 10.0.1.10 > 10.0.0.10: ICMP echo request, id 2229, seq 1, length 1388
16:45:04.474398 IP 10.0.1.1 > 10.0.1.10: ICMP 10.0.0.10 unreachable - need to frag (mtu 1400), length 36
16:45:05.485068 IP 10.0.1.10 > 10.0.0.10: ICMP echo request, id 2229, seq 2, length 1376
16:45:05.485091 IP 10.0.1.10 > 10.0.0.10: ip-proto-1
16:45:05.487310 IP 10.0.0.10 > 10.0.1.10: ICMP echo reply, id 2229, seq 2, length 1388

So the reply packet is reassembled by OPNsense itself and the client doesn't recognize it as the reply packet.
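
A capture along these lines (interface names are placeholders, and the second line assumes the enc(4) interface is enabled) makes the fragmentation visible on both firewalls:

tcpdump -ni em0 icmp     # LAN side: non-first fragments show up as "ip-proto-1" since only the first fragment carries the ICMP header
tcpdump -ni enc0 icmp    # decrypted IPsec traffic as seen by the enc(4) interface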

mimugmail commented 5 years ago

OK, I'm a bit further: packets within the range get fragmented by the client, and on the FW-B LAN interface I see the reassembled ICMP request, which doesn't get a reply. When the size is above 1472, the packet leaving FW-B towards Machine-B is also fragmented. There must be some sysctl for ICMP, fragments or reassembly, like a minimum value for fragments, etc. Still digging .. @bu7cher if you have an idea, jump in :)
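
A few FreeBSD fragment/reassembly knobs that may be worth inspecting while digging (names can differ between FreeBSD versions):

sysctl net.inet.ip.maxfragpackets      # how many fragmented packets may be held for reassembly
sysctl net.inet.ip.maxfragsperpacket   # how many fragments are accepted per packet
sysctl net.inet.ipsec.dfbit            # how the DF bit of the outer ESP packet is handled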

jroehler23 commented 5 years ago

ICMP is not my only problem. This is an attempt to connect to an access point in the remote LAN via HTTPS. The beginning of the connection works because the packets are small enough. Then the packets fall into the specific size range and it stops working.

IPsec Sep 27 12:13:30 192.168.103.220: 192.168.200.113 tcp Default deny rule
IPsec Sep 27 12:13:30 192.168.103.220:443 192.168.200.113:57191 tcp Default deny rule
IPsec Sep 27 12:13:30 192.168.103.220: 192.168.200.113 tcp Default deny rule
IPsec Sep 27 12:13:30 192.168.103.220: 192.168.200.113 tcp Default deny rule
IPsec Sep 27 12:13:30 192.168.103.220:443 192.168.200.113:57193 tcp Default deny rule
IPsec Sep 27 12:13:30 192.168.103.220:443 192.168.200.113:57192 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220: 192.168.200.113 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220:443 192.168.200.113:57193 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220: 192.168.200.113 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220:443 192.168.200.113:57193 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220: 192.168.200.113 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220:443 192.168.200.113:57192 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220: 192.168.200.113 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220:443 192.168.200.113:57192 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220: 192.168.200.113 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220:443 192.168.200.113:57191 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220: 192.168.200.113 tcp Default deny rule
IPsec Sep 27 12:13:29 192.168.103.220:443 192.168.200.113:57191 tcp Default deny rule
vti_CR Sep 27 12:13:29 192.168.200.113:57194 192.168.103.220:443 tcp let out anything from firewall host itself
vti_CR Sep 27 12:13:29 192.168.200.113:57193 192.168.103.220:443 tcp let out anything from firewall host itself
vti_CR Sep 27 12:13:29 192.168.200.113:57192 192.168.103.220:443 tcp let out anything from firewall host itself
vti_CR Sep 27 12:13:29 192.168.200.113:57191 192.168.103.220:443 tcp let out anything from firewall host itself
vti_CR Sep 27 12:13:28 192.168.200.113:57190 192.168.103.220:443 tcp let out anything from firewall host itself
vti_CR Sep 27 12:13:28 192.168.200.113:57189 192.168.103.220:443 tcp let out anything from firewall host itself
vti_CR Sep 27 12:13:28 192.168.200.113:57188 192.168.103.220:443 tcp let out anything from firewall host itself

jroehler23 commented 5 years ago

(screenshot attached: https_not_working)

mimugmail commented 5 years ago

When I disable pf completely the ping works fine ...

mimugmail commented 5 years ago

Firewall : Settings : Advanced : Disable interface scrub <- tick this one and it starts working. @bu7cher you can opt out, thanks for your help 👍
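
For context, the normalisation this checkbox removes is the pf scrub rule in the generated ruleset; it typically looks roughly like this (the interface name em0 is a placeholder and the exact options depend on the configuration):

scrub on em0 all fragment reassemble   # reassemble fragments before filtering
# with "Disable interface scrub" ticked, this line is simply not generated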

jroehler23 commented 4 years ago

You are right, this helps with ICMP but not with an HTTPS/TCP connection. I get the same log as shown before.

mimugmail commented 4 years ago

Can you set the MSS on your LAN interface to 1300?

jroehler23 commented 4 years ago

This is not working. Same behavior and log data, and still no connection to the access point.

jroehler23 commented 4 years ago

Deactivating scrub is not a suitable solution. When it is off, at least one of my IPsec VTIs does not work any more. I do not understand what happens with or without scrub, but without it the tunnel does not come up.

mimugmail commented 4 years ago

When you disable scrub you can add specific rules below: add a rule, choose protocol tcp, set MSS to 1300, leave the rest at defaults and save. Seems to work for me.
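
In pf terms such a rule amounts to MSS clamping, roughly like the sketch below (the interface name em0 is a placeholder); an MSS of 1300 plus 40 bytes of TCP/IP headers stays below the 1400-byte VTI MTU:

scrub on em0 proto tcp all max-mss 1300   # clamp the TCP MSS so full-sized segments fit through the tunnel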

jroehler23 commented 4 years ago

Can I see somewhere what scrub options are set before I disable it completely? As I wrote, one of the IPsec tunnels is not coming up when I disable scrub.

mimugmail commented 4 years ago

/tmp/rules.debug .. copy it to another location, make the change, then diff against the new file.
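
A minimal sketch of that workflow from a shell:

cp /tmp/rules.debug /root/rules.debug.before          # snapshot the generated ruleset
# change the setting in the GUI and apply, then:
diff -u /root/rules.debug.before /tmp/rules.debug     # shows exactly which scrub/rule lines changed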

mimugmail commented 4 years ago

@jroehler23 two interesting discussions going on. I don't have the time to follow right now, but maybe you can play around with these values / infos:

https://lists.freebsd.org/pipermail/freebsd-net/2019-December/054951.html https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242744

@bu7cher thanks for your input on the mailing list; maybe this is also related to route-based tunnels, as they are also a kind of mode=transport.

jroehler23 commented 4 years ago

@mimugmail thanks for this hint. I read through it, but I'm not that deep into FreeBSD development. Any suggestions where to "try sysctl net.inet.ipsec.dfbit=0 that is documented in the ipsec(4) manual"? And how can I get rid of this setting if it does not work?
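
For reference, a cautious way to try it from a shell on the firewall: note the current value first so it can be restored, and keep in mind that a runtime sysctl change does not survive a reboot unless it is added as a tunable (System : Settings : Tunables in the GUI):

sysctl net.inet.ipsec.dfbit      # note the current value
sysctl net.inet.ipsec.dfbit=0    # send outer packets with the DF bit cleared
# to get rid of it, set the sysctl back to the previously noted value (or reboot if it was never made persistent)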

igpit commented 4 years ago

Sadly I found this bug too late. I assume we have hit this bug as well, but we did not test with ICMP. We just noticed problems with SMB timeouts, while otherwise the tunnel appeared to be working fine.

We had an IPsec VTI VPN between an OPNsense and a pfSense running fine for a few months, tunneling mainly SMB traffic. For unknown reasons, from one day to the next (no update, no config change to routers or servers, and the error persisted across all system reboots), accessing SMB servers across the tunnel more or less stopped working. To be exact, name resolution was ok, you could ping the server, see the shares and do the auth, but on the client it just timed out on share / file access. We looked at the Samba server logs with a high log level; the client timeout corresponded to just silence in the log files. After about 30-60 seconds we saw some reset of the communication, until it stopped again on access. We do not know whether "the last data sent" was from the server or the client, so who was waiting or retrying. This was reproducible for all clients with all FreeNAS servers. Connecting to Windows-based SMB servers worked, though.

In need of a fast fix we changed the IPsec setup back to a phase 2 policy-based configuration; since then, no more issues. As the issue appeared out of nowhere, I don't think we can reproduce it.

The servers can be ruled out. I assume it was a problem in pfSense 2.4.4-RELEASE-p3 or OPNsense 19.7.8, or an incompatibility between the two talking IPsec VTI to each other.

AdSchellevis commented 4 years ago

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.

igpit commented 4 years ago

Can we consider this solved? I fear it is still present.

mimugmail commented 4 years ago

Still present; maybe also add the upstream label. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242744

igpit commented 3 years ago

It has been a while and the upstream FreeBSD bug is apparently still open. I suppose the issue with lost large packets over VTI still persists in OPNsense? Is there a known easy workaround, or a fix implemented in OPNsense behind the scenes? Or should it be a listed known issue?

mimugmail commented 3 years ago

Maybe this is related: https://forum.netgate.com/topic/89558/ipsec-pmtu/17 https://redmine.pfsense.org/issues/7801 https://redmine.pfsense.org/projects/pfsense/repository/1/revisions/a8e97945b4fdaa9c5228bddf2964d95fb505ee4b/diff

AdSchellevis commented 2 years ago

As I was looking at a similar problem (using openvpn), I thought it might be a good idea to leave some notes in this ticket for reference.

In most discussions I've seen so far around these types of issues, PMTU is referred to as the technology that is broken. According to the RFC (https://datatracker.ietf.org/doc/html/rfc1191), however, this would suggest that inbound packets are being tagged with DF (do not fragment), which usually isn't the case (at least not in the cases I've seen). Although for IPsec one should be able to enforce or keep a DF flag using net.inet.ipsec.dfbit [1].

If PMTU isn't the issue, it may be more likely that MTU sizes across technology stacks are not always calculated correctly, in which case fragmentation isn't enforced to push the packet to the next step (and the packet is lost). Logical areas of interest in these cases would be ip_input() [2], ip_tryforward() [3] and scrubbing (normalisation) in pf (which can be disabled completely or for specific traffic using https://github.com/opnsense/core/commit/ad2a5758d92accff60b48608434fc99798099bb3).
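
When digging into such a case, a simple sweep from a client can help pin down the exact size window that disappears, with and without DF (iputils ping syntax; the target address is a placeholder):

for s in $(seq 1360 4 1480); do
  ping -c1 -W1 -M do   -s $s 192.168.203.150 >/dev/null 2>&1 && df=ok   || df=drop
  ping -c1 -W1 -M dont -s $s 192.168.203.150 >/dev/null 2>&1 && nodf=ok || nodf=drop
  echo "size=$s  df=$df  nodf=$nodf"
done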

I'm not sure there really is an issue in the kernel; I just wanted to leave some ideas on which direction to look if someone is digging into similar issues.

[1] https://www.freebsd.org/cgi/man.cgi?query=ipsec&sektion=4&format=html and https://datatracker.ietf.org/doc/html/rfc2401#section-6.1
[2] https://github.com/freebsd/freebsd-src/blob/a7e7700fa741d64a31e9d7596175fc0461687b86/sys/netinet/ip_input.c#L584
[3] https://github.com/freebsd/freebsd-src/blob/a7e7700fa741d64a31e9d7596175fc0461687b86/sys/netinet/ip_fastfwd.c#L481