mcacker opened this issue 10 years ago
That is a symptom of either a NAT router timing out your connection or one end restarting/crashing *swan. If it's the former, try enabling DPD (see man ipsec.conf)
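For reference, a minimal DPD stanza in ipsec.conf might look like the following (option names per man ipsec.conf; the values and the conn name are illustrative, not taken from this thread):

```
conn mytunnel
    # send a DPD probe after 30s of idle time
    dpddelay=30
    # declare the peer dead after 120s without a reply
    dpdtimeout=120
    # then tear down and renegotiate the tunnel
    dpdaction=restart
```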
The server that openswan is running on is also acting as a NAT gateway:
iptables -t nat -A POSTROUTING -o eth0 -s 0.0.0.0/0 -j MASQUERADE
I see no indication that iptables, on either end of the tunnel, is timing out the connection. I'm monitoring the openswan process, and it isn't restarting or crashing.
I've tried adding
iptables -t mangle -A POSTROUTING -o $OUTGOING_INTERFACE -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
which just seems to have made things worse.
I'll try adding the DPD configuration, but given my observations, I doubt it will help.
Any other ideas?
try: iptables -t mangle -A POSTROUTING -o $OUTGOING_INTERFACE -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1440
but I don't think the issue is MTU-related, so I don't think TCPMSS will help. The UDP 4500 packet is an IKE packet (or an ESP-in-UDP data packet) that failed to reach the remote server. That server either stopped its IKE daemon, or there is a NAT router in front of it that closed its NAT mapping for that port.
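For what it's worth, a back-of-the-envelope for where figures like 1440 come from: with the AES_256-HMAC_SHA1 SA shown later in this thread, ESP-in-UDP adds roughly 81 bytes in the worst case. This is a sketch, not a measurement — exact padding varies per packet:

```shell
# Worst-case ESP-in-UDP overhead for AES-CBC + HMAC-SHA1-96 on a 1500-byte link.
LINK_MTU=1500
OUTER_IP=20; UDP=8; ESP_HDR=8; AES_IV=16; PAD_MAX=17; ICV=12  # PAD_MAX = up to 15 pad + 2 trailer bytes
OVERHEAD=$((OUTER_IP + UDP + ESP_HDR + AES_IV + PAD_MAX + ICV))
INNER_MTU=$((LINK_MTU - OVERHEAD))   # largest inner IP packet that avoids fragmentation
MSS=$((INNER_MTU - 20 - 20))         # minus inner IP and TCP headers
echo "overhead=$OVERHEAD inner_mtu=$INNER_MTU mss=$MSS"
# prints: overhead=81 inner_mtu=1419 mss=1379
```

Notably, that lands close to the 1375 value that ended up working further down the thread.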
I've added
iptables -t mangle -A POSTROUTING -o eth0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1436
in place of the earlier mangle rule, and things appear to be working much better. The application has now run without misbehaving for about 10x as long as it previously did before encountering errors. That said, the error
pluto[4177]: ERROR: asynchronous network error report on eth0 (sport=4500) for message to 54.92.5.246 port 4500, complainant 10.89.1.129: Message too long [errno 90, origin ICMP type 3 code 4 (not authenticated)]
is still occurring, so I expect I'm still dropping some messages, just not enough to cause the errors I was seeing before.
The value 1436 was a wild guess based on the MTU settings discussed in http://docs.aws.amazon.com/AmazonVPC/latest/NetworkAdminGuide/Introduction.html, which seemed potentially relevant. If I didn't mention it before: the system is running in AWS, and the tunnels are between AWS regions.
Well, although it ran for longer, the system eventually crashed. I'm trying a smaller MTU, but I'm still seeing the error. I'll try adding DPD after completing these tests.
I take it that the IKE daemon is one of the ipsec processes. I see the following ipsec processes on both tunnel servers:
root 3901 1 0 20:36 pts/0 00:00:00 /bin/sh /usr/libexec/ipsec/_plutorun --debug --uniqueids yes --force_busy no --nocrsend no --strictcrlpolicy no --nat_traversal yes --keep_alive --protostack netkey --force_keepalive no --disable_port_floating no --virtual_private %v4:172.16.0.0/12,%v4:192.168.0.0/16,%v4:10.0.0.0/8,%v4:!10.89.3.0/24 --listen --crlcheckinterval 0 --ocspuri --nhelpers --dump --opts --stderrlog --wait no --pre --post --log daemon.error --plutorestartoncrash true --pid /var/run/pluto/pluto.pid
root 3902 1 0 20:36 pts/0 00:00:00 logger -s -p daemon.error -t ipsec__plutorun
root 3903 3901 0 20:36 pts/0 00:00:00 /bin/sh /usr/libexec/ipsec/_plutorun --debug --uniqueids yes --force_busy no --nocrsend no --strictcrlpolicy no --nat_traversal yes --keep_alive --protostack netkey --force_keepalive no --disable_port_floating no --virtual_private %v4:172.16.0.0/12,%v4:192.168.0.0/16,%v4:10.0.0.0/8,%v4:!10.89.3.0/24 --listen --crlcheckinterval 0 --ocspuri --nhelpers --dump --opts --stderrlog --wait no --pre --post --log daemon.error --plutorestartoncrash true --pid /var/run/pluto/pluto.pid
root 3904 3901 0 20:36 pts/0 00:00:00 /bin/sh /usr/libexec/ipsec/_plutoload --wait no --post
root 3908 3903 0 20:36 pts/0 00:00:00 /usr/libexec/ipsec/pluto --nofork --secretsfile /etc/ipsec.secrets --ipsecdir /etc/ipsec.d --use-netkey --uniqueids --nat_traversal --virtual_private %v4:172.16.0.0/12,%v4:192.168.0.0/16,%v4:10.0.0.0/8,%v4:!10.89.3.0/24
so it doesn't look like an IKE process has terminated.
I'm not sure whether iptables, which is meant to be doing the NATing and MTU mangling, is running:
[root@ip-10-89-1-167 ipsec.d]# chkconfig --list | grep iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
Even after stopping and starting iptables, it still shows as off, and I don't see it in the process list, so maybe I should look into that further.
I ran a wireshark capture (tcpdump), and it looks to me like some packets are too large right around the time an error message appears in the ipsec log. See the attached images:
Aug 13 16:37:53 ip-10-89-3-158 pluto[3473]: "ap-northeast-1-2-to-us-west-1-1" #4: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0x743600bf <0x9c919b7c xfrm=AES_256-HMAC_SHA1 NATOA=none NATD=54.193.52.174:4500 DPD=none}
Aug 13 16:43:33 ip-10-89-3-158 pluto[3473]: ERROR: asynchronous network error report on eth0 (sport=4500) for message to 54.193.52.174 port 4500, complainant 10.89.3.129: Message too long [errno 90, origin ICMP type 3 code 4 (not authenticated)]
FYI
iptables -t mangle -A POSTROUTING -o eth0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1375
has fixed the problem
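As a sanity check on why 1375 works where 1436 and 1440 did not — assuming the same worst-case ~81-byte ESP-in-UDP overhead, which is an estimate rather than a measured value:

```shell
# With MSS 1375, the largest encrypted packet just fits in a 1500-byte link MTU.
for MSS in 1375 1436 1440; do
  INNER=$((MSS + 40))     # add inner IP + TCP headers
  WIRE=$((INNER + 81))    # add worst-case ESP-in-UDP overhead
  echo "mss=$MSS wire=$WIRE fits=$([ $WIRE -le 1500 ] && echo yes || echo no)"
done
# prints: mss=1375 wire=1496 fits=yes
#         mss=1436 wire=1557 fits=no
#         mss=1440 wire=1561 fits=no
```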
On Thu, 14 Aug 2014, mcacker wrote:
iptables -t mangle -A POSTROUTING -o eth0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1375 has fixed the problem
Great! I am very surprised to hear that, but I will add this information to the libreswan wiki to help people like you in the future.
Thanks for getting back to me.
Hi,
I'm experiencing communication failures between servers connected over Openswan tunnels, and I'm seeing what look like potential MTU issues. I've increased the logging level and frequently see messages like the one below in /var/log/secure
Is this an MTU issue, and if so, why does it occur and how do I go about solving it? We're using a clustering technology (jgroups) that requires handshakes, and we see messages sent and responded to, but the response is never received.
I've seen suggestions to modify iptables:
but I don't know if this is the correct solution, or if all of those settings are really appropriate.
thanks, Mitchell
My configuration looks like:
The other side of the tunnel looks much the same.