zh99998 / sigmavpn

Automatically exported from code.google.com/p/sigmavpn

ARP across sigmavpn tunnel within linux bridge devices gets dropped #2

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hello -

I am trying to set up a layer 2 VPN bridge using sigmavpn.  The sigmavpn tunnel 
starts up fine, and on each end I add the tunnel endpoint to a Linux bridge 
(with brctl) that also contains another Ethernet device.  None of the Ethernet, 
tunnel, or bridge devices has an IP address, and all are in the UP state 
(ifconfig).  The real Ethernet devices connect to switches, and PCs are 
connected to those switches.

The problem is that when one PC pings another PC across the tunnel, the ARP 
request gets dropped on the remote bridge devices.  I can see the ARP traffic 
flow from source PC, into the ethernet device on the bridge, through the 
bridge, out the tunnel device that is part of the sigma tunnel, into the remote 
tunnel device, into the remote bridge... But then it never gets passed out the 
remote ethernet device to the remote switch and PC.

If I manually add static entries to the ARP tables on either end, then it works just fine.

I noticed that the ARP request starts out as 46 bytes on the local side, but by 
the time it comes out of the sigmavpn tunnel at the remote end, it is 1532 
bytes.  It is still 1532 bytes in the bridge device itself.  I thought this was 
tunnel padding that would be stripped, but when I add static ARP entries and 
look at the pings, I do not see the same behavior: the ping traffic is 40 bytes 
end-to-end.

Any ideas?  I am using the latest sigmavpn source from git, built on Ubuntu 
14.04 LTS.

Thanks!

Original issue reported on code.google.com by mcl...@gmail.com on 12 May 2014 at 4:08

GoogleCodeExporter commented 9 years ago
I should note that if I remove the sigmavpn tunnel and put Ethernet 
interfaces connected to a switch in its place, then everything works fine.  So it 
appears that sigmavpn is doing something specific to ARP traffic.  Perhaps 
it is an MTU problem?  I tried lowering the MTU values, but that did not help.

Original comment by mcl...@gmail.com on 12 May 2014 at 4:12

GoogleCodeExporter commented 9 years ago
Hi,

This is very interesting behaviour, and certainly not by design. SigmaVPN 
should not modify any of the packets that it passes, and there is no special 
treatment for ARP packets across tunnels, but there may be a hard limit of 
1500 bytes on the MTU.

If you see the large packets arriving through the SigmaVPN tunnel on the remote 
side, then that would suggest that some fragmentation is working, but you could 
verify that: use tcpdump or another packet tracer to examine the encrypted 
packets travelling over the link. Also consider looking at the packets coming 
through the tunnel whilst running tcpdump with the -vvv flag, which will show 
things like correct or incorrect checksums and headers.

Reducing MTU values may ultimately solve the problem, but those changes are 
maybe best made on all affected interfaces - the ethernet interfaces, the 
bridges and the SigmaVPN "tap" devices.
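If the MTUs are lowered, the useful number is how large a frame the tap device can accept while the encrypted UDP packet still fits a 1500-byte path. A minimal sketch of that budget, assuming IPv4 without options, untagged frames, and 32 bytes of nacltai overhead per packet (a figure inferred from the captures later in this thread, not taken from the sigmavpn source). The function name `max_tap_mtu` is hypothetical.

```python
# Hypothetical MTU budget for a nacltai-style UDP tunnel over IPv4.
# Assumptions: no IP options, no VLAN tags, 32 bytes of crypto overhead per packet.
IPV4_HEADER = 20
UDP_HEADER = 8
ETHERNET_HEADER = 14  # the tap device carries whole Ethernet frames


def max_tap_mtu(path_mtu: int = 1500, crypto_overhead: int = 32) -> int:
    """Largest tap-device MTU whose frames still fit in one outer IP packet."""
    return path_mtu - IPV4_HEADER - UDP_HEADER - crypto_overhead - ETHERNET_HEADER


print(max_tap_mtu())  # 1426 under the assumptions above
```

The other ports in the same bridge would need the same (or a smaller) MTU, which matches the advice to change all affected interfaces together.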

Please report your findings and I can investigate further.

Original comment by neilalexanderr on 12 May 2014 at 8:28

GoogleCodeExporter commented 9 years ago
I tried using quicktun and it worked fine, using the same nacltai public and 
private key pairs as I am using with sigmavpn.

OK, I did a little more investigating with a more verbose tcpdump.

I ran tcpdump -vvv on the tunnel interfaces themselves at each end.  For the 
local interface (closest to the PC sending the ping, where the ARP originates):

19:18:05.711508 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.2, length 46
19:18:11.211440 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.2, length 46
19:18:16.710431 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.2, length 46

And here is the capture from the remote tunnel interface (ignore timestamps, 
they were captured at different times):

19:18:55.209800 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.2, length 1532
19:19:00.710628 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.1 tell 192.168.0.2, length 1532
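The two ARP lengths can be turned into frame sizes. A sketch, assuming tcpdump's ARP "length" counts the bytes after the 14-byte Ethernet header (so a 28-byte ARP body padded to the 60-byte minimum frame shows up as 46). On that reading, "length 1532" is a 1546-byte frame, larger than the 1514 bytes an untagged MTU-1500 port carries, so it would plausibly be dropped when the remote bridge tries to send it out the Ethernet port. The helper names are hypothetical.

```python
ETHERNET_HEADER = 14  # destination MAC, source MAC, EtherType
ARP_BODY = 28         # Ethernet/IPv4 ARP: 8 fixed bytes + 6 + 4 + 6 + 4 address bytes


def frame_size(tcpdump_len: int) -> int:
    """Ethernet frame size (without FCS) for an ARP line reporting tcpdump_len."""
    return tcpdump_len + ETHERNET_HEADER


def extra_bytes(tcpdump_len: int) -> int:
    """Bytes carried after the 28-byte ARP body (normally just padding)."""
    return tcpdump_len - ARP_BODY


def fits_port(tcpdump_len: int, mtu: int = 1500) -> bool:
    """True if the frame fits a port with this MTU (untagged frames assumed)."""
    return frame_size(tcpdump_len) <= mtu + ETHERNET_HEADER


for length in (46, 1532):
    print(length, frame_size(length), extra_bytes(length), fits_port(length))
# 46 60 18 True
# 1532 1546 1504 False
```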

If I capture on the actual ethernet devices over which the encrypted traffic is 
flowing, here is the local side (BAD CHECKSUM!):

19:21:35.706128 IP (tos 0x0, ttl 64, id 63669, offset 0, flags [DF], proto UDP (17), length 120)
    30.0.0.2.2337 > 40.0.0.2.2337: [bad udp cksum 0x4679 -> 0xd9d2!] UDP, length 92
19:21:41.206780 IP (tos 0x0, ttl 64, id 63670, offset 0, flags [DF], proto UDP (17), length 120)
    30.0.0.2.2337 > 40.0.0.2.2337: [bad udp cksum 0x4679 -> 0xa9f8!] UDP, length 92
19:21:46.706884 IP (tos 0x0, ttl 64, id 63671, offset 0, flags [DF], proto UDP (17), length 120)
    30.0.0.2.2337 > 40.0.0.2.2337: [bad udp cksum 0x4679 -> 0xb28e!] UDP, length 92

And here is the remote side (CHECKSUM OK):

19:22:52.705781 IP (tos 0x0, ttl 63, id 63683, offset 0, flags [DF], proto UDP (17), length 120)
    30.0.0.2.2337 > 40.0.0.2.2337: [udp sum ok] UDP, length 92
19:22:58.204450 IP (tos 0x0, ttl 63, id 63684, offset 0, flags [DF], proto UDP (17), length 120)
    30.0.0.2.2337 > 40.0.0.2.2337: [udp sum ok] UDP, length 92
19:23:03.705615 IP (tos 0x0, ttl 63, id 63685, offset 0, flags [DF], proto UDP (17), length 120)
    30.0.0.2.2337 > 40.0.0.2.2337: [udp sum ok] UDP, length 92
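The lengths in these captures can be cross-checked with a little arithmetic. The IP "length 120" is 20 bytes of IPv4 header plus 8 bytes of UDP header plus the 92-byte payload, and 92 bytes is 32 more than the 60-byte frame behind the local "length 46" ARP line. Because the remote end also receives a 92-byte payload, the encrypted packet does not seem to grow in transit; this suggests, without proving it, that the extra bytes appear when the remote end decrypts and writes the frame to its tunnel device. The sketch assumes tcpdump's ARP length excludes the 14-byte Ethernet header and that every packet carries the same overhead; the function names are hypothetical.

```python
# Working backwards from the sigmavpn captures.
# Assumption: tcpdump's ARP "length" excludes the 14-byte Ethernet header.
ETHERNET_HEADER = 14
IPV4_HEADER = 20
UDP_HEADER = 8


def tunnel_overhead(udp_payload_len: int, inner_tcpdump_len: int) -> int:
    """Bytes the tunnel adds on top of one Ethernet frame."""
    return udp_payload_len - (inner_tcpdump_len + ETHERNET_HEADER)


def expected_inner_tcpdump_len(udp_payload_len: int, overhead: int) -> int:
    """ARP 'length' tcpdump should report after decrypting one UDP payload."""
    return udp_payload_len - overhead - ETHERNET_HEADER


overhead = tunnel_overhead(92, 46)
print(overhead)                                   # 32
print(IPV4_HEADER + UDP_HEADER + 92)              # 120, the IP "length 120"
print(expected_inner_tcpdump_len(92, overhead))   # 46, yet the remote tap showed 1532
```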

Original comment by mcl...@gmail.com on 12 May 2014 at 11:30

GoogleCodeExporter commented 9 years ago
With quicktun, tcpdump -vvv on the local interface carrying the encrypted traffic 
looks as follows (this includes the successful ARP exchange followed by a few 
successful pings).  Note that it ALSO shows a BAD CHECKSUM:

19:46:46.575906 IP (tos 0x0, ttl 64, id 64657, offset 0, flags [DF], proto UDP (17), length 120)
    30.0.0.2.2998 > 40.0.0.2.2998: [bad udp cksum 0x4679 -> 0x6767!] UDP, length 92
19:46:46.576734 IP (tos 0x0, ttl 63, id 4742, offset 0, flags [DF], proto UDP (17), length 120)
    40.0.0.2.2998 > 30.0.0.2.2998: [udp sum ok] UDP, length 92
19:46:46.576961 IP (tos 0x0, ttl 64, id 64658, offset 0, flags [DF], proto UDP (17), length 134)
    30.0.0.2.2998 > 40.0.0.2.2998: [bad udp cksum 0x4687 -> 0x87e9!] UDP, length 106
19:46:46.577646 IP (tos 0x0, ttl 63, id 4743, offset 0, flags [DF], proto UDP (17), length 134)
    40.0.0.2.2998 > 30.0.0.2.2998: [udp sum ok] UDP, length 106
19:46:47.579546 IP (tos 0x0, ttl 64, id 64659, offset 0, flags [DF], proto UDP (17), length 134)
    30.0.0.2.2998 > 40.0.0.2.2998: [bad udp cksum 0x4687 -> 0x21be!] UDP, length 106
19:46:47.581089 IP (tos 0x0, ttl 63, id 4744, offset 0, flags [DF], proto UDP (17), length 134)
    40.0.0.2.2998 > 30.0.0.2.2998: [udp sum ok] UDP, length 106
19:46:51.578936 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 30.0.0.1 tell 30.0.0.2, length 28
19:46:51.579343 ARP, Ethernet (len 6), IPv4 (len 4), Reply 30.0.0.1 is-at 00:0c:29:45:3f:86 (oui Unknown), length 46
19:46:51.588306 IP (tos 0x0, ttl 63, id 4745, offset 0, flags [DF], proto UDP (17), length 120)
    40.0.0.2.2998 > 30.0.0.2.2998: [udp sum ok] UDP, length 92
19:46:51.590214 IP (tos 0x0, ttl 64, id 64660, offset 0, flags [DF], proto UDP (17), length 120)
    30.0.0.2.2998 > 40.0.0.2.2998: [bad udp cksum 0x4679 -> 0x2453!] UDP, length 92

And the corresponding traffic on the remote end of the encrypted interface:

19:46:46.576244 IP (tos 0x0, ttl 63, id 64657, offset 0, flags [DF], proto UDP (17), length 120)
    30.0.0.2.2998 > 40.0.0.2.2998: [udp sum ok] UDP, length 92
19:46:46.576599 IP (tos 0x0, ttl 64, id 4742, offset 0, flags [DF], proto UDP (17), length 120)
    40.0.0.2.2998 > 30.0.0.2.2998: [bad udp cksum 0x4679 -> 0x0924!] UDP, length 92
19:46:46.577200 IP (tos 0x0, ttl 63, id 64658, offset 0, flags [DF], proto UDP (17), length 134)
    30.0.0.2.2998 > 40.0.0.2.2998: [udp sum ok] UDP, length 106
19:46:46.577524 IP (tos 0x0, ttl 64, id 4743, offset 0, flags [DF], proto UDP (17), length 134)
    40.0.0.2.2998 > 30.0.0.2.2998: [bad udp cksum 0x4687 -> 0xef4a!] UDP, length 106
19:46:47.580167 IP (tos 0x0, ttl 63, id 64659, offset 0, flags [DF], proto UDP (17), length 134)
    30.0.0.2.2998 > 40.0.0.2.2998: [udp sum ok] UDP, length 106
19:46:47.580819 IP (tos 0x0, ttl 64, id 4744, offset 0, flags [DF], proto UDP (17), length 134)
    40.0.0.2.2998 > 30.0.0.2.2998: [bad udp cksum 0x4687 -> 0x03e4!] UDP, length 106

On the tunnel interfaces themselves, you see proper ARP and ping traffic, and 
the sizes are the same at both ends.
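In both the sigmavpn and quicktun captures, "bad udp cksum" appears only on packets captured on the sending host, while the receiving host reports "udp sum ok" for the same packets. That pattern is typical of transmit checksum offload (tcpdump sees the packet before the NIC fills in the checksum), so the checksum warnings are probably unrelated to the ARP problem. To check a captured packet by hand, the UDP-over-IPv4 checksum (RFC 768, using the RFC 1071 one's-complement sum) can be computed as below. This is a generic sketch; the helper names and the sample payload are made up.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum (RFC 1071); odd-length data is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol 17, UDP length."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, 17, udp_len)


def udp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum for a UDP segment whose checksum field (bytes 6-7) is zero."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return (~total & 0xFFFF) or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def udp_checksum_ok(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """Verify a received segment; a zero checksum field means 'not computed'."""
    if segment[6:8] == b"\x00\x00":
        return True
    return ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0xFFFF


# Made-up data shaped like the captures: 30.0.0.2:2337 -> 40.0.0.2:2337, 92-byte payload.
src, dst = bytes([30, 0, 0, 2]), bytes([40, 0, 0, 2])
payload = bytes(range(92))
segment = struct.pack("!HHHH", 2337, 2337, 8 + len(payload), 0) + payload
segment = segment[:6] + struct.pack("!H", udp_checksum(src, dst, segment)) + segment[8:]
print(udp_checksum_ok(src, dst, segment))  # True
```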

Original comment by mcl...@gmail.com on 12 May 2014 at 11:49

GoogleCodeExporter commented 9 years ago
Hi Neil -

Any ideas as to what might be going on?

Thanks!

Original comment by mcl...@gmail.com on 19 May 2014 at 3:55

GoogleCodeExporter commented 9 years ago
I'm still investigating what's going on, but without much luck 
so far. I'll let you know if I come up with anything.

Original comment by neilalexanderr on 27 May 2014 at 4:44

GoogleCodeExporter commented 9 years ago
OK, thanks for looking into it and let me know if I can do anything else to 
help out.

Original comment by mcl...@gmail.com on 27 May 2014 at 4:46

GoogleCodeExporter commented 9 years ago
I have made some changes to the boundaries in the nacltai code, which I think 
may have fixed the issue. Can you please pull the latest source from git 
and report back? Thanks for your patience.
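The thread does not say exactly which boundaries changed. Purely as an illustration of one kind of bug that can turn a small decrypted frame into a much larger one, the sketch below contrasts writing a whole fixed-size receive buffer to the tunnel device with writing only the number of bytes actually recovered. It is not sigmavpn's code: `fake_open`, `deliver_buggy`, `deliver_fixed`, `BUF_SIZE`, and the 32-byte `OVERHEAD` are all hypothetical, no real NaCl decryption is performed, and an `io.BytesIO` stands in for the tap device.

```python
import io

# Hypothetical illustration: "decryption" just strips an assumed 32-byte header
# so that only the length handling is visible.
OVERHEAD = 32       # assumed per-packet nacltai overhead
BUF_SIZE = 2048     # hypothetical fixed receive buffer


def fake_open(packet: bytes, buf: bytearray) -> int:
    """Stand-in for decryption: copy the plaintext into buf and return its length."""
    plaintext = packet[OVERHEAD:]
    if len(plaintext) > len(buf):
        raise ValueError("packet larger than receive buffer")
    buf[: len(plaintext)] = plaintext
    return len(plaintext)


def deliver_buggy(packet: bytes, tap) -> None:
    buf = bytearray(BUF_SIZE)
    fake_open(packet, buf)
    tap.write(buf)          # writes the whole buffer, trailing zeros included


def deliver_fixed(packet: bytes, tap) -> None:
    buf = bytearray(BUF_SIZE)
    n = fake_open(packet, buf)
    tap.write(buf[:n])      # writes only the bytes that were actually recovered


packet = bytes(OVERHEAD) + bytes(60)   # a 60-byte frame behind a 32-byte header
for deliver in (deliver_buggy, deliver_fixed):
    tap = io.BytesIO()
    deliver(packet, tap)
    print(deliver.__name__, len(tap.getvalue()))
# deliver_buggy 2048
# deliver_fixed 60
```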

Original comment by neilalexanderr on 27 May 2014 at 10:43

GoogleCodeExporter commented 9 years ago
Thanks!  Initial testing seems to show that this now works.  I'll be doing more 
testing today and keep you posted.

Original comment by mcl...@gmail.com on 29 May 2014 at 12:04