openwrt / odhcpd

This repository is a mirror of https://git.openwrt.org/?p=project/odhcpd.git. Pull requests are accepted and will be merged into odhcpd.git.
GNU General Public License v2.0

Relaying fails intermittently #15

Closed jech closed 10 years ago

jech commented 10 years ago

I'm running odhcpd (version 2014-06-18-82f3096351911d8c4f3b38e7a5bbeaf75938b6b8) on an OpenWRT box with three interfaces in the lan network (unbridged). Since my ISP only gives me a /64, I'm using relay mode:

config dhcp wan6
    option dhcpv6 relay
    option ra relay
    option ndp relay
    option master 1

config dhcp 'lan'
    [...]
    option dhcpv6 'relay'
    option ra 'relay'
    option ndp relay

This usually works, but after a period with no IPv6 traffic, I get timeouts on the order of a dozen seconds. It's difficult to reproduce, but the one packet capture that I managed to get seemed to indicate that the OpenWRT box was sending neighbour solicitations over the wan interface rather than the wifi0 one.

I'll provide more info if I can manage to capture a better dump.

--jch

sbyx commented 10 years ago

OK, I suppose this is due to neighbor entries timing out prematurely. odhcpd internally watches the kernel neighbor cache to populate (and depopulate) its internal state of which client is on which interface. If it doesn't (yet) know the destination interface, it should try to reach the client on every interface except the originating one; maybe something is broken there. I will have a look, but chances are I'm not going to be able to do much about it for the next 2 weeks. Sorry.
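
To make the intended behaviour concrete, here is a rough sketch in Python of the lookup-with-fallback logic described in this comment. It is only an illustration, not odhcpd's actual code (which is written in C). The function name `pick_interfaces`, the dictionary layout, and the `2001:db8::` example addresses are all assumptions made for the example.

```python
from typing import Dict, List


def pick_interfaces(target: str, origin: str,
                    neighbors: Dict[str, str],
                    interfaces: List[str]) -> List[str]:
    """Choose where to send a neighbour solicitation for `target`.

    `neighbors` is a hypothetical stand-in for the relay's internal state:
    a map from client address to the interface it was last seen on.
    """
    known = neighbors.get(target)
    if known is not None:
        # The destination interface is known: probe only there.
        return [known]
    # Unknown destination: try every interface except the originating one.
    return [iface for iface in interfaces if iface != origin]


# Hypothetical state: one client already learned on wlan0.
state = {"2001:db8::1": "wlan0"}
ifaces = ["eth1", "wlan0", "wlan1", "eth0.1"]

print(pick_interfaces("2001:db8::1", "eth1", state, ifaces))  # ['wlan0']
print(pick_interfaces("2001:db8::2", "eth1", state, ifaces))  # ['wlan0', 'wlan1', 'eth0.1']
```

If an entry times out early, the second case applies: the relay has to probe all the other interfaces again, and a bug in that path would match the reported symptom.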

jech commented 10 years ago

Unfortunately, it looks like the outage is stable under load -- we never recover. The router's neighbour table (ip -6 neigh show) looks like this:

client-address dev wlan1  FAILED
client-address dev wlan0 lladdr client-MAC STALE
router-address dev eth1 lladdr router-MAC router REACHABLE
other-address dev wlan0 lladdr client-MAC REACHABLE
client-address dev eth0.1  FAILED

Note that the client has two IPv6 addresses (written client-address and other-address), due to the use of privacy extensions. The MAC-derived address (other-address) is working fine, while the privacy address is marked as STALE.
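
For anyone comparing their own tables, a small Python sketch along these lines can flag addresses that are resolved on one interface but FAILED or INCOMPLETE on another. It assumes only the line layout shown above (`ADDRESS dev IFACE [lladdr MAC] [router] STATE`). The function names `parse_neigh` and `conflicting` are made up for the example.

```python
from collections import defaultdict

# Anonymised sample copied from the table above.
SAMPLE = """\
client-address dev wlan1  FAILED
client-address dev wlan0 lladdr client-MAC STALE
router-address dev eth1 lladdr router-MAC router REACHABLE
other-address dev wlan0 lladdr client-MAC REACHABLE
client-address dev eth0.1  FAILED
"""


def parse_neigh(text):
    """Map each address to a list of (interface, state) pairs."""
    table = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that does not look like "ADDR dev IFACE ... STATE".
        if len(fields) < 4 or fields[1] != "dev":
            continue
        table[fields[0]].append((fields[2], fields[-1]))
    return dict(table)


def conflicting(table):
    """Addresses unresolved on some interface but resolved on another."""
    unresolved = {"FAILED", "INCOMPLETE"}
    result = []
    for addr, entries in table.items():
        states = [state for _, state in entries]
        if (any(s in unresolved for s in states)
                and any(s not in unresolved for s in states)):
            result.append(addr)
    return result


print(conflicting(parse_neigh(SAMPLE)))  # ['client-address']
```

On the sample, only the privacy address is flagged, which matches the observation that the MAC-derived address keeps working.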

--jch

sbyx commented 10 years ago

OK, as you can see from the latest commits, I'm in the middle of rewriting this now, partly because the insufficiently specific packet socket was draining performance from general forwarding on unrelated interfaces. Hope to have something better ready soon.

jech commented 10 years ago

Take your time, Steve. I'll be adding to this report as I collect more data, but please don't take it as trying to get you to do stuff.

-- Juliusz

sbyx commented 10 years ago

Should be fixed finally.

jech commented 10 years ago

Nope. Same symptom — launching an IPv6 DHT causes all of my IPv6 connections to hang. Here's the state of the router's neighbour table:

2a01:e34:ec22:84a0:9d39:4cef:e851:89fe dev wlan0  FAILED
2a01:e34:ec22:84a0:9d39:4cef:e851:89fe dev wlan1 lladdr 24:77:03:1a:db:64 STALE
fe80::2677:3ff:fe1a:db64 dev wlan1 lladdr 24:77:03:1a:db:64 STALE
2a01:e34:ec22:84a0::1 dev wlan1  FAILED
fe80::e246:9aff:fe4e:9177 dev eth1 lladdr e0:46:9a:4e:91:77 STALE
2a01:e34:ec22:84a0:9d39:4cef:e851:89fe dev eth0.1  FAILED
2a01:e34:ec22:84a0:9d39:4cef:e851:89fe dev eth1  INCOMPLETE
2a01:e34:ec22:84a0::1 dev eth0.1  FAILED
2a01:e34:ec22:84a0::1 dev eth1 lladdr 00:24:d4:bf:3a:8f router STALE
fe80::224:d4ff:febf:3a8f dev eth1 lladdr 00:24:d4:bf:3a:8f router REACHABLE

This is with version 2014-08-23-24452e1e3e9adfd9d8e183db1aa589f77727f5a7

mchouque commented 9 years ago

Hello,

Was this ever fixed? I believe I have a similar issue: I have the same config as jech. I'll try to take some captures to see what's going on.

I'm running 2014-08-23-24452e1e3e9adfd9d8e183db1aa589f77727f5a7 on Barrier Breaker.

jech commented 9 years ago

I'm no longer able to reproduce this on Chaos Calmer. Perhaps it was fixed by the kernel upgrade?

jech commented 9 years ago

Nope, the issue is back :-/