sbyx / hnetd

HomeNet-CP implementation (WIP)
Apache License 2.0

trying to bring up hnetd on ubuntu part 1: injecting RAs causes some packet loss #24

Open dtaht opened 9 years ago

dtaht commented 9 years ago

I got a string of issues (starting with: I don't think the cable modem is routing the delegation, but I am working on that separately), watching hnetd go...

What I see is behavior like this: every 30 seconds or so the RA is getting refreshed, and during that interval packets are lost.

root@ranger:~# [dhcpv6.script] eth2 updated

root@ranger:~# ping6 www.bufferbloat.net
[dhcpv6.script] eth2 ra-updated
PING www.bufferbloat.net(shipka.bufferbloat.net) 56 data bytes
From shipka.bufferbloat.net icmp_seq=1 Destination unreachable: Unknown code 5
From shipka.bufferbloat.net icmp_seq=2 Destination unreachable: Unknown code 5
64 bytes from shipka.bufferbloat.net: icmp_seq=3 ttl=56 time=13.7 ms
64 bytes from shipka.bufferbloat.net: icmp_seq=4 ttl=56 time=13.5 ms
64 bytes from shipka.bufferbloat.net: icmp_seq=5 ttl=56 time=12.0 ms
64 bytes from shipka.bufferbloat.net: icmp_seq=6 ttl=56 time=14.1 ms

Tried it again a few seconds later:

rtt min/avg/max/mdev = 12.045/13.429/14.147/0.721 ms
root@ranger:~# ping6 www.bufferbloat.net
PING www.bufferbloat.net(shipka.bufferbloat.net) 56 data bytes
64 bytes from shipka.bufferbloat.net: icmp_seq=1 ttl=56 time=13.7 ms
64 bytes from shipka.bufferbloat.net: icmp_seq=2 ttl=56 time=13.4 ms
[dhcpv6.script] eth2 ra-updated
From shipka.bufferbloat.net icmp_seq=3 Destination unreachable: Unknown code 5
From shipka.bufferbloat.net icmp_seq=4 Destination unreachable: Unknown code 5
64 bytes from shipka.bufferbloat.net: icmp_seq=5 ttl=56 time=13.6 ms

fingon commented 9 years ago

This seems like a client-side issue; the DHCPv6 client is not using ip -6 route replace (or change), but is instead doing del + add, and for some reason that seems to take a second or two to be processed.
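A minimal sketch of the difference described above, assuming a hypothetical helper in a DHCPv6 client script. The function name, the `IP` override variable, and the default metric are illustrative and are not taken from the bundled dhcpv6.script. `ip -6 route replace` swaps the route in a single step, so there is no window in which the default route is missing, unlike a `del` followed by an `add`.

```shell
#!/bin/sh
# Hypothetical helper, not the actual dhcpv6.script code.
# IP can be overridden (e.g. IP=echo) to print the command instead of running it.
IP="${IP:-ip}"

# update_default_route <device> <gateway> [metric]
# Installs or updates the IPv6 default route in one step, rather than
# deleting the old route and then adding the new one.
update_default_route() {
    dev="$1"
    gw="$2"
    metric="${3:-1024}"
    $IP -6 route replace default via "$gw" dev "$dev" metric "$metric"
}
```

Running it for real needs root; setting `IP=echo` first only prints the command that would be executed.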

fingon commented 9 years ago

Aha. We identified a problem in the bundled (generic) dhcpv6.script at least (it removes + adds addresses -> potential time for NUD etc). With the OpenWrt client this should not happen, but reopening and giving it to Steven ;)

sbyx commented 9 years ago

I tried to optimize this bit and not flush all addresses in general: https://github.com/sbyx/hnetd/commit/3f51f1f73df8c40c319ea576f17af491d7432754

sbyx commented 9 years ago

Some user also just found a bug in odhcp6c which could result in similar behavior, but the bug only occurs if the DHCPv6 server hands out addresses with low lifetimes (e.g. about 1 minute). It's supposed to be fixed now.

fingon commented 9 years ago

I will declare victory. Reopen if the problem persists with sbyx's patch ;)

dtaht commented 9 years ago

Nope.

Built chaos calmer last night, tried it. It certainly is stabler... but I never get an ra on the delegated interfaces and although the dhcpv6 prefix assigned has a duration of weeks, it is still refreshing on a less than minute interval. I can give access to this box if needed.

every 15 seconds I get a message like this

Sun Dec 21 18:53:53 2014 daemon.info hnetd[1911]: platform: interface update for eth0 detected
Sun Dec 21 18:53:53 2014 daemon.info hnetd[1911]: platform: interface update for eth0 detected
Sun Dec 21 18:53:53 2014 daemon.info hnetd[1911]: platform: interface update for wlan0 detected
Sun Dec 21 18:53:53 2014 daemon.info hnetd[1911]: platform: interface update for wlan1 detected
Sun Dec 21 18:53:53 2014 daemon.info hnetd[1911]: platform: interface update for wlan2 detected
Sun Dec 21 18:53:54 2014 daemon.warn odhcpd[1004]: A default route is present but there is no public prefix on wlan0 thus we don't announce a default route!
Sun Dec 21 18:54:01 2014 authpriv.info dropbear[10971]: Child connection from 172.21.0.219:37244
Sun Dec 21 18:54:01 2014 daemon.info hnetd[1911]: platform: interface update for eth1 detected
Sun Dec 21 18:54:01 2014 daemon.info hnetd[1911]: platform: interface update for lo detected
Sun Dec 21 18:54:01 2014 daemon.info hnetd[1911]: platform: interface update for eth0 detected
Sun Dec 21 18:54:01 2014 daemon.info hnetd[1911]: platform: interface update for eth0 detected
Sun Dec 21 18:54:01 2014 daemon.info hnetd[1911]: platform: interface update for wlan0 detected

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2601:9:4e01:6de0::1/64 scope global dynamic
       valid_lft 48sec preferred_lft 18sec
    inet6 fe80::ea94:f6ff:fe91:2ea4/64 scope link
       valid_lft forever preferred_lft forever
10: ifb4eth0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qlen 32
    inet6 fe80::a043:5aff:fe86:a326/64 scope link
       valid_lft forever preferred_lft forever
11: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 fe80::ea94:f6ff:fe91:2ea3/64 scope link
       valid_lft forever preferred_lft forever
12: wlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2601:9:4e01:6de1::1/64 scope global dynamic
       valid_lft 48sec preferred_lft 18sec
    inet6 fe80::ea94:f6ff:fe91:2ea2/64 scope link
       valid_lft forever preferred_lft forever
13: wlan2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2601:9:4e01:6de2::1/64 scope global dynamic
       valid_lft 48sec preferred_lft 18sec
    inet6 fe80::e894:f6ff:fe91:2ea3/64 scope link
       valid_lft forever preferred_lft forever

sbyx commented 9 years ago

Could have a look tomorrow, but I will not know more until I see whether the upstream router behaves correctly. We cannot do much if the DHCPv6 server refreshes leases this often. Had something similar here: https://forum.openwrt.org/viewtopic.php?id=54502

sbyx commented 9 years ago

Also, our fix here was for the generic Debian/Ubuntu variant. The OpenWrt variant should have reasonable refresh logic. Problems were mostly related to upstream routers / ISPs. pcaps of RAs / DHCPv6 could help here.

dtaht commented 9 years ago

Well, I could have another problem in not seeing RAs. But shouldn't the delegation respect the preferred duration in the DHCPv6 delegation message, not the external RA?

I sent you a note privately. I cut this router off from the rest of the network. Do with it as you will.

dtaht commented 9 years ago

Mr Barth wrote me privately:

Just had look, unfortunately not much we can do about it.

Your upstream router sends RAs every 20-30 seconds with router and address lifetimes being high enough (30 minutes and 60 minutes) so we can filter here (OpenWrt will update the info once per minute).

However the problem is this.

08:30:34.280253 IP6 (hlim 128, next-header UDP (17) payload length: 165) fe80::22e5:2aff:feb8:87.dhcpv6-server > fe80::ea94:f6ff:fe91:2ea5.dhcpv6-client: [udp sum ok] dhcp6 reply (xid=7a506e (client-ID hwaddr type 1 e894f6912ea5) (IA_NA IAID:1 T1:15 T2:22 (IA_ADDR 2601:9:4e01:6d00:ea94:f6ff:fe91:2ea5 pltime:30 vltime:60)) (IA_PD IAID:1 T1:15 T2:22 (IA_PD-prefix 2601:9:4e01:6de0::/60 pltime:30 vltime:60)) (DNS-server cdns01.comcast.net cdns02.comcast.net) (server-ID hwaddr type 1 20e52ab80087))

As you can see, the server sets T1 for IA_NA and IA_PD to 15s, which means that the OpenWrt router has to send renews every 15 seconds and gets new state every 15 seconds (i.e. address timers are bumped etc.). Also the address lifetimes are pretty low (60/30), so we could not really skip updates to OpenWrt even if we wanted to.
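To check these values on another capture, the timers can be pulled out of verbose tcpdump text with a small filter. This is only a sketch: the function name `dhcp6_timers` is made up here, and it assumes the `T1:`, `T2:`, `pltime:` and `vltime:` field spellings seen in the capture above. T1 is when the client starts renewing with the server that issued the lease; T2 is when it falls back to rebinding with any server.

```shell
#!/bin/sh
# Hypothetical helper: extract renew timers and lifetimes from verbose
# tcpdump DHCPv6 output (field names as printed in the capture above).

# dhcp6_timers: reads tcpdump text on stdin and prints every
# T1/T2/pltime/vltime field as "name value", one per line, in order.
dhcp6_timers() {
    grep -oE '(T1|T2|pltime|vltime):[0-9]+' | tr ':' ' '
}
```

A saved capture line can be piped in, e.g. `dhcp6_timers < reply.txt`, where reply.txt is a placeholder file name. Values of a few seconds for T1 on a lease that is supposed to last weeks point at the server side.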

The downstream addressing issue is caused by this: OpenWrt's prefix assignment and hnet's conflict with each other, so hnet only picks up prefixes from the wan6 interface if it has "option delegate 0" set, and thus ignores the prefix in this case (if the upstream interface has proto=hnet then it does that automatically).

So you either have to set all downstream interfaces to proto hnet and let hnet handle prefix assignment / delegation, by making the upstream interface also proto=hnet OR adding "option delegate 0". Alternatively use option delegate 1 (default) and let OpenWrt handle delegation exclusively, but then hnet won't work with it; both at the same time doesn't work (yet?).
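As a rough sketch of the "option delegate 0" route, assuming the upstream interface section really is named wan6 as in the reply above: the helper name and the `UCI` override variable are illustrative only, while `uci set` and `uci commit` are the standard OpenWrt commands for changing and saving a config option.

```shell
#!/bin/sh
# Hypothetical helper: turn off OpenWrt's own prefix delegation on the
# upstream interface so that hnet can pick up the delegated prefix.
# UCI can be overridden (e.g. UCI=echo) to print the commands only.
UCI="${UCI:-uci}"

# disable_openwrt_delegation [interface]
# The interface defaults to wan6, which is an assumption from the thread.
disable_openwrt_delegation() {
    iface="${1:-wan6}"
    $UCI set "network.${iface}.delegate=0"
    $UCI commit network
}
```

The network configuration still has to be reloaded afterwards (for example with `/etc/init.d/network reload`) before the change takes effect.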

sbyx commented 9 years ago

Yeah, maybe adding to this: the client supports DHCPv6 reconfigure as is required by RFC 7084, so I don't see the point of excessively low timeouts and T1 / T2 values by the server. It's not really a bug though, it just causes constant inconvenient updates throughout the network.

You noted something with reqprefix being 56 and a /60 being delegated somewhere? Can you elaborate on that? In general we cannot do much about what the provider delegates to you.

dtaht commented 9 years ago

I swear I had seen it set IA_NA and IA_PD for the more sane "week" at some point.

Also, I have reqprefix of 56, but I seem to be getting a /60, yet I have a /56 defined in the routing table.

As for setting hnet vs openwrt methods on by default, the whole point of the test was to get hnet to work for the first time, so I am going to go change this and reboot.

dtaht commented 9 years ago

I enabled hnet universally and get no addresses assigned at all. (I left myself a backdoor via the wlan, tho)

In logread:

Mon Dec 22 13:23:32 2014 user.notice proto-hnet: proto_hnet_setup eth1/lan
Mon Dec 22 13:23:32 2014 user.notice proto-hnet: Ignoring hnet on 'lan' and 'wan'. Please rename your interface to avoid conflicts.

My configuration.

config interface 'lan'
option ifname 'eth1'
option proto 'hnet'
option ip6assign '64'
option defaultroute '0'
option ipaddr '172.21.0.22'
option netmask '255.255.255.0'

config interface 'wan'
option ifname 'eth0'
option proto 'hnet'
option mode 'auto'
option ip6assign '64'
option ip4assign '24'

config interface '@wan6'
option ifname 'eth0'
option _orig_ifname 'eth0'
option _orig_bridge 'false'
option proto 'hnet'
option mode 'auto'
option ip6assign '64'
option ip4assign '24'

sbyx commented 9 years ago

"Ignoring hnet on 'lan' and 'wan'. Please rename your interface to avoid conflicts." Can you please rename "lan" and "wan" to something else, doesn't matter what really, e.g. "hnet-lan" and "hnet-wan".

dtaht commented 9 years ago

I had had a reqprefix of 56, and the routing table showed a /56, when the actual delegation was a /60.

dtaht commented 9 years ago

change them there and through all the firewall code???

sbyx commented 9 years ago

The firewall zones still stay "lan" and "wan" so this is fine and your rules should (most likely) stay the same just the interface names change.

We had some issues with other OpenWrt components assuming interface names "lan" and "wan" to be special that's why we added this check. You could optionally try to remove the check in /lib/netifd/proto/hnet.sh and see if it works out, maybe these issues are gone.

As for the /56, I don't know where that comes from. Can you paste the route entry from ip?

dtaht commented 9 years ago

I changed the zones to reflect what you asked. I did get an ipv6 address on the outside network, but it also picked up a default route from the internal network, ignoring my defaultroute 0 on the inside network.

root@yurtlabtest:~# ip route
default via 10.1.10.1 dev eth0 proto static src 10.1.10.10 metric 1002
default via 172.21.0.1 dev eth1 proto static src 172.21.0.22 metric 1003

It then propagated that to my universe via babel.

I deleted the second route, but still had no connectivity from the rest of the network, so had to nuke it to get back online. Not sure if it was a routing loop or a firewall problem. (the outside address is also natting, sigh)

And it did not assign any addresses to eth1.

config interface 'hnetlan'
option ifname 'eth1'
option proto 'hnet'
option ip6assign '64'
option defaultroute '0'
option ipaddr '172.21.0.22'
option netmask '255.255.255.0'

config interface 'hnetwan'
option ifname 'eth0'
option proto 'hnet'
option mode 'auto'
option ip6assign '64'
option ip4assign '24'
option reqprefix 56

config interface '@wan6'
option ifname 'eth0'
option _orig_ifname 'eth0'
option _orig_bridge 'false'
option proto 'hnet'
option mode 'auto'
option ip6assign '64'
option ip4assign '24'

sbyx commented 9 years ago

I assume hnet had fun with babel then since hnet's soon-to-be-removed internal routing would actually honor defaultroute=0. Wondering where that default route comes from.

On the /56 route matter, I assume it's the uplink router announcing itself as delegated router via a Route Information Option in the RA, which would be valid. Check if the /56 route has a gateway to said upstream router; if so, then that's valid. In this case the upstream router has the /56 but doesn't want you to delegate more than a /60 for whatever reason. The only case where the /56 would be of concern is if that was a local blackhole route, but I don't see how this could happen.
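The check suggested here can be sketched as a filter over `ip -6 route` output. The function name and the three labels are made up for illustration; the sketch assumes that a route via a gateway prints the prefix first followed by `via`, and that a local blackhole shows up with a leading `unreachable` or `blackhole` keyword.

```shell
#!/bin/sh
# Hypothetical helper: classify the routes for one prefix in `ip -6 route`
# output. Reads route lines on stdin and prints one label per matching line:
#   gateway   - the prefix is routed via a next hop (e.g. the upstream router)
#   blackhole - a local unreachable/blackhole route covers the prefix
#   other     - the prefix appears without a next hop
classify_routes() {
    prefix="$1"
    while IFS= read -r line; do
        case "$line" in
            unreachable\ "$prefix"*|blackhole\ "$prefix"*) echo "blackhole" ;;
            "$prefix"*\ via\ *) echo "gateway" ;;
            "$prefix"*) echo "other" ;;
        esac
    done
}
```

Typical use would be `ip -6 route | classify_routes 2601:9:4e01:6d00::/56`. If only `gateway` lines come back, the /56 is just a route towards the upstream router and not a prefix delegated to this box.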

fingon commented 9 years ago

To me it sounds as if you had some non-homenet-aware DHCP server on eth1 and it does (completely logically) assume it is upstream and therefore does no assignments on it. If so, you could try using some hnet proto sub-category which does not allow for upstream there..?

dtaht commented 9 years ago

If the /56 bit is important, we can break that out into a separate bug. Tackling that first, what I see is this:

root@yurtlabtest:~# ip -6 route | grep /60
default from 2601:9:4e00:9d0::/60 via fe80::2ac6:8eff:febb:9ff0 dev wlan0 proto 42 metric 1024
default from 2601:9:4e01:6de0::/60 via fe80::22e5:2aff:feb8:87 dev eth0 proto static metric 1024
default from fde5:dfb9:df90:fff0::/60 via fe80::2ac6:8eff:febb:9ff0 dev wlan0 proto 42 metric 1024
2601:9:4e00:9d0::/60 via fe80::2ac6:8eff:febb:9ff0 dev wlan0 proto 42 metric 1024
unreachable 2601:9:4e00:9d0::/60 dev lo proto 42 metric 4294967295 error -128
2601:9:4e01:6d00::/56 from 2601:9:4e01:6de0::/60 via fe80::22e5:2aff:feb8:87 dev eth0 proto static metric 1024
unreachable 2601:9:4e01:6de0::/60 dev lo proto static metric 1000000000 error -128
unreachable 2601:9:4e01:6de0::/60 dev lo proto static metric 2147483647 error -128
root@yurtlabtest:~# ip -6 route | grep /56
2601:9:4e01:6d00::/56 from :: via fe80::22e5:2aff:feb8:87 dev eth0 proto static metric 1024
2601:9:4e01:6d00::/56 from 2601:9:4e01:6d00:ea94:f6ff:fe91:2ea5 via fe80::22e5:2aff:feb8:87 dev eth0 proto static metric 1024
2601:9:4e01:6d00::/56 from 2601:9:4e01:6de0::/60 via fe80::22e5:2aff:feb8:87 dev eth0 proto static metric 1024

config interface 'hnetlan'
option ifname 'eth1'
option proto 'hnet'
option ip6assign '64'
option defaultroute '0'
option ipaddr '172.21.0.22'
option netmask '255.255.255.0'
option mode 'adhoc'
option ip4assign '24'

config interface 'hnetwan'
option ifname 'eth0'
option proto 'hnet'
option mode 'auto'
option ip6assign '64'
option ip4assign '24'
option reqprefix '56'

sbyx commented 9 years ago

Yeah, that's what I thought: the /56 comes from a Route Information Option in the RA of the upstream router, so it's correct.

dtaht commented 9 years ago

as for what happens when I did it that way, well, I still ended up unroutable via ipv4, I did get ipv6 addresses assigned to the hnet interfaces, I did not get RAs or addresses assigned to clients downstream of eth1.

It also borked my babel configuration badly. I have never seen it do this...

example:

172.20.5.0/24 via 172.21.2.224 dev wlan0 proto 42 onlink
unreachable 172.20.5.0/24 proto 42 metric 4294967295 onlink
172.20.6.0/24 via 172.21.2.224 dev wlan0 proto 42 onlink
unreachable 172.20.6.0/24 proto 42 metric 4294967295 onlink
172.20.6.1 via 172.21.2.224 dev wlan0 proto 42 onlink
unreachable 172.20.6.1 proto 42 metric 4294967295 onlink
172.20.7.0/24 via 172.21.2.224 dev wlan0 proto 42 onlink
unreachable 172.20.7.0/24 proto 42 metric 4294967295 onlink
172.20.8.0/24 via 172.21.2.224 dev wlan0 proto 42 onlink
unreachable 172.20.8.0/24 proto 42 metric 4294967295 onlink
172.20.20.0/24 via 172.21.2.224 dev wlan0 proto 42 onlink
unreachable 172.20.20.0/24 proto 42 metric 4294967295 onlink
172.20.47.0/24 via 172.21.2.224 dev wlan0 proto 42 onlink

dtaht commented 9 years ago

Going back to the /56 comment.... what you are saying is that I do have a full /56 available, however the router is only using a /60 for its own purposes?

Or are you saying that the /60 is enforced by the server on the other side (as in only allowing one /60 per router requesting out of the customer's /56 prefix range).

If the latter, then, well, hnetd's premises need to be rethunk, I think, if you want to then actually use the rest of the darn 56 you would have to add relaying in from other routers on the network rather than allowing hnetd to be the master. ?

Or (sigh) connect every router directly to the cable modem. Which makes (the so desperately needed on a cable modem) bandwidth management impossible with today's tools.

sbyx commented 9 years ago

Well, whatever you connect the OpenWrt router to (ISP, some hipnet CPE, whatever else) decides how much of a prefix is going to be delegated to the OpenWrt router, whether it runs homenet, hipnet or is just an ordinary IPv6 router. So that upstream router / CPE has a /56 and is delegated for it, but only seems to give out /60s to subsequent routers even if there are other hints. I mean, it probably cannot give out a full /56 since it needs some addresses for itself, but it probably could hand out a /57 or /58 or whatever. Maybe there is a config switch somewhere on said upstream router. Nothing we can do about it really. I mean, we could try to fire up multiple DHCPv6 clients and try to get multiple /60s and do dirty magic, but there is a limit as to how far we can arrange ourselves with upstream routers / ISPs that don't do what we want.

dtaht commented 9 years ago

I am disabling the router and going back to bed and pulling a pillow over my head. Emailed john to comment.

dtaht commented 9 years ago

And to finish up here: after reverting to use the dhcpv6 client (the native, non-hnetd methods) and rebooting, the cable modem is showing prefixes as available then not a ms later, as per whatever other bug that was. (I have duplicated this happening with both dibbler and isc dhclient, btw). So although there may be a /56 available, another daemon can't seem to get anything using a different duid.

/me goes back to bed.