Closed robbat2 closed 7 years ago
If UnicastOnly=on is set, then the flood does not happen.
Hmm.. I can picture the code, and that makes sense, and it should be fixed. One of the problems is that the RA is randomly delayed, so once radvd gets an RS, it just schedules the RA to happen after a short delay, but doesn't remember the RS. So the info in the RS needs to be linked to the RA somehow so it knows to whom it should send the RA. The schedule/delay queue will just need to store the source of the RS, but this is where it can get complicated. Currently, only one RA is scheduled. If 0, 1, or N RS's come in, still just one RA is scheduled. This is fine because the RA is multicast to the all-nodes group. So the RA timer queue will need to have one slot per client in the client list (I guess).
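To make the "one slot per client" idea concrete, here is a minimal sketch in Python (not radvd's C code, and not what any patch actually does). The class name `PendingRAQueue` and its methods are hypothetical; the only value taken from a spec is the 0.5 s upper bound on the random delay, which is MAX_RA_DELAY_TIME in RFC 4861.

```python
import heapq
import random


class PendingRAQueue:
    """Hypothetical queue: one pending unicast RA slot per soliciting source."""

    def __init__(self, max_delay=0.5, rng=None):
        self.max_delay = max_delay  # MAX_RA_DELAY_TIME in RFC 4861 is 0.5 s
        self.rng = rng or random.Random()
        self.due = {}    # source address -> time its RA is due
        self.heap = []   # (due time, source address), earliest first

    def on_rs(self, source, now):
        """Schedule an RA for `source`; a repeated RS reuses the existing slot."""
        if source in self.due:
            return self.due[source]
        when = now + self.rng.uniform(0.0, self.max_delay)
        self.due[source] = when
        heapq.heappush(self.heap, (when, source))
        return when

    def pop_due(self, now):
        """Return the sources whose RA is due at `now` and free their slots."""
        ready = []
        while self.heap and self.heap[0][0] <= now:
            _, source = heapq.heappop(self.heap)
            del self.due[source]
            ready.append(source)
        return ready
```

The point of the dictionary is that N solicitations from the same host still produce a single scheduled RA, which mirrors the current "0, 1, or N RS's, one RA" behaviour but per client instead of per interface.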
Kinda just thinking out loud here, I'll look into the code more later when I get time. If you want to take a stab at fixing it, please do. :)
See RFC 7772 section 2.1: https://tools.ietf.org/html/rfc7772#section-2.1
In the case of a large network, this leads to an all-nodes RA every 3 seconds, which wakes up all of the devices on the network.
Suggested new behavior:
all-nodes RA cycle alone (the new RA-splitting code has placeholders for scheduling RA options at different rates).
That does sound like a better idea. Just to clarify...
Leave all-node multicast cycle alone? Unicast responses to RS's?
Yes, leave the multicast cycle entirely alone.
In the worst case, immediately after we send a complete set of unicast RA to a single client, we have our regularly scheduled multicast, which for this window aligns to schedule ALL packets.
The overall objective here is that it should be feasible to INCREASE the multicast interval, but still get a near-immediate unicast response to RS, without flooding the network.
Ok, this is going to be easiest to fix after RA-splitting is merged, because:
I'd like to add a +1 to this. Having a periodic multicast advertisement, but unicast responses to RS, would also help us with RFC7772 issues. It would be good to be able to do this without having to configure a list of unicast clients as well. We have many battery-powered devices coming and going from a large wireless LAN, and every time one connects and sends an RS, all the others get their radio woken up by the multicast RA reply.
Yes, this is still an issue; I started a patchset for it, but got sidetracked by other work.
The major issue was deciding how to handle scheduling and how to track the requesting nodes. The latter is a problem because it represents a potential DoS attack if we try to track all unicast clients (if they just spoof lots of different addresses, they can overwhelm radvd).
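One way to picture the DoS concern is a tracking table with a hard cap. The sketch below is an assumption for illustration only, not the approach taken in the patchset: the class `BoundedSolicitors`, the `limit` parameter, and the fall-back-to-multicast policy are all hypothetical.

```python
class BoundedSolicitors:
    """Hypothetical capped set of hosts that are waiting for a unicast RA."""

    def __init__(self, limit=64):
        self.limit = limit              # assumed cap on tracked sources
        self.pending = set()
        self.multicast_pending = False

    def add(self, source):
        """Track `source`, or fall back to one multicast RA once the cap is hit."""
        if source in self.pending:
            return "unicast"
        if len(self.pending) >= self.limit:
            # Spoofed sources cannot grow the table past `limit`.
            self.multicast_pending = True
            return "multicast"
        self.pending.add(source)
        return "unicast"

    def flush(self):
        """Return (unicast targets, send_multicast) and reset the table."""
        if self.multicast_pending:
            targets, multicast = [], True   # one multicast RA covers everyone
        else:
            targets, multicast = sorted(self.pending), False
        self.pending.clear()
        self.multicast_pending = False
        return targets, multicast
```

With a cap like this, a burst of spoofed addresses degrades to the old multicast behaviour instead of exhausting memory.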
@spakka can you read/test out PR #69?
This PR represents the minimal fix. There are further improvements suggested in TODO for later, like deliberately deferring the unicast RA if we are getting close to the time of multicast RA; as well as forcing more multicast RAs when topology is changing.
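The "defer the unicast RA if the multicast RA is close" item from the TODO could look roughly like this. It is only a sketch of the idea: the function name `choose_response` and the `defer_window` value are hypothetical and do not come from the PR.

```python
def choose_response(now, next_multicast, defer_window=0.5):
    """Pick how to answer an RS received at time `now` (times in seconds).

    `defer_window` is an assumed tunable: if the regular multicast RA is due
    within that many seconds, skip the unicast RA and let the multicast answer.
    """
    if 0 <= next_multicast - now <= defer_window:
        return "wait-for-multicast"
    return "unicast"
```

For example, `choose_response(10.0, 10.3)` defers to the multicast RA, while `choose_response(10.0, 200.0)` answers with a unicast RA.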
Hi, thanks for the quick reply. I've reviewed the patch and it looks to be in compliance with the RFC. I tried to test it but wasn't able to generate an RS with the SLLA option, so I moved the rfc7772_unicast_response test outside the option block, to make it apply to all RSs, and it works great. Only thing is that if a unicast response is destined for a link-local address and there is no existing ND entry on the router (and the SLLA option hasn't been provided in the RS), then it triggers an NS/NA exchange from the router to resolve the link-local address so it can form the reply RA. Anyway basically yea it works well as long as the sender sets the SLLA option. Thanks!
I can trigger it by having Linux restart an interface (libvirt instance of Ubuntu 14.04, patched quagga running on the bridge virbr0):
11:00:10.426779 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::5054:ff:feca:a591 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 16
source link-address option (1), length 8 (1): 52:54:00:ca:a5:91
0x0000: 5254 00ca a591
11:00:10.427420 IP6 (flowlabel 0xea489, hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::1 > fe80::5054:ff:feca:a591: [icmp6 sum ok] ICMP6, router advertisement, length 16
hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 10000s, retrans time 5000s
In your prod environment, can you look for other RS and see how many of them do have SLLA set? If some OS doesn't properly set the SLLA option, it might be worthwhile to have a variant that doesn't require SLLA to be set, with a suitable warning that it would trigger multicast ND.
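For anyone counting SLLA usage from raw captures, a small parser is enough. The sketch below walks the options of an ICMPv6 Router Solicitation (type 133, 8-byte header, options as type/length pairs with the length in units of 8 octets, SLLA being option type 1). The function name `rs_has_slla` is hypothetical, and the checksum is not verified.

```python
def rs_has_slla(icmp6: bytes) -> bool:
    """Return True if an ICMPv6 Router Solicitation carries an SLLA option."""
    if len(icmp6) < 8 or icmp6[0] != 133:
        raise ValueError("not a Router Solicitation")
    offset = 8  # skip type, code, checksum and the 4 reserved bytes
    while offset + 2 <= len(icmp6):
        opt_type, opt_len = icmp6[offset], icmp6[offset + 1]
        if opt_len == 0:
            raise ValueError("zero-length option")
        if offset + opt_len * 8 > len(icmp6):
            raise ValueError("truncated option")
        if opt_type == 1:  # Source Link-Layer Address
            return True
        offset += opt_len * 8
    return False


# A 16-byte RS shaped like the one in the capture above (checksum zeroed here).
with_slla = bytes([133, 0, 0, 0, 0, 0, 0, 0, 1, 1]) + bytes.fromhex("525400caa591")
# An 8-byte RS with no options at all.
without_slla = bytes([133, 0, 0, 0, 0, 0, 0, 0])
```

`rs_has_slla(with_slla)` is true and `rs_has_slla(without_slla)` is false, matching the "length 16" and "length 8" solicitations seen in the captures in this thread.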
Not able to test the full prod environment as it is a meeting/conference network, and the next conference is in just over a month :) But I tested with all the clients I have nearby, here are the results:
macOS 10.12.3 sierra - sends SLLA 2 times out of 4
iOS 10.2.1 (iPhone 5s) - no SLLA
Android 7.0 (sony xperia phone) - sends SLLA
Android 5.1 (nexus 7 tablet) - sends SLLA
Ubuntu 16.04.2 LTS w/ Network Manager, running 4.9 kernel (dell laptop) - no SLLA
Windows 10 (dell laptop) - no SLLA
So it isn't universally supported, and at least the iPhone is definitely in the category of small battery-based device (actually, all the above devices are battery-powered!) Not sure why macOS sometimes sets SLLA and sometimes doesn't.
Also, here are some observations from part of the IETF discussion of the draft that led to RFC7772: https://www.ietf.org/mail-archive/web/v6ops/current/msg22464.html
Note that you mention an option with a warning that it would trigger multicast ND from the router - on the router w/radvd that I tested, the NS was sent by the router to a solicited-node multicast address, targeting the link-local address it is trying to resolve. This is still better than sending to all-nodes, as the solicited-node multicast can be discarded by the device radio in non-targeted devices, without waking up the CPU, whereas the all-nodes RA must wake up the CPU.
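The reason other devices can filter this traffic cheaply is visible in how the addresses are built: the solicited-node group is ff02::1:ff00:0/104 plus the low 24 bits of the target, and the Ethernet destination is 33:33 followed by the low 32 bits of the group. A short sketch with Python's standard `ipaddress` module (the helper names are my own):

```python
import ipaddress


def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Solicited-node multicast group for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Ethernet destination for an IPv6 multicast group: 33:33 + low 32 bits."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)


group = solicited_node("fe80::5054:ff:fecc:65a1")
print(group)                 # ff02::1:ffcc:65a1
print(multicast_mac(group))  # 33:33:ff:cc:65:a1
```

Both values match the neighbor solicitation in the capture in this thread, where the router sends to ff02::1:ffcc:65a1 at 33:33:ff:cc:65:a1. A device whose own address ends in different bits never subscribes to that group, so its radio can drop the frame on the MAC address alone.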
Ok, let's drop the SLLA requirement then, and trigger the NS/NA cycle. Apple iOS devices, Network Manager and Windows do represent a LOT of devices.
@spakka sounds like we should file a bug for upstream networkmanager to get RS right :-) as well. I don't know if there's any good way to submit bugs for Windows/iOS these days.
@reubenhwk are you ok with just dropping the SLLA requirement, or would you like respecting it to be a config option?
13:23:14.314745 52:54:00:cc:65:a1 > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 62: (hlim 255, next-header ICMPv6 (58) payload length: 8) fe80::5054:ff:fecc:65a1 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 8
13:23:14.315044 52:54:00:1b:2f:1f > 33:33:ff:cc:65:a1, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::1 > ff02::1:ffcc:65a1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::5054:ff:fecc:65a1
source link-address option (1), length 8 (1): 52:54:00:1b:2f:1f
0x0000: 5254 001b 2f1f
13:23:14.315300 52:54:00:cc:65:a1 > 52:54:00:1b:2f:1f, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:fecc:65a1 > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fe80::5054:ff:fecc:65a1, Flags [solicited, override]
destination link-address option (2), length 8 (1): 52:54:00:cc:65:a1
0x0000: 5254 00cc 65a1
13:23:14.315318 52:54:00:1b:2f:1f > 52:54:00:cc:65:a1, ethertype IPv6 (0x86dd), length 70: (flowlabel 0xcb1e2, hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::1 > fe80::5054:ff:fecc:65a1: [icmp6 sum ok] ICMP6, router advertisement, length 16
hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 10000s, retrans time 5000s
13:23:19.319537 52:54:00:cc:65:a1 > 52:54:00:1b:2f:1f, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:fecc:65a1 > fe80::1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::1
source link-address option (1), length 8 (1): 52:54:00:cc:65:a1
0x0000: 5254 00cc 65a1
13:23:19.319566 52:54:00:1b:2f:1f > 52:54:00:cc:65:a1, ethertype IPv6 (0x86dd), length 78: (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::1 > fe80::5054:ff:fecc:65a1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 24, tgt is fe80::1, Flags [solicited]
I think I missed the SLLA requirement dropping question. RADVD only set it optionally. Why is it in question right now?
RFC7772 5.1.1 says to qualify for a unicast RA response, the RS needs to
@spakka tested and found that lots of clients don't set an SLLA option in the RS (including Windows, NetworkManager & iOS mobile devices).
The language in the RFC does not say MUST contain an SLLA, so I propose to only have the unspecified address test.
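The proposed rule can be written down as a small decision function. This is a sketch of the logic under discussion, not the code in the PR: `ra_destination` and its `require_slla` switch are hypothetical names, with `require_slla` standing in for the "config option" variant that was asked about.

```python
import ipaddress

ALL_NODES = ipaddress.IPv6Address("ff02::1")


def ra_destination(rs_source: str, has_slla: bool = False,
                   require_slla: bool = False) -> ipaddress.IPv6Address:
    """Choose where to send the RA that answers an RS.

    `require_slla` is a hypothetical switch for the stricter behaviour;
    with the default (False) only the unspecified-address test applies.
    """
    src = ipaddress.IPv6Address(rs_source)
    if src.is_unspecified:
        return ALL_NODES          # nothing to unicast to
    if require_slla and not has_slla:
        return ALL_NODES          # stricter variant: SLLA missing
    return src                    # unicast RA back to the solicitor
```

With the default, an RS from fe80::5054:ff:fecc:65a1 gets a unicast reply whether or not it carries an SLLA option, and an RS from `::` still gets the multicast RA.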
Ah. Got it. I'm ok with dropping the requirement.
On Sat, Mar 18, 2017 at 2:51 PM, Robin H. Johnson notifications@github.com wrote:
- The PR is implemented with both of the requirements at the moment, but could be changed to just unspecified easily.
Updated to drop the requirement.
Expected result: If a UnicastOnly=no instance with a fixed client list gets an RS, it should only respond to that client.
Actual result: The instance sends RAs to every single client.
Effects: Hosts that did not solicit the RA are flooded by RAs when not needed.
Host setup:
radvd.conf:
Test trigger:
debug=5 logging
Network capture: