radvd-project / radvd

radvd | Official repository: https://github.com/radvd-project/radvd
https://radvd.litech.org/

default route flapping: AdvDefaultLifetime vs route 0::/0 #65

Closed robbat2 closed 8 years ago

robbat2 commented 8 years ago

A non-zero AdvDefaultLifetime should not be permitted when sending an explicit route of 0::/0 with zero lifetime, as it causes a default route flap event on the nodes receiving the RA.

This is a corner case that I found during further testing of RA-splitting and then reproduced without the splitting code. I think the best way to handle it is during config file validation.
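As a rough illustration of what such a validation check could look like, here is a minimal Python sketch. It is not radvd's code: the function name `find_default_route_conflict` and the `(prefix, lifetime)` input shape are hypothetical, and it assumes the config has already been parsed into plain values.

```python
import ipaddress


def find_default_route_conflict(adv_default_lifetime, routes):
    """Return a message if the config combines a non-zero AdvDefaultLifetime
    with an explicit default route whose AdvRouteLifetime is 0, else None.

    adv_default_lifetime: AdvDefaultLifetime in seconds (hypothetical input).
    routes: iterable of (prefix, lifetime_seconds) pairs, one per route block.
    """
    if adv_default_lifetime == 0:
        # The RA header does not advertise a default route, so nothing can flap.
        return None
    for prefix, lifetime in routes:
        net = ipaddress.ip_network(prefix, strict=False)
        # "0::/0" and "::/0" both parse to the IPv6 default route.
        is_default = net.version == 6 and net.prefixlen == 0
        if is_default and lifetime == 0:
            return (
                f"AdvDefaultLifetime {adv_default_lifetime} conflicts with "
                f"route {prefix} {{ AdvRouteLifetime 0; }}"
            )
    return None


if __name__ == "__main__":
    print(find_default_route_conflict(1800, [("0::/0", 0)]))
```

Whether a hit should abort startup or only produce a warning is a separate policy decision that the caller would make.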

Proposed solution:

Config file for old radvd:

AdvDefaultLifetime 1800;
route 0::/0 { AdvRouteLifetime 0; };

When this arrives at another node (in a single RA), it leads to the following events in the kernel (from ip -6 monitor ro):

Timestamp: Sat Nov 19 19:18:27 2016 730425 usec
default via fe80::5054:ff:fe1b:2f1f dev eth0  proto ra  metric 1024 
Timestamp: Sat Nov 19 19:18:27 2016 730500 usec
Deleted default via fe80::5054:ff:fe1b:2f1f dev eth0  proto ra  metric 1024  expires 1800sec

For 75us, there was an additional default route (that could have gotten traffic depending on other default routes present).
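The 75us figure is just the difference between the two Timestamp lines in the trace. A small Python sketch of that arithmetic follows; the helper name `parse_monitor_timestamp` is made up for illustration, and it assumes the timestamp lines look exactly like the ones quoted above.

```python
from datetime import datetime, timedelta


def parse_monitor_timestamp(line):
    """Parse 'Timestamp: <weekday> <month> <day> <time> <year> <N> usec'."""
    body = line.split("Timestamp:", 1)[1].strip()
    # Split off the trailing '<N> usec' from the date portion.
    date_part, usec, _unit = body.rsplit(" ", 2)
    base = datetime.strptime(date_part, "%a %b %d %H:%M:%S %Y")
    return base + timedelta(microseconds=int(usec))


added = parse_monitor_timestamp("Timestamp: Sat Nov 19 19:18:27 2016 730425 usec")
deleted = parse_monitor_timestamp("Timestamp: Sat Nov 19 19:18:27 2016 730500 usec")

# How long the extra default route was installed before being deleted.
gap = deleted - added

if __name__ == "__main__":
    print(gap / timedelta(microseconds=1), "usec")
```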

Using the new codebase, you can put in LOTS of routes, which take non-trivial time to send and process.

Config file for showing the problem further:

AdvDefaultLifetime 1800;
# LOTS of routes or options in the middle
# I used X=0..65535 "route 2001:db8:XXXX::/64 {};"
route 0::/0 { AdvRouteLifetime 0; };
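A config like this is easiest to produce with a script. The following Python sketch is one possible generator, not the script used in the report: the function name `generate_config` is hypothetical, and rendering XXXX as lowercase hex without zero padding is an assumption. Like the snippet above, it emits only these statements and leaves out the rest of the radvd configuration.

```python
def generate_config(n_routes=65536):
    """Build the stress-test statements: a non-zero AdvDefaultLifetime, many
    /64 routes, and finally the zero-lifetime default route."""
    lines = ["AdvDefaultLifetime 1800;"]
    for x in range(n_routes):
        # X = 0..65535 becomes the third hextet of the prefix (assumed hex).
        lines.append(f"route 2001:db8:{x:x}::/64 {{}};")
    # The explicit default route goes last, so it is sent after everything else.
    lines.append("route 0::/0 { AdvRouteLifetime 0; };")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_config(4), end="")
```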

The AdvDefaultLifetime is present on every RA, but the explicit route only goes out MUCH later.

This led to the default route being altered for ~250ms in my test environment (deliberately slow VM with rate-limited network to represent a mobile device).

reubenhwk commented 8 years ago

This sounds good. Are there likely a whole lot of existing config files out there doing this? If so, I'd propose we issue a warning and change the values set in the config file to something that won't cause the flap, but not fail. I wouldn't want people's configs to suddenly stop working when they update to a later version of radvd.

If this is very unlikely, however, I'd just say let's fail the config file and make users change their settings.

Let me know your opinion.

robbat2 commented 8 years ago

I don't know how many of them are doing it. Maybe we just warn then, as you say.

I only found it because I had a typo in my script that was generating test configurations (generate a pile of different routes on the server, and test on the client that all of them were seen, without any other route changes), and noticed the default route flap.

robbat2 commented 8 years ago

New version does not abort anymore.