radvd-project / radvd

radvd | Official repository: https://github.com/radvd-project/radvd
https://radvd.litech.org/
Other
203 stars 107 forks source link

radvd sends old MAC address in LL option when MAC changes #9

Closed bernhardschmidt closed 10 years ago

bernhardschmidt commented 11 years ago

Originally reported to Debian at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508466

Interfaces can occasionally change MAC addresses in-flght. The most common example are Linux bridges, that are created using a random MAC and then change the MAC address to the first interface added to the bridge.

If radvd is started before the first interface is added, it learns the old MAC address of the bridge and continues to advertise it in the LL addr option, overwriting the entry in the client's neighbor cache at every advertisement. This creates blackhole periods of about 10-20 seconds each time.

The bug is at least present in 1.8.5. I'm going to test 1.9 later, but since there is nothing in the changes suggesting a fix I assume it is still affected.

bernhardschmidt commented 11 years ago

Sorry, too fast. 1.8.5 sends from the wrong MAC (the old one) in the L2 header, but has the updated MAC in the RA option. At least in my tests here. So something has changed.

I just tested this case which was my usual problem from the past

It used to be

at this point, which broke all attached VMs horribly. The current behaviour is still ugly, since the box is sending traffic from a MAC address it doesn't own. Theoretically this might trigger security alerts, but in those cases one should probably set a fixed MAC anyway.

reubenhwk commented 11 years ago

Hi Bernhard,

Do you know how to fix it?

On Sun, May 12, 2013 at 12:49 PM, Bernhard Schmidt <notifications@github.com

wrote:

Sorry, too fast. 1.8.5 sends from the wrong MAC (the old one) in the L2 header, but has the updated MAC in the RA option. At least in my tests here. So something has changed.

I just tested this case which was my usual problem from the past

  • create bridge vmbr0, it gets the MAC 8a:7c:c4:8a:95:5d and the LL address fe80::887c:c4ff:fe8a:955d
  • add tap-vm1 with MAC fe:16:3e:7c:30:aa, MAC of vmbr0 changes as well
  • start radvd
    • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7c:30:aa
    • add tap-vm2 with MAC fe:16:3e:7f:56:9f
    • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7c:30:aa
    • remove tap-vm1 (shutdown of guest), vmbr0 changes MAC to fe:16:3e:7f:56:9f
    • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7f:56:9f

It used to be

  • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7c:30:aa

at this point, which broke all attached VMs horribly. The current behaviour is still ugly, since the box is sending traffic from a MAC address it doesn't own. Theoretically this might trigger security alerts, but in those cases one should probably set a fixed MAC anyway.

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/issues/9#issuecomment-17784144 .

reubenhwk commented 11 years ago

Hi Bernhard,

On Sun, May 12, 2013 at 12:49 PM, Bernhard Schmidt <notifications@github.com

wrote:

Sorry, too fast. 1.8.5 sends from the wrong MAC (the old one) in the L2 header, but has the updated MAC in the RA option. At least in my tests here. So something has changed.

I just tested this case which was my usual problem from the past

  • create bridge vmbr0, it gets the MAC 8a:7c:c4:8a:95:5d and the LL address fe80::887c:c4ff:fe8a:955d
  • add tap-vm1 with MAC fe:16:3e:7c:30:aa, MAC of vmbr0 changes as well
  • start radvd
    • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7c:30:aa
    • add tap-vm2 with MAC fe:16:3e:7f:56:9f
    • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7c:30:aa
    • remove tap-vm1 (shutdown of guest), vmbr0 changes MAC to fe:16:3e:7f:56:9f
    • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7f:56:9f

It used to be

  • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7c:30:aa

at this point, which broke all attached VMs horribly. The current behaviour is still ugly, since the box is sending traffic from a MAC address it doesn't own. Theoretically this might trigger security alerts, but in those cases one should probably set a fixed MAC anyway.

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/issues/9#issuecomment-17784144 .

reubenhwk commented 11 years ago

Do you know how to fix it?

On Wed, May 15, 2013 at 6:29 AM, Reuben Hawkins reubenhwk@gmail.com wrote:

Hi Bernhard,

On Sun, May 12, 2013 at 12:49 PM, Bernhard Schmidt < notifications@github.com> wrote:

Sorry, too fast. 1.8.5 sends from the wrong MAC (the old one) in the L2 header, but has the updated MAC in the RA option. At least in my tests here. So something has changed.

I just tested this case which was my usual problem from the past

  • create bridge vmbr0, it gets the MAC 8a:7c:c4:8a:95:5d and the LL address fe80::887c:c4ff:fe8a:955d
  • add tap-vm1 with MAC fe:16:3e:7c:30:aa, MAC of vmbr0 changes as well
  • start radvd
    • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7c:30:aa
    • add tap-vm2 with MAC fe:16:3e:7f:56:9f
    • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7c:30:aa
    • remove tap-vm1 (shutdown of guest), vmbr0 changes MAC to fe:16:3e:7f:56:9f
    • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7f:56:9f

It used to be

  • RA has L2 source fe:16:3e:7c:30:aa, L3 source fe80::887c:c4ff:fe8a:955d, ICMPv6 option fe:16:3e:7c:30:aa

at this point, which broke all attached VMs horribly. The current behaviour is still ugly, since the box is sending traffic from a MAC address it doesn't own. Theoretically this might trigger security alerts, but in those cases one should probably set a fixed MAC anyway.

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/issues/9#issuecomment-17784144 .

danrl commented 11 years ago

Sorry to joining the discussion without being asked for, but I follow the project a little since I often work with radvd. Is there any code in radvd that fetches the source link-layer address? Without fetching the current address and comparing it to an previously fetched one, we are not even able to detect this change right now, are we?

I'd be happy to provide a patch, but I have no clue were to start. Can you give me a hint were to look at?

reubenhwk commented 11 years ago

Hi Dan,

Thanks for joining in. In the latest commit on the master branch on github, I renamed setup_device_info to update_device_info and I'm calling it in send.c. I didn't test this (other than just to make sure radvd still works in general), but I'm not sure it'll solve the problem, but that may be a good place to start.

Here's the commit..

https://github.com/reubenhwk/radvd/commit/8784bc4f9411d166ba54291942abfa5cce388576

I'm only guessing, but I suspect the problem may be deeper than this. Can you tell if update_device_info and send.c is updating the L2 or the L3 mac address? From Bernhard's email, it seems like the L3 is updated by the L2 is not.

Let me know what you find.

Thanks, Reuben

On Wed, May 15, 2013 at 10:55 AM, Dan Luedtke notifications@github.comwrote:

Sorry to joining the discussion without being asked for, but I follow the project a little since I often work with radvd. Is there any code in radvd that fetches the source link-layer address? Without fetching the current address and comparing it to an previously fetched one, we are not even able to detect this change right now, are we?

I'd be happy to provide a patch, but I have no clue were to start. Can you give me a hint were to look at?

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/issues/9#issuecomment-17955004 .

danrl commented 11 years ago

Hi all,

So, here is what I found out. No code yet, since I am on the road using a mobile phone. I'll provide a patch later.

struct Interface gets updated by update_device_info() in device-linux.c (or bsd equivalent) using ioctl() for low level socket access. ioctl() is called with SIOCGIFMTU set. But should also be called with SIOCGIFHWADDR set to update the hwaddr as well. That should solve the problem theoretically.

EDIT: I overlooked line 49, where SIOCGIFHWADDR is already used. I will run a few tests to answer your question.

Cheers

Dan

reubenhwk commented 11 years ago

Hi Dan,

It looks like ioctl is called with SIOCGIIFHWADDR just after the MTU is checked. I wondering is this is where the correct L3 address is picked up. But is this also going to fix the L2 problem?

On Thu, May 16, 2013 at 10:43 PM, Dan Luedtke notifications@github.comwrote:

Hi all,

So, here is what I found out. No code yet, since I am on the road using a mobile phone. I'll provide a patch later.

struct Interface gets updated by update_device_info() in device-linux.c (or bsd equivalent) using ioctl() for low level socket access. ioctl() is called with SIOCGIFMTU set. But should also be called with SIOCGIFHWADDR set to update the hwaddr as well. That should solve the problem theoretically. ioctl() in radvd was last changed about 8 years ago, I guess chaning hwaddresses weren't much of a problem back then. cf67f85https://github.com/reubenhwk/radvd/commit/cf67f85c3cc2d45fa6ddce2abeb4df46f0f446a4

Cheers

Dan

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/issues/9#issuecomment-18044774 .

danrl commented 11 years ago

I finally got the time to run some tests. Here are the results:

[root@tunafish danrl]# uname -a
Linux tunafish 3.9.0+ #7 SMP Sat May 4 08:59:39 CEST 2013 x86_64 GNU/Linux
[root@tunafish radvd]# cat radvd.conf
interface enp0s25 {
    AdvSendAdvert on;
    MinRtrAdvInterval 3;
    MaxRtrAdvInterval 10;
    prefix 2001:db8:1:0::/64 {
        AdvOnLink on;
        AdvAutonomous on;
        AdvRouterAddr off;
    };
};
[root@tunafish danrl]# ip link show dev enp0s25
2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 00:12:34:56:78:aa brd ff:ff:ff:ff:ff:ff

[root@tunafish radvd]# ./radvd --nodaemon -debug=5 --config=radvd.conf

RESULT: RA is sent from 00:12:34:56:78:aa with correct LLAddr

[root@tunafish danrl]# ip link set address 00:12:34:56:78:bb dev enp0s25
[root@tunafish danrl]# ip link show dev enp0s25
2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 00:12:34:56:78:bb brd ff:ff:ff:ff:ff:ff

RESULT: RA is sent from 00:12:34:56:78:bb with correct LLAddr

Note: The source link-local address did not change! This is correct, since the OS did no auto(re)configuration of the link-local address.

I vote for "closed".

PS: While digging through the code got me thinking how https://github.com/reubenhwk/radvd/blob/2e9ac5930b0d300cc7c76658dc1bdaf4ecdd441a/TODO#L27 could be implemented. It looks like a lot of work to me. Are there any ideas yet?

reubenhwk commented 11 years ago

On Sat, May 18, 2013 at 1:50 PM, Dan Luedtke notifications@github.comwrote:

I finally got the time to run some tests. Here are the results:

[root@tunafish danrl]# uname -a Linux tunafish 3.9.0+ #7 SMP Sat May 4 08:59:39 CEST 2013 x86_64 GNU/Linux [root@tunafish radvd]# cat radvd.conf interface enp0s25 { AdvSendAdvert on; MinRtrAdvInterval 3; MaxRtrAdvInterval 10; prefix 2001:db8:1:0::/64 { AdvOnLink on; AdvAutonomous on; AdvRouterAddr off; }; }; [root@tunafish danrl]# ip link show dev enp0s25 2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000 link/ether 00:12:34:56:78:aa brd ff:ff:ff:ff:ff:ff

[root@tunafish radvd]# ./radvd --nodaemon -debug=5 --config=radvd.conf

RESULT: RA is sent from 00:12:34:56:78:aa with correct LLAddr

[root@tunafish danrl]# ip link set address 00:12:34:56:78:bb dev enp0s25 [root@tunafish danrl]# ip link show dev enp0s25 2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000 link/ether 00:12:34:56:78:bb brd ff:ff:ff:ff:ff:ff

RESULT: RA is sent from 00:12:34:56:78:bb with correct LLAddr

Note: The source link-local address did not change! This is correct, since the OS did no auto(re)configuration of the link-local address.

I vote for "closed".

Bernhard, can you verify?

PS: While digging through the code got me thinking how https://github.com/reubenhwk/radvd/blob/2e9ac5930b0d300cc7c76658dc1bdaf4ecdd441a/TODO#L27could be implemented. It looks like a lot of work to me. Are there any ideas yet?

I've been making attempts in bursts spaced out by long delays to organize the code better. For example, send use to (or maybe still does) call into reload config. This is a bit overkill and is doing way more than really needed when a single interface comes up. I think before getting to items in the TODO list, the code should be scrubbed, polished, made more efficient and reduce what it's actually doing to only what it needs to do.

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/issues/9#issuecomment-18107771 .

bernhardschmidt commented 11 years ago

I could not reproduce with Kernel 3.8 (Debian sid) and either git HEAD or even git tag release_1_8_5 . I can see that issue with Kernel 3.2 and radvd 1.8.5-1 from a Debian package. I'm going to either temporarily downgrade my workstation or get the source on my server that originally showed this problem.

bernhardschmidt commented 11 years ago

Okay, I feel a bit smarter (and a lot dumber for making such a noise) after putting some work into it.

First of all, the originally reported bug (radvd keeps sending the wrong LLaddr in L3) was present in release_1_7 and gone in release_1_8. I will tag the Debian bug accordingly.

The other bug (the wrong L2 source address) is there, but a lot less significant than I originally thought. Even current HEAD sends a lot of unsolicited RAs when the MAC address changes.

If I counted correctly, if radvd is run as root it sends 11 RAs, three with the old MAC in both L2 and L3, and eight with the new MAC in both L2 and L3. However, if you run with privsep (-u radvd), as Debian does it, it sends 15 RAs: Two with the old MAC in L2 and L3, one with the old MAC in L2 and the new MAC in L3, and 12 RAs with the new MAC in L2 and L3. See http://pastebin.com/GVmn4jXQ . I somehow focused on that one "wrong" packet in between.