radvd-project / radvd

radvd | Official repository: https://github.com/radvd-project/radvd
https://radvd.litech.org/
Other
203 stars 106 forks source link

Fixes for race condition with IgnoreIfMissing #16

Closed neocturne closed 10 years ago

neocturne commented 10 years ago

Hi, these patches fix a race condition I frequently hit on OpenWRT when an interface configured with IgnoreIfMissing doesn't have a link-local address for a short time after it is set up.

I think there might be more strange behaviour and race conditions related to missing interfaces, but this fix should be a step in the right direction :)

reubenhwk commented 10 years ago

I like the idea, but I think the commits aren't quite right. Please find my comments on the commits in github. Let me know if you have any questions.

Thanks in advance, Reuben

On Sat, Jan 11, 2014 at 6:09 PM, NeoRaider notifications@github.com wrote:

Hi, these patches fix a race condition I frequently hit on OpenWRT when an interface configured with IgnoreIfMissing doesn't have a link-local address for a short time after it is set up.

I think there might be more strange behaviour and race conditions related to missing interfaces, but this fix should be a step in the right direction

:)

You can merge this Pull Request by running

git pull https://github.com/NeoRaider/radvd fixes

Or view, comment on, or merge it at:

https://github.com/reubenhwk/radvd/pull/16 Commit Summary

  • Mark an interface as failed if a part of its setup fails
  • Correctly handle setup_linklocal_addr failure

File Changes

Patch Links:

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/pull/16 .

reubenhwk commented 10 years ago

I'm starting to think the correct way to handle this may be to change the HasFailed field to IsReady and only set IsReady when we know we can send messages on the interface. We may also be able to add RTM_NEWADDR to netlink.c so interfaces will get reloaded when addresses change. At any point where the interface state is in question, we set IsReady to false so we can later attempt to reinitialize it.

On Sun, Jan 12, 2014 at 2:34 AM, Reuben Hawkins reubenhwk@gmail.com wrote:

I like the idea, but I think the commits aren't quite right. Please find my comments on the commits in github. Let me know if you have any questions.

Thanks in advance, Reuben

On Sat, Jan 11, 2014 at 6:09 PM, NeoRaider notifications@github.comwrote:

Hi, these patches fix a race condition I frequently hit on OpenWRT when an interface configured with IgnoreIfMissing doesn't have a link-local address for a short time after it is set up.

I think there might be more strange behaviour and race conditions related to missing interfaces, but this fix should be a step in the right direction

:)

You can merge this Pull Request by running

git pull https://github.com/NeoRaider/radvd fixes

Or view, comment on, or merge it at:

https://github.com/reubenhwk/radvd/pull/16 Commit Summary

  • Mark an interface as failed if a part of its setup fails
  • Correctly handle setup_linklocal_addr failure

File Changes

Patch Links:

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/pull/16 .

neocturne commented 10 years ago

Adding RTM_NEWADDR to netlink.c and changing HasFailed to IsReady sound reasonable. I'd also propose rewriting setup_linklocal_addr to use getifaddrs, parsing something in /proc/net is probably bad for the performance.

By the way, I have question regarding check_allrouters_membership: What condition can lead to us losing the group membership besides the interface going down and up again - which we would notice anyways through netlink? Or in other words: Does is make sense to have check_allrouters_membership, another function parsing a file in /proc/net, and that even on every sent RA?

Thanks for reviewing my patches, I'll start implementing your proposed solution.

reubenhwk commented 10 years ago

On Sun, Jan 12, 2014 at 5:37 AM, NeoRaider notifications@github.com wrote:

Adding RTM_NEWADDR to netlink.c and changing HasFailed to IsReady sound reasonable. I'd also propose rewriting setup_linklocal_addr to use getifaddrs, parsing something in /proc/net is probably bad for the performance.

I agree. I've already done this in the radvd-2 branch, using gitifaddrs to get the linklocal_addr. The setup_linklocal_addr in device-bsd44.c used getifaddrs, so this is what I moved to device-common.c and I removed setup_linklocal_addr in device-linux.c, in the radvd-2 branch.

By the way, I have question regarding check_allrouters_membership: What condition can lead to us losing the group membership besides the interface going down and up again - which we would notice anyways through netlink? Or in other words: Does is make sense to have check_allrouters_membership, another function parsing a file in /proc/net, and that even on every sent RA?

I've wondered that myself. I suppose whoever put it there was thinking that maybe a user could change it at runtime....If that's the case, I don't really think radvd should check that setting every time. I'm actually not really sure how that even affects it. I think the only thing the allrouters membership really accomplishes is that's the multicast address to which the solicitations are sent. So radvd must join it, but maybe it shouldn't join it it send.c.

Also, I don't particularly like functions named check which actually do something... maybe ensure_allrouters_membership would be more clear about what it's actually doing.

Thanks for reviewing my patches, I'll start implementing your proposed solution.

Thanks for you help!

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/pull/16#issuecomment-32120481 .

reubenhwk commented 10 years ago

If you'd like, maybe you want to take a look at the radvd-2 branch. I'm working really hard to clean up the code and I'm planning on releasing that branch as v2.0 hopefully soon.

On Sun, Jan 12, 2014 at 10:23 AM, Reuben Hawkins reubenhwk@gmail.comwrote:

On Sun, Jan 12, 2014 at 5:37 AM, NeoRaider notifications@github.comwrote:

Adding RTM_NEWADDR to netlink.c and changing HasFailed to IsReady sound reasonable. I'd also propose rewriting setup_linklocal_addr to use getifaddrs, parsing something in /proc/net is probably bad for the performance.

I agree. I've already done this in the radvd-2 branch, using gitifaddrs to get the linklocal_addr. The setup_linklocal_addr in device-bsd44.c used getifaddrs, so this is what I moved to device-common.c and I removed setup_linklocal_addr in device-linux.c, in the radvd-2 branch.

By the way, I have question regarding check_allrouters_membership: What condition can lead to us losing the group membership besides the interface going down and up again - which we would notice anyways through netlink? Or in other words: Does is make sense to have check_allrouters_membership, another function parsing a file in /proc/net, and that even on every sent RA?

I've wondered that myself. I suppose whoever put it there was thinking that maybe a user could change it at runtime....If that's the case, I don't really think radvd should check that setting every time. I'm actually not really sure how that even affects it. I think the only thing the allrouters membership really accomplishes is that's the multicast address to which the solicitations are sent. So radvd must join it, but maybe it shouldn't join it it send.c.

Also, I don't particularly like functions named check which actually do something... maybe ensure_allrouters_membership would be more clear about what it's actually doing.

Thanks for reviewing my patches, I'll start implementing your proposed solution.

Thanks for you help!

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/pull/16#issuecomment-32120481 .

danrl commented 10 years ago

I am currently on a similar project to this one and I used the netlink interface on Linux instead of getifaddrs. I don't know if this is suitable for radvd as well, but the netlink interface seems to be the way to go on that platform. I had a problem with getifaddrs under BSD since the scope-id was embedded in the address instead of the correct field in the corresponding struct. AFAIK there is no such thing as netlink interface on BSD :(

neocturne commented 10 years ago

Okay, I'll base my work on the radvd-2 branch. I saw that you rebased radvd-2 onto this pull request, was that intentional?

What do you think about removing the disablenetlink mode and relying on netlink only on Linux (and stopping to randomly call update_device_info in places like send_ra), and retaining the interface state polling on BSD only?

I see you already added a comment about the config reload on receiving netlink updates, but is there really a reason to do these reloads on anything other than SIGHUP at all? I think just calling something like check_ifaces (and adjusting it so it can actually enable and not only disable interfaces) would be enough.

reubenhwk commented 10 years ago

On Sun, Jan 12, 2014 at 10:56 AM, NeoRaider notifications@github.comwrote:

Okay, I'll base my work on the radvd-2 branch. I saw that you rebased radvd-2 onto this pull request, was that intentional?

It was intentional mainly just to see how it would apply to all the changes I made (I new there would be a few conflicts). The radvd-2 branch is something I don't mind rewriting the history on (even though I know that's a pain once it's been shared).

The things radvd-2 branch adds are mainly these:

What do you think about removing the disablenetlink mode and relying on netlink only on Linux (and stopping to randomly call update_device_info in places like send_ra), and retaining the interface state polling on BSD only?

Somebody specifically needed to disable netlink a while back because of an unusual use case of radvd. I didn't really like the idea of disabling netlink, but I made an exception sense he needed it. I plan to eventually revisit that use case to see if that can be removed by making radvd much faster. The problem was that he had a few hundred tunnel interfaces which caused many netlink messages. I think the real fix is going to be to get radvd to reinitialize only one interface per netlink message. Right now, radvd just reinitializes all interfaces if the state of only one interface changes (in most cases).

I see you already added a comment about the config reload on receiving netlink updates, but is there really a reason to do these reloads on anything other than SIGHUP at all? I think just calling something like check_ifaces (and adjusting it so it can actually enable and not only disable interfaces) would be enough.

I'm not sure.

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/pull/16#issuecomment-32127005 .

neocturne commented 10 years ago

By the way, the radvd-2 branch doesn't compile for me (flex 2.5.37, Bison 2.7.12, GCC 4.8.2):

    gram.c:1541:1: error: conflicting types for ‘yyparse’
     yyparse (struct yydata * yydata)

A forward declaration of struct yydata solves the problem.

reubenhwk commented 10 years ago

On Sun, Jan 12, 2014 at 11:31 AM, NeoRaider notifications@github.comwrote:

By the way, the radvd-2 branch doesn't compile for me (flex 2.5.37, Bison 2.7.12, GCC 4.8.2):

gram.c:1541:1: error: conflicting types for ‘yyparse’
 yyparse (struct yydata * yydata)

A forward declaration of struct yydata solves the problem.

Awesome! Can you send a patch or make a pull request?

— Reply to this email directly or view it on GitHubhttps://github.com/reubenhwk/radvd/pull/16#issuecomment-32127864 .