zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

IP stack does not implement multicasting requirements of IPv6 RFCs and network driver model lacks features to implement it properly #3127

Closed zephyrbot closed 6 years ago

zephyrbot commented 7 years ago

Reported by Paul Sokolovsky:

It's one of the most quoted facts about IPv6 that it offers, and requires, multicasting support as a baseline feature. It turns out that even the lowest level of IPv6 connectivity, link-layer (aka MAC) address resolution, depends on multicasting, and on real, dynamic multicast groups at that, not just a static replacement for the broadcast used by IPv4's ARP.

This is definitely a complication, and it's hard to refrain from a note that whoever designed such (over)complications "achieved" their goals, with IPv6 being where it is now. However, thinking beyond the initial surprise, there might have been following reasoning for making it like that:

  1. Again, multicasting was intended to be in the DNA of IPv6, and to be supported and work as expected. Putting it at such a low level as LL-addr resolution guarantees that someone working with IPv6 would have a hard time "not noticing" and "skipping" the need for multicasting.

  2. While point 1 is likely the main one, the IPv6 designers also took the opportunity to optimize address resolution, making it scalable to very large network links.

With this intro in mind, this ticket is being submitted to verify Zephyr's homework on IPv6 support with respect to point 1 above, and to see if something was "forgotten" or "worked around" (to reoccur again and again).

Formal references:

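RFC4291 "IP Version 6 Addressing Architecture" clearly states in https://tools.ietf.org/html/rfc4291#section-2.8 that a node must (among the bunch of unicast addresses) be a member of at least the 2 following multicast groups: the All-Nodes multicast addresses, and the Solicited-Node multicast address for each of its unicast and anycast addresses.

The section https://tools.ietf.org/html/rfc4291#section-2.7.1 mentioned there defines the addresses of these groups: FF02::1 for the link-local All-Nodes group, and FF02:0:0:0:0:1:FFXX:XXXX for Solicited-Node groups.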

In other words, the last 3 bytes of a node's unicast address are taken and appended to "FF02::1:FF", and this forms a dynamic multicast group; thus nodes would generally be members of different "Solicited-Node Address" multicast groups.
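The construction described above can be sketched in a few lines of C. This is only an illustration, not Zephyr code: the function name make_solicited_node_addr() and the raw 16-byte array representation are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the Solicited-Node multicast address FF02::1:FFxx:xxxx for a
 * 16-byte IPv6 unicast address: a fixed 13-byte prefix followed by the
 * low 3 bytes of the unicast address.
 * (Hypothetical helper name, not a Zephyr API.) */
static void make_solicited_node_addr(const uint8_t unicast[16], uint8_t out[16])
{
    /* ff 02 00 00 00 00 00 00 00 00 00 01 ff */
    static const uint8_t prefix[13] = { 0xff, 0x02, [11] = 0x01, [12] = 0xff };

    assert(unicast != NULL && out != NULL);
    memcpy(out, prefix, sizeof(prefix));
    memcpy(out + 13, unicast + 13, 3);
}
```

For example, fe80::204:9fff:fe8e:c046 maps to ff02::1:ff8e:c046, because only the trailing 8e:c0:46 survives.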

https://tools.ietf.org/html/rfc4861 "Neighbor Discovery for IP version 6 (IPv6)" defines the purpose of "Solicited-Node Address" multicast group:

"Nodes accomplish address resolution by multicasting a Neighbor Solicitation that asks the target node to return its link-layer address. Neighbor Solicitation messages are multicast to the solicited-node multicast address of the target address."

So, this is the optimization mentioned above: instead of sending a Neighbor Solicitation to the static "All nodes" group (effectively a broadcast), it is sent to the "Solicited-Node Address" group of all nodes which share the same last 3 bytes of IP address, which even on links with a million nodes will reach only 1 node on average, saving overall CPU time for processing (and battery for useless wakeups).
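To put a rough number on that, here is a back-of-the-envelope sketch. It assumes one address per node and that the low 24 bits of addresses are uniformly distributed, which real networks only approximate; the function name is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct Solicited-Node groups: 24 variable bits. */
#define SOLICITED_NODE_GROUPS (1u << 24)

/* Expected number of nodes, other than the target itself, that share the
 * target's Solicited-Node group and so also receive the solicitation.
 * Assumes uniformly distributed low 24 bits (an idealization). */
static double expected_extra_listeners(uint64_t nodes_on_link)
{
    assert(nodes_on_link > 0);
    return (double)(nodes_on_link - 1) / (double)SOLICITED_NODE_GROUPS;
}
```

With 1,000,000 nodes on the link this gives about 0.06 extra listeners, so a solicitation is typically processed by the target alone, versus all 1,000,000 nodes for a broadcast.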

https://tools.ietf.org/html/rfc2464#section-7 "Transmission of IPv6 Packets over Ethernet Networks" describes how IPv6 multicast addresses are mapped to Ethernet multicast addresses:

"An IPv6 packet with a multicast destination address DST, consisting of the sixteen octets DST[1] through DST[16], is transmitted to the Ethernet multicast address whose first two octets are the value 3333 hexadecimal and whose last four octets are the last four octets of DST."

So, we traced it top to bottom: to comply with IPv6, any Ethernet-like node needs to receive multicasts for Ethernet groups 33:33:00:00:00:01 and 33:33:FF:xx:xx:xx, where xx are the 3 lowest bytes of its IPv6 address (repeated for multiple addresses).
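The RFC 2464 mapping quoted above is mechanical enough to show as code. A minimal sketch, with a hypothetical function name and plain byte arrays rather than any Zephyr types:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map an IPv6 multicast destination to its Ethernet multicast address
 * per RFC 2464 section 7: 33:33 followed by the last four octets of the
 * IPv6 address. (Hypothetical helper name, not a Zephyr API.) */
static void ipv6_mcast_to_eth(const uint8_t dst[16], uint8_t mac[6])
{
    assert(dst != NULL && mac != NULL);
    mac[0] = 0x33;
    mac[1] = 0x33;
    memcpy(&mac[2], &dst[12], 4);
}
```

So ff02::1 becomes 33:33:00:00:00:01, and ff02::1:ff8e:c046 becomes 33:33:ff:8e:c0:46.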

Now on to the driver interface. Conceptually, multicast traffic is not just "received": a node should "join" a particular multicast group, after which it starts to receive multicast traffic for that group. It may later "leave" the group to stop receiving such traffic.

None of the (3) existing Ethernet drivers appears to have any explicit multicast group management routines. In particular, struct net_if_api lacks any methods for multicast groups. Looking at the source of net_if_ipv6_maddr_add(), it signals a net_mgmt event on addition of a multicast address:

net_mgmt_event_notify(NET_EVENT_IPV6_MADDR_ADD, iface);

It's unclear whether a driver could listen to such an event, and in either case, the interface doesn't appear to be sound: there should be an explicit request to join a group, with an explicit response, and if the response is an error, it should be communicated back to the user as "IPv6 networking is not functional".
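To make the "explicit request, explicit response" idea concrete, here is a purely hypothetical sketch. Nothing in it exists in Zephyr: struct mcast_ops, the toy device with a 2-slot exact-match filter, and iface_join_ipv6_group() are all invented for illustration. It also assumes the stack, not the driver, converts the IPv6 group into a link-layer address.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver hooks; struct net_if_api has no such members. */
struct mcast_ops {
    int (*join)(void *dev, const uint8_t mac[6]);
    int (*leave)(void *dev, const uint8_t mac[6]);
};

#define TOY_FILTER_SLOTS 2

/* Toy device with a small exact-match multicast filter table. */
struct toy_eth_dev {
    uint8_t filter[TOY_FILTER_SLOTS][6];
    bool used[TOY_FILTER_SLOTS];
};

static int toy_join(void *dev, const uint8_t mac[6])
{
    struct toy_eth_dev *d = dev;
    int free_slot = -1;

    for (int i = 0; i < TOY_FILTER_SLOTS; i++) {
        if (d->used[i] && memcmp(d->filter[i], mac, 6) == 0) {
            return 0; /* already joined */
        }
        if (!d->used[i] && free_slot < 0) {
            free_slot = i;
        }
    }
    if (free_slot < 0) {
        return -ENOSPC; /* filter table is full */
    }
    memcpy(d->filter[free_slot], mac, 6);
    d->used[free_slot] = true;
    return 0;
}

static int toy_leave(void *dev, const uint8_t mac[6])
{
    struct toy_eth_dev *d = dev;

    for (int i = 0; i < TOY_FILTER_SLOTS; i++) {
        if (d->used[i] && memcmp(d->filter[i], mac, 6) == 0) {
            d->used[i] = false;
            return 0;
        }
    }
    return -ENOENT; /* group was never joined */
}

static const struct mcast_ops toy_ops = { .join = toy_join, .leave = toy_leave };

/* Stack-side helper: map the IPv6 group to its Ethernet address
 * (33:33 + last four octets) and hand the driver's answer back to the
 * caller instead of silently dropping it. */
static int iface_join_ipv6_group(const struct mcast_ops *ops, void *dev,
                                 const uint8_t group[16])
{
    uint8_t mac[6] = { 0x33, 0x33 };

    assert(group != NULL);
    if (ops == NULL || ops->join == NULL) {
        return -ENOTSUP; /* driver cannot manage multicast groups */
    }
    memcpy(&mac[2], &group[12], 4);
    return ops->join(dev, mac);
}
```

The point of the sketch is the return path: a full hardware table (-ENOSPC) or a driver without hooks (-ENOTSUP) becomes visible to whoever added the address, rather than showing up later as unexplained packet loss.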

A few last words on implementation strategies for handling multicasts in a particular driver:

  1. Enable promiscuous mode - works around all multicast complexities, and disables all the IPv6 optimizations that rely on multicast.
  2. As can be seen above, all IPv6 multicast addresses start with 33:33. Some hardware allows matching a subset of bytes in packets, so this can be used to match all IPv6 multicasts (but ignore legacy and foreign multicasts and broadcast). ENC28J60 has such a filter.
  3. Use actual multicast support in a device - be that a table for exact address matches or a hashtable with some selectivity.
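As an illustration of strategy 2, the check such a pattern-match filter performs amounts to the following. This is a software model only, with a made-up function name; it does not describe the register interface of the ENC28J60 or any other controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a "match the first two bytes" receive filter:
 * accept every frame whose destination MAC starts with 33:33.
 * (Hypothetical name; real hardware configures this via registers.) */
static bool accept_by_3333_prefix(const uint8_t dst_mac[6])
{
    assert(dst_mac != NULL);
    return dst_mac[0] == 0x33 && dst_mac[1] == 0x33;
}
```

Such a filter rejects broadcast (ff:ff:ff:ff:ff:ff) and IPv4 multicast (01:00:5e:...), but it still accepts IPv6 groups the node never joined, so the IP stack has to discard those in software. Unicast frames for the node's own MAC need a separate match.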

(Imported from Jira ZEP-1673)

zephyrbot commented 7 years ago

by Paul Sokolovsky:

It should be noted that this issue currently affects only Ethernet, but would affect any other medium type with "native" IPv6 packet format and multicasting filtering in hardware, e.g. WiFi.

My assumption is that this doesn't affect 6lowpan technologies for some reason (or there would be similar reports).

zephyrbot commented 7 years ago

by Paul Sokolovsky:

This ticket is a generalization of previous reports of issues in 2 of the 3 Ethernet drivers: https://jira.zephyrproject.org/browse/ZEP-1544 , https://jira.zephyrproject.org/browse/ZEP-1602 . This ticket is also an attempt to guard against the "silent workaround" approach, as done in e.g. https://gerrit.zephyrproject.org/r/gitweb?p=zephyr.git;a=commitdiff;h=f439fdaa4ffe5a69d58d4d533c8a8b3ff5be3109;hp=c182b02b74842a5a0f116829fecaf740ac135724 . Done like that, the problem will reoccur again and again for each hardware device, and will cause extensive confusion and research for a solution due to the relative obscurity of IPv6.

zephyrbot commented 7 years ago

by Jukka Rissanen:

Currently IPv6 multicast join/leave is not supported, it is mentioned in networking todo list (subsys/net/TODO) so this will be done. Patches are welcome of course.

zephyrbot commented 7 years ago

by Paul Sokolovsky:

Jukka Rissanen : Thanks for the response. Glad to hear it was a known problem; I hope this ticket will make it even more widely known and save some time on diagnosing the tickets mentioned above (which seemed to be confusing for a number of people).

I agree this qualifies as "offloading optimization", and definitely doesn't fit into 1.7.0, so I'm preparing a workaround for frdm_k64f, following the previous workaround for enc28j60 (but I find it important to keep in view that it's just a workaround, and also that the problem applies across devices).

zephyrbot commented 7 years ago

by Paul Sokolovsky:

frdm_k64f workaround posted: https://gerrit.zephyrproject.org/r/#/c/10991

zephyrbot commented 7 years ago

by Sharron LIU:

linking to ZEP-828, and setting same priority.

zephyrbot commented 7 years ago

by Jukka Rissanen:

FYI, I started to work with IPv6 multicast join/leave support. This is 1.8 stuff so will submit a bit later.

zephyrbot commented 7 years ago

by Paul Sokolovsky:

Jukka Rissanen : Great, thanks for heads-up.

And in https://gerrit.zephyrproject.org/r/#/c/10991 , Marcus Shawcroft asked me:

Do you have a feel for how much effort is involved in a 'proper' solution?

So, here're my 2 cents: The first question would be to decide if adding mcast group management vmethods to struct net_if_api, like I propose in the ticket description, is a good idea. Another issue is the "impedance mismatch" between the IP stack working with IP addresses and network drivers working with LL/MAC addresses of a particular hardware type, so it would be a question what kind of address network drivers receive in join/leave calls.

I don't have enough insight into Z IP stack to provide definitive answers to these questions, so it would be better for the core team to tackle that, and I'm happy that Jukka confirmed he started on that.

zephyrbot commented 7 years ago

by Jukka Rissanen:

I sent a patch that creates support for passing mcast group join/leave info to L2 and net drv. I did not implement the actual device driver support for mcux but sent a draft version that someone could use to implement it properly https://gerrit.zephyrproject.org/r/#/c/12994/

zephyrbot commented 7 years ago

by Jukka Rissanen:

There was some resistance for my proposal at https://gerrit.zephyrproject.org/r/#/c/12994/ so I abandoned the patch. If someone wants to send a new proposal and do it as suggested by Tomasz, then go ahead.

zephyrbot commented 7 years ago

by Andrei Laperie:

Since the original proposal for the solution was not accepted, we need to come up with the new one. Removing 1.9 version designation and assigning to Tomasz

zephyrbot commented 7 years ago

by Jukka Rissanen:

New proposal at https://github.com/zephyrproject-rtos/zephyr/pull/1283

zephyrbot commented 7 years ago

Related to ZEP-828

galak commented 6 years ago

@jukkar is this resolved?

galak commented 6 years ago

@laperie can you see if this has been resolved?

pfalcon commented 6 years ago

@galak : In my list, what's left is to actually implement Ethernet IPv6 multicast handling on a real device (frdm_k64f). Bumping to 1.11.

pfalcon commented 6 years ago

So, mcast handling for frdm_k64f is posted as https://github.com/zephyrproject-rtos/zephyr/pull/5255 . Unfortunately, still no joy, as it only exposes further problems with the subject of this ticket:

net> iface

Interface 0x2000a2c0 (Ethernet)
===============================
Link addr : 00:04:9F:8E:C0:46
MTU       : 1500
IPv6 unicast addresses (max 3):
    fe80::204:9fff:fe8e:c046 autoconf preferred infinite
    2001:db8::1 manual preferred infinite
IPv6 multicast addresses (max 2):
    ff02::1
    ff02::1:ff8e:c046
IPv6 prefixes (max 2):
    <none>

The above still doesn't comply with https://tools.ietf.org/html/rfc4291#section-2.8's

The Solicited-Node multicast address for each of its unicast and anycast addresses.

As can be seen above, there's no solicited multicast for 2001:db8::1, so it would not answer to pings. That's why https://github.com/zephyrproject-rtos/zephyr/pull/5255 doesn't remove promiscuous mode workaround.
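The gap can be checked mechanically. The sketch below is not Zephyr code: struct ip6 and both function names are assumptions for the example. It derives the Solicited-Node group for each unicast address and counts the ones absent from the multicast list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain 16-byte IPv6 address (hypothetical type, not a Zephyr struct). */
struct ip6 { uint8_t b[16]; };

/* Solicited-Node group for a unicast address: FF02::1:FF + low 3 bytes. */
static struct ip6 solicited_node_of(const struct ip6 *ucast)
{
    struct ip6 m = { { 0xff, 0x02, [11] = 0x01, [12] = 0xff } };

    memcpy(&m.b[13], &ucast->b[13], 3);
    return m;
}

/* Count unicast addresses whose Solicited-Node group is not joined. */
static size_t count_missing_solicited(const struct ip6 *ucast, size_t n_ucast,
                                      const struct ip6 *mcast, size_t n_mcast)
{
    size_t missing = 0;

    assert(ucast != NULL && mcast != NULL);
    for (size_t i = 0; i < n_ucast; i++) {
        struct ip6 want = solicited_node_of(&ucast[i]);
        size_t j;

        for (j = 0; j < n_mcast; j++) {
            if (memcmp(want.b, mcast[j].b, sizeof(want.b)) == 0) {
                break;
            }
        }
        if (j == n_mcast) {
            missing++;
        }
    }
    return missing;
}
```

Fed the two unicast and two multicast addresses from the "net iface" output above, this reports one missing group: ff02::1:ff00:1, the Solicited-Node address of 2001:db8::1.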

And well, let's face it - it's overkill to store and handle explicit multicast addresses for every sneeze. And I guess we should address this now. My proposal: let's go for a concept of "implicit multicast addresses", which aren't stored, just requested from a driver to be joined/left. Solicited-node addresses should be just such. "net iface" would print them as such too.

@jukkar : Too bad you can't comment on this idea, because I'm ready to start on it. @tbursztyka : Maybe you'd have comments. Otherwise, I'll acquaint myself with DAD, which is on the critical path of these changes, and hack on it.

pfalcon commented 6 years ago

DAD which is on critical path of these changes

One question I'd like to have an answer for is: Should DAD be performed on manually assigned addresses (like CONFIG_NET_APP_MY_IPV6_ADDR)?

https://tools.ietf.org/html/rfc4862 so far isn't very clear on that. It has:

The autoconfiguration process includes generating a link-local address, generating global addresses via stateless address autoconfiguration, and the Duplicate Address Detection procedure to verify the uniqueness of the addresses on a link.

The Duplicate Address Detection algorithm is performed on all addresses, independently of whether they are obtained via stateless autoconfiguration or DHCPv6.

That's so far pretty ambiguous - "all" as in "all at all", or "all described in this doc" (which are assigned by some automated means, not manually).

pfalcon commented 6 years ago

In addition, routers are expected to successfully pass the Duplicate Address Detection procedure described in this document on all addresses prior to assigning them to an interface.

pfalcon commented 6 years ago

tentative address - [...] An interface discards received packets addressed to a tentative address, but accepts Neighbor Discovery packets related to Duplicate Address Detection for the tentative address.

I don't think that I, as a user, want the manual address I specified to be "tentative", even if temporarily. I want it to be active and acting immediately.

pfalcon commented 6 years ago

Aha:

By default, all addresses should be tested for uniqueness prior to their assignment to an interface for safety. The test should individually be performed on all addresses obtained manually, via stateless address autoconfiguration, or via DHCPv6.

But still:

To accommodate sites that believe the overhead of performing Duplicate Address Detection outweighs its benefits, the use of Duplicate Address Detection can be disabled through the administrative setting of a per-interface configuration flag.

pfalcon commented 6 years ago

Ok, section https://tools.ietf.org/html/rfc4862#section-5.4 details all that.

pfalcon commented 6 years ago

https://tools.ietf.org/html/rfc4862#section-5.4.2

Before sending a Neighbor Solicitation, an interface MUST join the all-nodes multicast address and the solicited-node multicast address of the tentative address.

tbursztyka commented 6 years ago

what a monologue ;)

DAD is started in net_if_ipv6_addr_add() so net_app's fixed IPv6 address triggers DAD too. Afaik, net_if.c does not make any difference whether it's autoconf or manual also. Check but it should be ok.

pfalcon commented 6 years ago

@tbursztyka : Yeah, sorry, I wasn't sure I'll finish reading the RFC yesterday or find all answers, so logged as I proceeded ;-).

Afaik, net_if.c does not make any difference whether it's autoconf or manual also.

Yeah, that's what I saw immediately, and wondered if that's grounded. But the RFC states explicitly that the solicited-node multicast addr is created "immediately" even for a tentative address, so it doesn't pose additional complexities for my "implicit mcast addrs" idea. Otherwise, the use of DAD is controlled by CONFIG_NET_IPV6_DAD, so that should address the (orthogonal) concern of using DAD for manual addrs (you can turn DAD off completely).

Otherwise, I assume you don't have objections to this "implicit mcast addresses" idea.

tbursztyka commented 6 years ago

Otherwise, I assume you don't have objections to this "implicit mcast addresses" idea.

I am not confident with IPv6 details:

As can be seen above, there's no solicited multicast for 2001:db8::1, so it would not answer to pings. That's why #5255 doesn't remove promiscuous mode workaround.

But isn't ff02::1 the multicast address for all addresses on the network (besides ll unicast ones)? btw, pinging 2001:db8::1 works on 15.4 for instance with that configuration. Is there something specific with ethernet/IPv6?

pfalcon commented 6 years ago

But isn't ff02::1 the multicast address for all addresses on the network (besides ll unicast ones)?

Well, that's how IPv6 differs from IPv4 - in IPv6, to resolve a link-layer addr from an IPv6 one, you don't send a broadcast (which in a 1M network segment would be received and processed by 1M nodes), but send to the "solicited node multicast address" (24-bit), so O(1) nodes would receive/process it.

btw, pinging 2001:db8::1 works on 15.4 for instance with that configuration. Is there something specific with ethernet/IPv6?

Yeah, I wish I had an exact answer for why BLE/15.4 6lowpan "just works" (and how much it truly does). My hypothesis is that they are either inherently broadcast, or vice-versa, point-to-point media (so Linux, e.g., knows the lladdr before sending any packet). But regarding Ethernet, https://tools.ietf.org/html/rfc2464 mentioned in the original description clearly represents the idea that Ethernet is a multicast - not broadcast - medium, and address discovery in it is optimized for large network segments.

tbursztyka commented 6 years ago

15.4 is broadcasting: so when you ping, it will first run a nbr discovery (thus broadcasting the discovery) and, if successful, will then know the ll addr etc...

Sounds like we'll need your fix then when ethernet is enabled.

pfalcon commented 6 years ago

My proposal: let's go for a concept of "implicit multicast addresses"

Ok, so I decided to contain myself and fix the issue with existing framework, leaving optimizations for later. And it turned out not too hard, all the pieces were already there (I was afraid complex address matching would be required). https://github.com/zephyrproject-rtos/zephyr/pull/5287

carlescufi commented 6 years ago

@pfalcon according to your last comment this is fixed. Can the issue be closed?

pfalcon commented 6 years ago

Yes, there're actually more requirements popping up, e.g. https://github.com/zephyrproject-rtos/zephyr/pull/5846 , but the core ones now can be considered implemented, and we can deal with exceptions separately. Thanks.

carlescufi commented 6 years ago

@pfalcon thank you!