IP stack does not implement multicasting requirements of IPv6 RFCs and network driver model lacks features to implement it properly

nashif commented 7 years ago

Reported by Paul Sokolovsky:

It's one of the most quoted facts about IPv6 that it offers, and requires, multicasting support as a baseline feature. It turns out that even the most basic level of IPv6 connectivity, link-layer (aka MAC) address resolution, depends on multicasting, and on real, dynamic multicast groups at that, not just a static replacement for the broadcast used by IPv4's ARP.

This is definitely a complication, and it's hard to refrain from noting that whoever designed such (over)complications "achieved" their goals, with IPv6 being where it is now. However, thinking beyond the initial surprise, there might have been the following reasoning for making it like that:

1. Again, multicasting was intended to be in the DNA of IPv6, and to be supported and work as expected. Putting it at such a low level as LL-addr resolution guarantees that someone working with IPv6 would have a hard time "not noticing" or "skipping" the need for multicasting.
2. While point 1 is likely the main one, the IPv6 designers also took the opportunity to optimize address resolution, making it scalable to very large network links.

With this intro in mind, this ticket is being submitted to verify Zephyr's homework on IPv6 support with respect to point 1 above, and to see if something was "forgotten" or "worked around" (to reoccur again and again).

Formal references:

RFC4291 "IP Version 6 Addressing Architecture" clearly states in https://tools.ietf.org/html/rfc4291#section-2.8 that a node must (among a bunch of unicast addresses) be a member of at least the following 2 multicast groups:

"The All-Nodes multicast addresses defined in Section 2.7.1."
"The Solicited-Node multicast address for each of its unicast and anycast addresses."

The mentioned https://tools.ietf.org/html/rfc4291#section-2.7.1 defines the addresses of these groups:

All-Nodes: FF01::1 / FF02::1
Solicited-Node Address: FF02::1:FFXX:XXXX "Solicited-Node multicast address are computed as a function of a node's unicast and anycast addresses. A Solicited-Node multicast address is formed by taking the low-order 24 bits of an address (unicast or anycast) and appending those bits to the prefix FF02:0:0:0:0:1:FF00::/104"

In other words, the last 3 bytes of the node's unicast address are taken and appended to "FF02::1:FF", and this forms a dynamic multicast group, so nodes generally end up as members of different "Solicited-Node Address" multicast groups.
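The derivation above can be sketched in a few lines of C. This is only an illustration of the RFC4291 rule, not code from the Zephyr tree; the function name `solicited_node_mcast` is made up for this example, and addresses are assumed to be plain 16-byte arrays in network byte order.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative helper (hypothetical name, not a Zephyr API): build the
 * Solicited-Node multicast address FF02::1:FFXX:XXXX for a 16-byte IPv6
 * unicast or anycast address given in network byte order.
 */
static void solicited_node_mcast(const uint8_t addr[16], uint8_t out[16])
{
	/* First 104 bits (13 bytes) of FF02:0:0:0:0:1:FF00::/104 */
	static const uint8_t prefix[13] = {
		0xff, 0x02, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01, 0xff
	};

	memcpy(out, prefix, sizeof(prefix));
	/* Append the low-order 24 bits of the source address */
	memcpy(out + 13, addr + 13, 3);
}
```

For example, fe80::211:22ff:fe33:4455 maps to ff02::1:ff33:4455.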

https://tools.ietf.org/html/rfc4861 "Neighbor Discovery for IP version 6 (IPv6)" defines the purpose of "Solicited-Node Address" multicast group:

"Nodes accomplish address resolution by multicasting a Neighbor Solicitation that asks the target node to return its link-layer address. Neighbor Solicitation messages are multicast to the solicited-node multicast address of the target address."

So, this is the optimization mentioned above: instead of sending a Neighbor Solicitation to the static "All nodes" group (effectively a broadcast), it is sent to the "Solicited-Node Address" group of all nodes which share the same last 3 bytes of IP address. With 2^24 (about 16.7 million) possible groups, even on a link with a million nodes it will reach only about 1 node on average, saving overall CPU time for processing (and battery for useless wakeups).

https://tools.ietf.org/html/rfc2464#section-7 "Transmission of IPv6 Packets over Ethernet Networks" describes how IPv6 multicast addresses are mapped to Ethernet multicast addresses:

"An IPv6 packet with a multicast destination address DST, consisting of the sixteen octets DST[1] through DST[16], is transmitted to the Ethernet multicast address whose first two octets are the value 3333 hexadecimal and whose last four octets are the last four octets of DST."

So, we traced it top to bottom: to comply with IPv6, any Ethernet-like node needs to receive multicasts for Ethernet groups 33:33:00:00:00:01 and 33:33:FF:xx:xx:xx, where xx:xx:xx are the 3 lowest bytes of its IPv6 address (repeated for each of its addresses).
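The RFC2464 mapping quoted above is small enough to show directly. A minimal sketch, assuming byte arrays in network byte order; `ipv6_mcast_to_eth` is a name invented for this example and is not a Zephyr function.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative helper (hypothetical name): map an IPv6 multicast
 * destination to its Ethernet multicast address per RFC2464 section 7,
 * i.e. 33:33 followed by the last four octets of the IPv6 address.
 */
static void ipv6_mcast_to_eth(const uint8_t dst[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &dst[12], 4);
}
```

Applied to ff02::1 this gives 33:33:00:00:00:01, and applied to a Solicited-Node address such as ff02::1:ff33:4455 it gives 33:33:ff:33:44:55.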

Now on to the driver interface. Conceptually, multicast traffic is not just "received": a node should "join" a particular multicast group, after which it starts to receive multicast traffic for that group. It may later "leave" the group to stop receiving such traffic.

None of the (3) existing Ethernet drivers appears to have any explicit multicast group management routines. In particular, struct net_if_api lacks any methods for multicast groups. Looking at the source of net_if_ipv6_maddr_add(), it signals a net_mgmt event on addition of a multicast address:

net_mgmt_event_notify(NET_EVENT_IPV6_MADDR_ADD, iface);

It's unclear whether a driver could listen to such an event, and in either case the interface doesn't appear to be sound: there should be an explicit request to join a group, with an explicit response, and if the response is an error, it should be communicated back to the user as "IPv6 networking is not functional".
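To make the "explicit request, explicit response" idea concrete, here is a rough sketch of what such an interface could look like. Everything in it is hypothetical: `struct mcast_driver_api`, its `join`/`leave` members and `mcast_join()` do not exist in Zephyr, and passing a 6-byte MAC address (rather than an IPv6 address) to the driver is just one possible choice.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-driver multicast hooks. This is NOT Zephyr's
 * struct net_if_api; it only illustrates the shape of the interface.
 * Each hook returns 0 on success or a negative errno value.
 */
struct mcast_driver_api {
	int (*join)(void *dev, const uint8_t mac[6]);
	int (*leave)(void *dev, const uint8_t mac[6]);
};

/*
 * Hypothetical stack-side wrapper: ask the driver to join a group and
 * hand any failure back to the caller instead of silently ignoring it.
 */
static int mcast_join(const struct mcast_driver_api *api, void *dev,
		      const uint8_t mac[6])
{
	if (api == NULL || api->join == NULL) {
		/* Driver cannot manage groups at all */
		return -ENOTSUP;
	}

	return api->join(dev, mac);
}
```

With a shape like this, a driver that has no filtering support (or whose filter table is full) produces an error the stack can surface to the user, which a fire-and-forget net_mgmt event cannot do.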

A few last words on implementation strategies for handling multicasts in a particular driver:

Enable promiscuous mode: this works around all multicast complexities, and disables all the IPv6 optimizations built on multicast.
As can be seen above, all IPv6 multicast Ethernet addresses start with 33:33. Some hardware can match a subset of bytes in packets, which can be used to match all IPv6 multicasts (while ignoring legacy and foreign multicasts and broadcasts). ENC28J60 has such a filter.
Use the actual multicast support in the device, be that a table for exact address matches or a hash table with some selectivity.
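The difference in selectivity between the last two strategies can be modelled in software. This is only a sketch of the matching logic, with invented function names; real hardware does this in its receive filter, and the details (table size, hash function) are device-specific and not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Prefix strategy: accept every frame whose destination starts with 33:33. */
static bool match_ipv6_mcast_prefix(const uint8_t dst[6])
{
	return dst[0] == 0x33 && dst[1] == 0x33;
}

/* Exact-match strategy: accept only destinations present in the table. */
static bool match_exact(const uint8_t table[][6], size_t n,
			const uint8_t dst[6])
{
	for (size_t i = 0; i < n; i++) {
		if (memcmp(table[i], dst, 6) == 0) {
			return true;
		}
	}

	return false;
}
```

The prefix filter lets through Solicited-Node traffic for every other node on the link, so the stack still has to drop it in software; the exact-match table only passes the groups that were actually joined.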

(Imported from Jira ZEP-1673)

nashif commented 7 years ago

by Paul Sokolovsky:

It should be noted that this issue currently affects only Ethernet, but would affect any other medium type with a "native" IPv6 packet format and multicast filtering in hardware, e.g. WiFi.

My assumption is that this doesn't affect 6lowpan technologies for some reason (otherwise there would be similar reports).

nashif commented 7 years ago

by Paul Sokolovsky:

This ticket is a generalization of previous reports for issues in 2 of the 3 Ethernet drivers: GH-1410, GH-1463. This ticket is also an attempt to guard against the "silent workaround" approach as done in e.g. https://gerrit.zephyrproject.org/r/gitweb?p=zephyr.git;a=commitdiff;h=f439fdaa4ffe5a69d58d4d533c8a8b3ff5be3109;hp=c182b02b74842a5a0f116829fecaf740ac135724 . Done like that, the problem will reoccur again and again for each hardware device, and will cause extensive confusion and research for a solution due to the relative obscurity of IPv6.

nashif commented 7 years ago

by Jukka Rissanen:

Currently IPv6 multicast join/leave is not supported. It is mentioned in the networking TODO list (subsys/net/TODO), so this will be done. Patches are welcome of course.

nashif commented 7 years ago

by Paul Sokolovsky:

Jukka Rissanen: Thanks for the response. Glad to hear it was a known problem; I hope this ticket will make it even more widely known and save some time on diagnosing the tickets mentioned above (which seemed to be confusing for a number of people).

I agree this qualifies as "offloading optimization", and definitely doesn't fit into 1.7.0, so I'm preparing a workaround for frdm_k64f, following the previous workaround for enc28j60 (but I find it important to keep in view that it's just a workaround, and also that the problem applies across devices).

nashif commented 7 years ago

by Paul Sokolovsky:

frdm_k64f workaround posted: https://gerrit.zephyrproject.org/r/#/c/10991

nashif commented 7 years ago

by Sharron LIU:

linking to GH-735, and setting same priority.

nashif commented 7 years ago

by Jukka Rissanen:

FYI, I started to work with IPv6 multicast join/leave support. This is 1.8 stuff so will submit a bit later.

nashif commented 7 years ago

by Paul Sokolovsky:

Jukka Rissanen: Great, thanks for the heads-up.

And in https://gerrit.zephyrproject.org/r/#/c/10991 , Marcus Shawcroft asked me:

Do you have a feel for how much effort is involved in a 'proper' solution?

So, here are my 2 cents: The first question would be to decide whether adding mcast group management vmethods to struct net_if_api, like I propose in the ticket description, is a good idea. Another issue is the "impedance mismatch" between the IP stack working with IP addresses and network drivers working with LL/MAC addresses of a particular hardware type, so it would be a question of what kind of address network drivers receive in join/leave calls.

I don't have enough insight into Z IP stack to provide definitive answers to these questions, so it would be better for the core team to tackle that, and I'm happy that Jukka confirmed he started on that.

nashif commented 7 years ago

by Jukka Rissanen:

I sent a patch that creates support for passing mcast group join/leave info to L2 and net drv. I did not implement the actual device driver support for mcux but sent a draft version that someone could use to implement it properly https://gerrit.zephyrproject.org/r/#/c/12994/

nashif commented 7 years ago

by Jukka Rissanen:

There was some resistance to my proposal at https://gerrit.zephyrproject.org/r/#/c/12994/ so I abandoned the patch. If someone wants to send a new proposal and do it as suggested by Tomasz, then go ahead.

nashif commented 7 years ago

by Andrei Laperie:

Since the original proposal for the solution was not accepted, we need to come up with a new one. Removing the 1.9 version designation and assigning to Tomasz.

nashif commented 7 years ago

by Jukka Rissanen:

New proposal at https://github.com/zephyrproject-rtos/zephyr/pull/1283

nashif commented 7 years ago

Related to GH-735
