zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

[RFC] net: l2: ieee802154: TSCH protocol support #50336

Open fgrandel opened 1 year ago

fgrandel commented 1 year ago

Why TSCH protocol support?

The IEEE 802.15.4e amendment introduced the Time Slotted Channel Hopping (TSCH) protocol to the standard. This was later consolidated into IEEE 802.15.4-2015 and all following versions of the standard. Wikipedia has a quick introduction.

TSCH in IEEE 802.15.4 is designed around concepts very similar to those of the WirelessHART standard, which is widely adopted in industrial applications. In fact, both standards compete directly and have repeatedly inspired each other. TSCH combines very low packet rejection rates and real-time capabilities with excellent low-energy properties. It can be used in star and mesh topologies.

A dedicated IETF working group (6TiSCH) has defined a range of higher-level standards and profiles on top of the protocol, namely IPv6 over TSCH as well as an adaptation of the RPL routing protocol (among others). This defines a full network stack on top of TSCH which is as well suited to large-scale IoT applications as the more traditional non-beacon-enabled/6LoWPAN stack underlying Thread/ZigBee.

These two alternatives are, however, not interchangeable. TSCH is optimized for harsh industrial environments with high reliability and deterministic timing requirements, while the Thread/ZigBee protocols focus more on the office/home-automation market. TSCH might therefore increase Zephyr's attractiveness for industrial automation and medical use cases.

Implementation

This feature request is not new to Zephyr: it revives parts of #3710, namely #2399, which in turn requires #2397, #2392 and parts of #2398.

A detailed implementation plan follows below which can be transformed into individual issues and implemented step-by-step (iteratively) once the principle as such has been agreed.

Alternatives

Thread/Zigbee/WiSUN or even raw IEEE 802.15.4 might be regarded as alternatives, but they are better seen as complementary protocols with a different focus, strengths and weaknesses. The strengths of TSCH are its superior approach to collision avoidance (deterministic latency) and channel hopping (reliability, resistance to interference) as well as highly configurable throughput and latency per device (QoS). It is weaker when it comes to ease of configuration, deployment, administration and usage, which makes it less suitable for a home/office automation setting.
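To make the two mechanisms named above concrete: in TSCH every timeslot is identified by an absolute slot number (ASN), a link is a (timeslot, channel offset) cell in a repeating slotframe, and the physical channel is looked up in a hopping sequence at index (ASN + channel offset) modulo the sequence length. The following C sketch illustrates that mapping only. The struct and function names are made up for this illustration and are not Zephyr APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical link description: one cell in the slotframe schedule. */
struct tsch_link {
	uint16_t timeslot;       /* slot index within the slotframe */
	uint16_t channel_offset; /* logical channel assigned to the link */
};

/* Timeslot within the slotframe that a given ASN falls into. */
static uint16_t tsch_timeslot(uint64_t asn, uint16_t slotframe_len)
{
	assert(slotframe_len > 0);
	return (uint16_t)(asn % slotframe_len);
}

/* Non-zero if the link is scheduled in the timeslot identified by asn. */
static int tsch_link_is_active(const struct tsch_link *link, uint64_t asn,
			       uint16_t slotframe_len)
{
	assert(link != NULL);
	return tsch_timeslot(asn, slotframe_len) == link->timeslot;
}

/* Physical channel the link uses in the timeslot identified by asn. */
static uint8_t tsch_channel(const struct tsch_link *link, uint64_t asn,
			    const uint8_t *hopping_seq, size_t seq_len)
{
	assert(link != NULL && hopping_seq != NULL && seq_len > 0);
	return hopping_seq[(asn + link->channel_offset) % seq_len];
}
```

With an illustrative hopping sequence {15, 20, 25, 26}, a slotframe of 7 slots and a link at timeslot 3 with channel offset 2, the link is active at ASN 101 on channel 26 and again at ASN 108 on channel 25. Because the slot owner is fixed by the schedule, access is collision-free within the network, and because the channel changes on every repetition, a single jammed channel only affects some of the transmissions.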

Solution Design

A preliminary analysis of the specification (IEEE 802.15.4-2015 or -2020) yielded the following tasks that might be implemented roughly in the order given below:

UML

TSCH operation: [UML diagram "zephyr-tsch"]

IE Parser: [UML diagram "IE Parser SM"]

fgrandel commented 1 year ago

@rlubos @jukkar @nandojve As always I'm trying to implement this in my free time so I prefer an iterative approach that leaves the code base in a clean and generally improved state even if I have to stop in the middle due to lack of time.

In that sense I'd like to start with two preparatory tasks:

What do you think of this? Do you agree that these two changes add value as such? If so I'd start by providing individual PRs for those two issues first. I'd just like a quick thumbs up from your side, especially before changing the (internal) structure of net_pkt to improve its memory footprint/maintainability/encapsulation.

@nandojve : As you can see in the above todo list I also intend to work towards the improvements discussed in #49775 - these come at a later stage, though, where they fit better what is actually needed to proceed.

rlubos commented 1 year ago

Thanks @fgrandel, I agree that the two initial tasks you proposed sound like valuable additions. Adding the OT guys on CC, since OpenThread is currently the main target of the 15.4-specific net_pkt fields. CC @edmont @lmaciejonczyk @canisLupus1313

As for get_config() - there were some discussions in the past regarding adding such a radio API function, here's the link for some background (https://github.com/zephyrproject-rtos/zephyr/issues/23347#issuecomment-596987429). Personally though, I have no problem with such an API extension.

fgrandel commented 1 year ago

@rlubos Thanks for your quick response. I'm going to work on those first steps then.

Taking up the ball re configuration getter to propose something more specific:

Do we need a generic getter in the radio API at all?

@tbursztyka 's argument against a generic configuration getter in #23347 was based on the assumption that L2 owns all configuration logic required in L2. While this is true for writable configuration it is not for read-only attributes of the driver:

Injecting driver properties as "capabilities" into L2 does not scale for non-boolean attributes and makes it too easy to violate the spec. A good example is IEEE802154_HW_2_4_GHZ and IEEE802154_HW_SUB_GHZ plus get_subg_channel_count(). This is clearly against the spec's much more flexible concept of integer-valued channel pages which uniquely determine the number and frequency bands of available channels. Adding a new getter for every driver attribute in the future doesn't scale well either.

The same is true for global Kconfig constants and direct access to DT driver props from L2. The much too broadly defined CONFIG_IEEE802154_2015 is a good example. It actually represents a mixture of (non-boolean and boolean) attributes/capabilities as well as parameters that should be configurable at runtime according to the spec (whether or not to use CSL for example). It's also not a good idea to tie features to a specific version of the spec. Should this constant now be renamed to _2020? ;-)

Having a generic getter for read-only attributes lets us model the API closer to the spec's flexible and future-proof PHY PIB concept.

What could be included in the generic getter?

Possible attributes to be served by a generic getter:
1) implementation-specific (non-PIB) read-only attributes that do not change at runtime and cannot be conceptualized as "capabilities", e.g. required buffer headspace.
2) read-only PIB attributes which do not change at runtime (not yet required).
3) valid ranges for writable configuration attributes, e.g. available channel pages, which do not change at runtime.
4) the current state of attributes configurable via configure() - not recommended.
5) internal driver state that changes at runtime (current time, current clock precision, etc.) - not recommended.

I tend to follow along the lines of @tbursztyka 's argument to only include such attributes that are not writable by L2 and do not change at runtime (categories 1-3) to keep the changes required in existing driver code to a minimum.

I therefore propose the following minimal changes:

The length of the FCS as well as the algorithm required to calculate it (if the driver does not provide the FCS on RX) is a deterministic function of a few well-defined PIB attributes (category 2 or 4). So the above API also enables this requirement from #49775 without implementing it yet.

While the hardware capabilities could become just another "attribute" I'd vote against this, to keep the change small and to maintain similarity with the ethernet driver API.

I'd also not touch get_time() and get_sch_acc() as they are updated at runtime.
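To illustrate the idea of a generic read-only getter restricted to categories 1-3, here is a minimal C sketch. It is not the API proposed in this RFC and not an existing Zephyr interface: the enum, the value struct, the function name and the returned values are all hypothetical and only show how an integer-valued attribute such as the set of supported channel pages could be exposed without adding one getter or one capability flag per attribute.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute identifiers (not an existing Zephyr enum). */
enum radio_attr {
	RADIO_ATTR_SUPPORTED_CHANNEL_PAGES, /* category 3: valid range */
	RADIO_ATTR_TX_HEADROOM,             /* category 1: buffer headspace */
};

/* Hypothetical value container; one member per attribute type. */
struct radio_attr_value {
	union {
		uint32_t channel_pages; /* bit n set: channel page n supported */
		uint16_t headroom;      /* bytes the driver needs ahead of a frame */
	};
};

/* Example driver-side implementation for an imaginary radio. */
static int example_radio_attr_get(enum radio_attr attr,
				  struct radio_attr_value *value)
{
	assert(value != NULL);

	switch (attr) {
	case RADIO_ATTR_SUPPORTED_CHANNEL_PAGES:
		value->channel_pages = UINT32_C(1) << 0; /* channel page 0 only */
		return 0;
	case RADIO_ATTR_TX_HEADROOM:
		value->headroom = 1; /* made-up value for this sketch */
		return 0;
	default:
		return -ENOENT; /* attribute not known to this driver */
	}
}
```

An L2 implementation would call the getter once at interface initialization and cache the result, since by construction none of these attributes change at runtime. Unknown attributes fail with an error code, so new attributes can be added later without touching existing drivers.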

How to name the API?

I did a quick search across all driver APIs. Here my result:

If we follow @tbursztyka 's recommendation to not introduce a symmetric configuration getter then I propose that we use attr_get():

nandojve commented 1 year ago

@nandojve : As you can see in the above todo list I also intend to work towards the improvements discussed in #49775 - these come at a later stage, though, where they fit better what is actually needed to proceed.

Hi @fgrandel ,

It'll be great to have TSCH on Zephyr. Yes, I want to continue looking at #49775 as soon as the new release is launched.

fgrandel commented 1 year ago

@rlubos @nandojve Now that the initial preparation has been submitted for review and merged I'm concentrating on introducing enhanced beacons and enhanced ACKs. They are the first thing that devices will have to send (TSCH PAN coordinator) and receive (initial synchronization of participating devices) in a TSCH.

If you agree then this iteration will contain sending and parsing of enhanced beacons and ACKs with TSCH in mind, i.e.:

This iteration introduces generally useful features (handling IEs in the soft-MAC layer) plus advances my personal development agenda.
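As background for the IE handling mentioned above: to my understanding of IEEE 802.15.4-2015/-2020, every information element starts with a 16-bit little-endian descriptor. Bit 15 distinguishes header IEs (0) from payload IEs (1). A header IE carries a 7-bit length and an 8-bit element ID, a payload IE an 11-bit length and a 4-bit group ID. The sketch below decodes such a descriptor. The type and function names are hypothetical and do not reflect the parser developed for this RFC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of an IE descriptor. */
struct ie_descriptor {
	bool is_payload_ie; /* type bit: false = header IE, true = payload IE */
	uint8_t id;         /* element ID (header IE) or group ID (payload IE) */
	uint16_t length;    /* length of the IE content in bytes */
};

/* Decode the two descriptor bytes found at the start of an IE. */
static struct ie_descriptor ie_parse_descriptor(const uint8_t raw[2])
{
	struct ie_descriptor desc;
	uint16_t v;

	assert(raw != NULL);
	v = (uint16_t)(raw[0] | ((uint16_t)raw[1] << 8)); /* little endian */

	desc.is_payload_ie = (v & 0x8000U) != 0;
	if (desc.is_payload_ie) {
		desc.length = v & 0x07FFU;                /* bits 0-10 */
		desc.id = (uint8_t)((v >> 11) & 0x0FU);   /* bits 11-14 */
	} else {
		desc.length = v & 0x007FU;                /* bits 0-6 */
		desc.id = (uint8_t)((v >> 7) & 0xFFU);    /* bits 7-14 */
	}

	return desc;
}
```

For example, the bytes 0x00 0x3F decode to a header IE with element ID 0x7E and length 0 (the first header termination IE), and 0x1A 0x88 decode to a payload IE with group ID 0x1 (MLME) and 26 bytes of content. A full parser would loop over descriptors, bounds-check each length against the remaining frame and stop at a termination IE.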

fgrandel commented 1 year ago

Now that the initial preparation has been submitted for review and merged I'm concentrating on introducing enhanced beacons and enhanced ACKs.

@rlubos @nandojve I'm slowly making progress towards the next iteration. This step is going to be much larger than the previous ones. I'm developing it in the open in case you'd like to have a sneak preview. Once I got a working state I'm going to cut it into small steps and provide several PRs for it.

nandojve commented 1 year ago

Hi @fgrandel ,

Thank you for updating us about this work. I understand this is a huge step inside Zephyr and I would like to know what baseline we need to follow/compare against. I mean:

1- What is the validation approach to make sure it is working properly?
2- What will the setup environment and guidelines look like?
3- What SW/HW is involved in this process?

I think it is important that we establish how people can put TSCH to work, to make it easy to test and to decrease pressure in the test/regression phase during review.

I imagine that you already have a plan/setup and I was wondering if it is at a stage where it can be shared?

fgrandel commented 1 year ago

@nandojve ...a few ideas but maybe less than you expect.

1- Currently I'm working heavily with unit tests. But I agree that proper integration testing will be very useful. It would be great if someone was in the position to test integration with existing implementations, e.g. OpenWSN, Contiki-NG or some of the proprietary stacks out there. I'll not be able to do this myself. I plan to test the framework with TI (SubGhz) and Nordic (2.4 GHz) chips. But this will be well into 2023 as there's still a LOT to do before this will actually work. In any case I plan to add the coordinator role to Zephyr (FFD). This will make it easier to test if it works with Zephyr on both sides.

2- I'll provide docs, unit tests and sample code as I need those anyway for my own testing. Setting TSCH up consists of configuring a few static things via Kconfig plus setting the necessary dynamic parameters via net mgmt (slotframes, links, etc.). Some configuration is fixed for now (timeslot config, hopping sequence) or may be configurable via beacons/IE by coordinator nodes running other stacks but can easily be added to the net mgmt API later on if more flexibility is needed by an application. The configuration API is already quite complete so feel free to have a look.

3- No software needed except for Zephyr itself. External sniffers should be able to debug the protocol as it's just the standard. Any IEEE 802.15.4-capable hardware should work iff it has a good LF clock with an accuracy of 50 ppm or better AND its hardware and driver are RX/TX-timing-capable (timestamping of RX packets + scheduling of TX packets). AFAICS currently only the Nordic-driver seems to support RX/TX-timing in Zephyr. So this is an important problem with existing drivers. Another place where rather small contributions could make a big impact.
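The clock requirement above can be made tangible with a bit of arithmetic: two nodes whose clocks are each off by up to 50 ppm can drift apart by up to 100 ppm, i.e. 100 µs per second, and a receiver only hears a frame if that drift stays within its guard time. The sketch below computes both directions of this relation. The function names are hypothetical and the guard time used in the example is an illustrative number, not a value taken from this RFC.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case relative drift in microseconds after interval_ms, for two
 * clocks with accuracies of ppm_a and ppm_b drifting in opposite directions.
 * drift_us = interval_us * ppm / 1e6 = interval_ms * ppm / 1000
 */
static uint32_t worst_case_drift_us(uint32_t interval_ms, uint32_t ppm_a,
				    uint32_t ppm_b)
{
	return (uint32_t)(((uint64_t)interval_ms * (ppm_a + ppm_b)) / 1000U);
}

/* Longest time two nodes may go without resynchronizing, in milliseconds. */
static uint32_t max_resync_interval_ms(uint32_t guard_us, uint32_t ppm_a,
				       uint32_t ppm_b)
{
	assert(ppm_a + ppm_b > 0);
	return (uint32_t)(((uint64_t)guard_us * 1000U) / (ppm_a + ppm_b));
}
```

With 50 ppm on both sides the nodes drift apart by up to 1000 µs in 10 s, so with an assumed guard time of 1000 µs they have to exchange a frame (data, ACK or beacon) at least every 10 s to stay synchronized. A less accurate clock shortens that interval proportionally and therefore costs energy.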

nandojve commented 1 year ago

Hi @fgrandel ,

I think it will be necessary to follow https://docs.zephyrproject.org/latest/contribute/guidelines.html#submitting-proposals. This means that an RFC is necessary to describe the full plan and set correct expectations.

This is a big change in the IEEE 802.15.4 stack and it will involve adding the Coordinator role. So, I imagine that the RFC will currently have at least 2 or 3 main contributions: FFD, Coordinator and TSCH. In my view, you should focus on only one of the bands initially. This alone is already a lot of work, and every piece that can be added later will help in the review phase.

1- I took a look at OpenWSN and I'm not sure if I can help to test with that. I saw that RF2xx radios are involved but it is not clear whether they can be used with TSCH, and it was not clear whether nRF is supported. Contiki-NG seems to have more radio support. RIOT-OS has an experimental TSCH using OpenWSN (again, same doubts about radios). The common HW seems to be TI in all cases. If for some reason this proposal will not have a baseline, or you prefer not to use one, you should make that clear in the RFC.

As far as I remember, Zephyr doesn't have the FFD Coordinator role in IEEE 802.15.4, and this alone could be considered a completely new RFC. So, I think it will be important that you state the goals for the FFD as Coordinator in the RFC. For instance, when you say "They are the first thing that devices will have to send (TSCH PAN coordinator) and receive (initial synchronization of participating devices) in a TSCH.", I expect that normal coordinator functions will be available because TSCH is more advanced. Is my assumption correct? Whether yes or no, please make it clear in the RFC.

2- This is very important to help us evaluate and review. Please state it in the RFC.

3- Remember that IEEE 802.15.4 should interoperate with other SW, and whether or not this is established by using a baseline needs to be decided. I don't see any reason to make a baseline mandatory but this should be clarified in the RFC. I'm quite sure external people will try to interoperate, and if they have problems nobody will be able to help them; as a consequence the project's image can be damaged.

AFAICS currently only the Nordic-driver seems to support RX/TX-timing in Zephyr. So this is an important problem with existing drivers. Another place where rather small contributions could make a big impact.

You may be correct. I could add the missing pieces in the rf2xx driver but I'm not sure what I should provide in terms of API. If that is clear I could start to work in parallel to increase radio support in general.

In general terms I'm OK with your proposal. I think it will be necessary to create an RFC just to make the goals and expectations clear for internal/external audiences.

fgrandel commented 1 year ago

@nandojve This IS an RFC and AFAICS one of the better ones. I'll successively add conceptual information to the initial post as it becomes obvious. See e.g. the UML design documents that have been added.

fgrandel commented 1 year ago

@rlubos I agree with @nandojve that discussing the approach for PRs, integration, testing, etc. early on makes sense. Due to the complexity of the topic I'd prefer a face-2-face conversation, though. I'm providing very complete documentation in this feature request/RFC but I have the impression that people don't have time to read it anyway. Maybe the networking forum would be appropriate? I would do that at a later stage, though, when the code is more complete.

fgrandel commented 1 year ago

@rlubos @nandojve I just finished introducing all the infrastructure and tests necessary to produce and handle enhanced beacons and enhanced ACKs in TSCH mode. I also fixed a lot of other stuff on my way. IMHO this is a good point to create PRs for reviewing and merging this important intermediate result. So that's what I'm going to do now successively.

rlubos commented 1 year ago

@fgrandel Yes, the networking forum could be a good place to present an RFC, we do it from time to time. We have a meeting tomorrow (https://lists.zephyrproject.org/g/networking/viewevent?repeatid=48578&eventid=1747829&calstart=2023-01-03) there's not much on the agenda so far so if you'd like I could give you the floor to present. Otherwise we can do it next month or later.

Nonetheless, please keep in mind that IMO the networking area in Zephyr is a bit understaffed therefore you may not get a lot of feedback, especially in such a specialized area.

fgrandel commented 1 year ago

@rlubos I think it would be great to get to know each other in person and ensure that we don't step on each others' toes. I don't expect anything in terms of feedback, I get along quite well on my own. I RSVPed and added the topic on the agenda (nice open approach btw).

IMO the main area that requires common attention is how we'll ensure conceptual alignment with the 2015/2020 spec versions in the future (see my proposals re net management/radio APIs and KConfig deprecations/additions). I think the approach in Zephyr could be a little more standards conforming and less "ad hoc" so that we don't make it unnecessarily hard to implement additional parts of the standard. I have quite a hard time to introduce my changes into the existing not-so-conforming KConfig/Net Mgmt/Radio APIs. I believe this can be fixed, though.

rlubos commented 1 year ago

@fgrandel Great, I'll add the topic to the agenda for tomorrow. @nandojve Hopefully you can join the meeting as well?

I RSVPed and added the topic on the agenda (nice open approach btw).

To be honest I'm not really sure how this RSVP thing works, I did not get any notification/answer whatsoever. For the future, I think it'd be easier to simply use the mailing list we have for networking (https://lists.zephyrproject.org/g/networking).

tbursztyka commented 1 year ago

Ok, iterations seem right. Though I didn't get why you want to pass FCS to all L2. It's 99% (if not 100) of the time handled in the hw, and I see no use for the l2 (openthread is openthread...). If a device cannot handle it, then the driver could do the verification in sw, but still no need to provide it to L2.

Also, try not to diverge with making useless changes like the _unused attribute, or the position of the ':' of bit fields, or renaming variables that are semantically precise to something generic such as ll_hdr_len to headroom (unless there is a very good reason to do so, like it really needs to reserve space for more than just the ll header?).

Anyway, it's a nice feature implementation proposal.

fgrandel commented 1 year ago

Though I didn't get why you want to pass FCS to all L2. [...] renaming variables that are semantically precise to something generic such as ll_hdr_len to headroom

Hi @tbursztyka have you seen the discussion and analysis in #49775? Both questions should be answered there on the conceptual level - but you're right that I did the implementation wrong wrt headroom/tailroom although the intention was what is specified there. I therefore think that would also be the right place to question the concept if you disagree in principle and not just with my implementation attempt.

In a nutshell:

Also, try not to diverge with making useless changes like the _unused attribute, or the position of the ':' of bit fields

Yeah, that will not make it into the final PRs. I'm just documenting padding/holes in structs and bitmaps for myself in case I need space and as a kind of "TODO" marker so that I can try to further optimize these structs if an opportunity turns up.

But then it's also really annoying that lots of existing code is not formatted with clang-format, which really makes maintaining legacy code a PITA wrt formatting. Formatting should not be something that gets in a dev's way that much (the : of bit fields is a good example of what I mean).

fgrandel commented 1 year ago

@nandojve

AFAICS currently only the Nordic-driver seems to support RX/TX-timing in Zephyr. So this is an important problem with existing drivers. Another place where rather small contributions could make a big impact.

You may be correct. I could add the missing pieces in the rf2xx driver but I'm not sure what I should provide in terms of API. If that is clear I could start to work in parallel to increase radio support in general.

The existing Radio API already provides bits for timed RX/TX (introduced for CSL-support in OpenThread but covering a lot that will probably be required for TSCH as well). The following list is certainly not 100% correct/complete but gives you an indication of what I believe will be part of the package:

(plus some CSL specific stuff not needed for TSCH, like the rather questionable delegation of CSL IE insertion to the radio driver...)

I'm not 100% sure whether these existing concepts are exactly the ones needed in TSCH. I'll only be able to tell once I've implemented and tested TSCH operation. In any case adding OpenThread-CSL support to a driver is a good thing as such and even if it may not be everything, it will go a large step in the right direction and definitely have a lot of re-use in TSCH.

fgrandel commented 1 year ago

@nandojve

I expect that normal coordinator functions will be available because TSCH is more advanced. Is my assumption correct?

In TSCH a root time source is mandatory (which is the PAN coordinator, see sections 6.3.6 and 6.5.4 of IEEE 802.15.4-2020). As there are few TSCH implementations out there, everything that is needed to set up a TSCH network should be implemented in Zephyr.

I would rather not copy the spec over into this RFC, as the spec is public. I provide all the necessary pointers into the spec, even naming the specific sections that need to be implemented. IMHO that is more than adequate for an RFC, which should concentrate on the implementation details specific to Zephyr. That said, I don't think that detailed TSCH knowledge is required to do some basic review wrt Zephyr coding style, integration with other parts of the software, etc. If you have questions like the one you asked about how TSCH works, then please read the spec, though.

tbursztyka commented 1 year ago

In a nutshell:

* FCS handling: In practice, drivers before had to make an explicit distinction depending on the L2 implementation they serve which is really bad encapsulation and caused drivers to be ugly, buggy and diverge (see TODOs added). It seems unrealistic to change the OpenThread requirements so to make life easier for driver implementers it makes sense to change the much less used generic L2 IMHO. But there is also a conceptual reason, which is why Linux does it in the same way (see the links to the Linux source code in my analysis): Raw sockets in Linux send the FCS (but no PHY headers) to userspace on RX and we want Zephyr sockets to look just "as in Linux", right?

Ok the argument of being able at runtime to switch from/to AF_PACKET is a relevant one (that's why linux does it this way actually)

That said, getting the FCS is most likely going to be a burden for the driver dev, not the other way. See all TI's for instance, but a few others as well: they either do not forward the original FCS to the host sw or replace it with something (like CC2520). So it's going to add some overhead, thus why this was originally not done to save space and CPU (remember that zephyr was first meant to run on very small target at the beginning. Not so much these days it seems but that's another discussion).

* Headroom/Tailroom: We want an API that allows HW drivers to reserve additional headroom (e.g. for PHY header extensions) and tailroom (e.g. to cater for 2/4-byte FCS on TX) for zero-copy support in the driver while keeping the HW driver implementation a black box with a generic API in L2. Again this conforms to how Linux does it. I admit, though, that my implementation is bad as I treat Header IEs separately and really do mix the concepts of LL header space and HW headroom. So this is something that I definitely need to fix. Thanks for the hint!

Zephyr would first need a PHY layer before having these modifications. The fact that I am nit-picking on the wording has its logic: headroom/tailroom at the level of a buffer API is a perfect match, as a buffer API is meant to be generic and has no ties to what will be stored in it. In the 15.4 L2 this is not true anymore: you reserve space for something that has a very strict meaning. So if you need space for PHY, get some variable named and dedicated to it, same for LL, FCS etc...
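One way to read the suggestion above is to replace generic headroom/tailroom counters with one named field per protocol element. The sketch below shows what that could look like. The struct and helpers are hypothetical, exist neither in Zephyr nor in the RFC branch, and only illustrate the naming argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame space reservation with semantically precise names. */
struct frame_space {
	uint8_t phy_hdr_len; /* driver-specific PHY header bytes before the PSDU */
	uint8_t ll_hdr_len;  /* MAC (link-layer) header incl. header IEs */
	uint8_t fcs_len;     /* 2 or 4 bytes of FCS after the payload */
};

/* Offset of the MAC payload from the start of the buffer. */
static size_t frame_payload_offset(const struct frame_space *space)
{
	assert(space != NULL);
	return (size_t)space->phy_hdr_len + space->ll_hdr_len;
}

/* Largest MAC payload that fits; the PSDU covers LL header, payload and FCS. */
static size_t frame_max_payload(const struct frame_space *space,
				size_t max_psdu_len)
{
	size_t overhead;

	assert(space != NULL);
	overhead = (size_t)space->ll_hdr_len + space->fcs_len;
	return overhead >= max_psdu_len ? 0 : max_psdu_len - overhead;
}

/* Total buffer size needed for zero-copy TX of payload_len bytes. */
static size_t frame_buffer_size(const struct frame_space *space,
				size_t payload_len)
{
	return frame_payload_offset(space) + payload_len + space->fcs_len;
}
```

With a 1-byte PHY header, an 11-byte MAC header and a 2-byte FCS, the payload starts at offset 12 and at most 114 bytes fit into the 127-byte PSDU of the classic O-QPSK PHY. Each field can be set by the layer that owns it (driver for PHY and FCS, L2 for the MAC header) without either side having to know what the other reserved space for.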

Also, try not to diverge with making useless changes like the _unused attribute, or the position of the ':' of bit fields

Yeah, that will not make it into the final PRs. I'm just documenting padding/holes in structs and bitmaps for myself in case I need space and as a kind of "TODO" marker so that I can try to further optimize these structs if an opportunity turns up.

But then it's also really annoying that lots of existing code is not formatted with clang-format, which really makes maintaining legacy code a PITA wrt formatting. Formatting should not be something that gets in a dev's way that much (the : of bit fields is a good example of what I mean).

Clang formatting is not enforced in Zephyr in any way, so don't go down that path. Even new code is mostly not clang-formatted. I don't see what is annoying about that, actually. See https://docs.zephyrproject.org/latest/contribute/coding_guidelines/index.html: no mention of clang. https://docs.zephyrproject.org/latest/contribute/guidelines.html : clang-format is only mentioned as a helpful tool.

Documenting stuff on the other hand is good, I saw you did it for instance on NET_REQUEST_IEEE802154CMD*. That totally deserves a dedicated commit.

fgrandel commented 1 year ago

@tbursztyka

That said, getting the FCS is most likely going to be a burden for the driver dev, not the other way.

That's why Linux allows drivers to set an appropriate HW capability and L2 will add a synthetic FCS where needed. I could do that but that further complicates the radio API. The feeling of those participating in #49775 was that letting the driver add a synthetic FCS where needed would be the right compromise, as drivers have to do this anyway already to serve the OpenThread L2. And I tend to agree with this: it's overall definitely a step in the right direction. In Zephyr it does make code less complicated (while burning additional CPU cycles in some non-OT scenarios - that's agreed!) as it does not add any complexity not already needed but removes complexity in almost all drivers (except for those that were incomplete/buggy wrt OpenThread before). If you still disagree, I really propose that we continue that discussion in #49775 where it is visible to all parties concerned.
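For reference, this is roughly what computing a synthetic 2-byte FCS in software costs. To my knowledge the 16-bit FCS of IEEE 802.15.4 is the ITU-T CRC with generator polynomial x^16 + x^12 + x^5 + 1, an initial value of zero and bits processed LSB first. The sketch below is a plain bitwise implementation with hypothetical function names. It is not taken from any Zephyr driver and does not cover the 4-byte FCS variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit FCS over len bytes: reflected polynomial 0x8408, initial value 0. */
static uint16_t fcs16(const uint8_t *data, size_t len)
{
	uint16_t crc = 0x0000;

	assert(data != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++) {
			crc = (crc & 1U) ? (uint16_t)((crc >> 1) ^ 0x8408U)
					 : (uint16_t)(crc >> 1);
		}
	}

	return crc;
}

/* Append the FCS, low byte first; frame must have room for len + 2 bytes. */
static void fcs16_append(uint8_t *frame, size_t len)
{
	uint16_t crc = fcs16(frame, len);

	frame[len] = (uint8_t)(crc & 0xFFU);
	frame[len + 1] = (uint8_t)(crc >> 8);
}

/* A frame including its FCS is intact if the CRC over all bytes is zero. */
static int fcs16_is_valid(const uint8_t *frame, size_t len_with_fcs)
{
	return len_with_fcs >= 2 && fcs16(frame, len_with_fcs) == 0;
}
```

A driver whose hardware strips or replaces the FCS on RX could call fcs16_append() on the received MAC frame before handing it to L2. The bitwise loop costs eight iterations per byte, so a table-driven variant would be the usual choice if CPU time matters more than flash.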

So if you need space for PHY, get some variable named and dedicated to it, same for LL, FCS etc...

Yep, totally agreed.

Clang formatting is not enforced in zephyr in any way

Yes I know - that's what I'm ranting about. But it's a private opinion and not something I want to enforce. Just airing a bit of developer frustration. ;-)

atiselsts commented 1 year ago

Hi team, great to have found this discussion and to see this work. I'm one of the Contiki-NG maintainers, involved in the TSCH support for that platform. At the moment I'm looking at the options for running TSCH on top of Zephyr (targeting nRF52840, nRF5340, and potentially some LoRa radios attached via SPI).

Do you have some timeline for this feature? When it becomes available for testing we can try to help, especially re: the integration testing part.

fgrandel commented 1 year ago

@atiselsts

I'm one of the Contiki-NG maintainers, involved in the TSCH support for that platform.

Nice. I'd love to have a f2f chat with you re TSCH on top of Zephyr. Are you on the Zephyr Discord? Or can I reach you somewhere else? Feel free to send me a private email to my github mail address if you're interested: jerico dot dev at gmail dot com.

Do you have some timeline for this feature?

Unfortunately: No. I'm doing this in my free time - so it's hard to plan reliably. I do hope, though, that before the end of 2023 we have a tested working version. If you or someone else from Contiki-NG wants to contribute, we could certainly co-ordinate some team work to speed this up.

Most Zephyr-specific refactoring plus all the IE creation/parsing stuff has been done already. What comes next is porting TSCH timing and operation code from Contiki-NG. Therefore, someone with Contiki-NG know-how would be in a good position to contribute.

When it becomes available for testing we can try to help, especially re: the integration testing part.

That sounds great. It would be very valuable for both projects, I guess, if we could announce tested Zephyr - Contiki-NG compatibility right from the start.

fgrandel commented 1 year ago

@nashif @MaureenHelm Thanks for giving this RFC visibility on the RFC Backlog. Status: "In Progress". I'm working regularly on this RFC, see https://github.com/zephyrproject-rtos/zephyr/compare/main...fgrandel:zephyr:feat/ieee802154-tsch-ie-support

Not sure what the "assignee" column means. Maybe it makes sense to assign myself while this is not in review?