jgromes / RadioLib

Universal wireless communication library for embedded devices
https://jgromes.github.io/RadioLib/
MIT License
1.48k stars 373 forks source link

Adding support for STM32WL integrated transceivers #588

Closed matthijskooijman closed 1 year ago

matthijskooijman commented 1 year ago

The STM32WL55 series combines an MCU with a LoRa-transceiver in a single module. I'm working together with STMicroelectronics to get some support for these chips in the Arduino ecosystem, and we're currently investigating implementing and contributing support in RadioLib.

I'm currently in the process of investigating feasibility and already looking at what changes would be needed. I'm writing this issue to 1) see if there is interest in accepting such a contribution and 2) getting some feedback on the implementation approach.

I've added my implementation plan/idea below, I welcome comments about this. One thing that I would like some specific feedback on, is whether it would be acceptable to add some virtual methods to the SX126x class, to allow overriding them with STM32WL-specific behavior, or if you have other ideas on how to achieve this behavior (see below on details on how I could imagin this).

The STM32WL55 is a dual-core (M4 & M0+) MCU (of which the Arduino core currently only uses the M4) paired up with a LoRa transceiver which is essentially an SX1261/SX1262 (it has the exact same SPI command interface, but has both the low-power amplifier of the SX1261 and the high power amplifier of the SX1262). This means that most of the SX126x code should be usable without modification, though some changes are needed (mostly related to power selection or special handling of the GPIO and interrrupt connection).

New subclass

To support the radio, a new STM32WL subclass of SX126x can be created. There is no point in subclassing either SX1261 or SX1262, since most if not all of that code must be overridden anyway.

Board-specific vs chip-specific

One point to note is that most customizations needed are specific to the way the STM32WL is wired, but some things are specific to the way the Nucleo board used is built.

The fact that an STM32WL chip is used can easily be detected using the various macros set by the STM32Duino core (based on the board selected for compilation), so that is known at compiletime.

What board is selected can also be detected using macros, so the code can contain some code mapping board macros to board-specific settings, but it is probably good to also allow these settings to be specified by the sketch to support custom boards as well without having to modify the library for such custom boars (configuration from the sketch is the approach used for other radio chips as well).

Board-specific configuration

SPI Communication

Since the STM32WL radio supports the same commands as the SX126x transceiver, the higher level communication code should be usable as-is. Work is underway in Stm32Duino to expose the SPI bus connected to the radio through the regular SPI libary, so its SPIClass instance can be passed to the Module constructor as normal.

Virtual GPIO pins

In the STM32WL chip, some signals (BUSY, IRQ, RESET, NSS) that are normally handled externally are routed internally. This means they are accessed through special registers instead of through the normal GPIO module. This means these pins cannot be accessed through e.g. digitalRead() and digitalWrite().

Some ways to handle this could be:

  1. Modify the STM32Duino core to expose virtual pin numbers for these that access the registers instead of the normal GPIO module. This is certainly the easiest option, requiring little or no modification of RadioLib. It does result in some special casing in the core and can have some performance impact, so this should be carefully considered.

  2. Define virtual pin numbers for these within RadioLib and replacing the the pinMode, digitalRead, and digitalWrite callbacks in Module with versions that handle these pin numbers specially (and call the regular function for other pins). Since this behavior is chip-specific and not board-specific, this behavior should be triggered from the STM32WL class. This is possible by simply letting the STM32WL constructor call various Module::setCB_...() methods.

    This approach is not really elegant and suggests that the used abstractions do not fit this usecase properly, but it is a very pragmatic and easy to implement option.

  3. Use virtual methods (or some other override mechanism) to let the STM32WL class replace the code that calls e.g. Module::digitalRead() for these pins. Since this code is currentlyy embedded in bigger methods (containg other code that should not be duplicated), this involves extracting all code that accesses these pins in new SX126x methods (e.g. getBusy(), getIrq(), setReset() and setNss()). Making these new methods virtual allows STM32WL to override them with the appropriate implementations.

If acceptable for the STM32duino core, option 1 is probably the easiest. If this must be handled in radiolib (which might be needed for the IRQ pin anyway, since that might not be readable without further changes, see below), option 3 is the most elegant, but adds some overhead by adding virtual methods. If that is a problem, option 2 is probably a pragmatic compromise.

For reference, these signals can be accessed as follows:

Interrupt attachment

In the STM32WL chip the three SX126x DIO pins are internally wired together into a single IRQ line connected to the CPU directly, which triggers a dedicated Radio IRQ handler. Since this IRQ is not part of the GPIO IRQs, it cannot be accessed using attachInterrupt().

It seems this can be solved in the same three ways as proposed for the GPIO handling:

  1. Modify STM32Duino to add a virtual RADIO_IRQ pin and modify attachInterrupt() to recognize it.
  2. Override the Module callback for attachInterrupt and enable the appropriate IRQ there (and define SUBGHZ_Radio_IRQHandler to handle the IRQ).
  3. Make Sx126x::setDio1Action() virtual and override it in STM32WL.

Again, option 1 would be easiest, but it does require the core to capture the radio ISR (by defining it unconditionally), this might have more compatibility implications than the GPIO case. Of the alternatives again I would prefer option 3.

Level vs edge interrupts

There does seem to be an additional issue: Documentation suggests that te MCU NVIC is level sensitive on its interrupt lines and the transceiver DIO lines are (probably in binary or configuration) directly connected to the NVIC (not through the EXTI module which is used for GPIO interrupts to make them edge sensitive).

In practice, this would mean that if the radio asserts one of its DIO pins to signal an interrupt and the interrupt handler does not clear the interrupt flags inside the radio (using SPI commands), the interrupt would indefinitely keep triggering, offering the main loop no chance to detect and process the interrupt.

If this is indeed the case (to be verified by tests), this is a tricky problem to solve. The ISR could (before or after calling the sketch callback) clear the radio interrupt flags, but that prevents the main loop code from reading and processing the interrupt flags to determine the result of an operation.

If the flags cannot be cleared, then the ISR must then disable the IRQ enable flag in the NVIC (before or after calling the sketch callback) to prevent the ISR from being called again. The ISR must then be enabled again as soon as the radio interrupt flags are cleared in SX126x::clearIrqStatus() (also clearing the pending flag to prevent triggering the ISR immediately based on the old pending flag).

This means the clearIrqStatus() method must be made virtual (or some other way of overriding must be provided) so the STM32WL subclass can add this additional behavior. Unlike for the GPIO overrides, it seems there is no obvious way to handle this by replacing a Module callback (unless you replace the SPIbeginTransaction, SPItransfer and SPIEndTransaction callbacks to deduce that a clear IRQ status command was sent, but that is horrible and fragile).

If interrupts handling is indeed temporary disabled like this, this also means that the NVIC pending flag can be used as a replacement for reading the Irq/DIO pin in blocking code (like SX126x::transmit()).

One completely different alternative would be to map DIO1 / IRQ0 onto a GPIO pin (this is possible using an "alternate function" on PB3) and then (if this is actually possible - to be confirmed) reading back that pin value and even attaching interrupts to that external pin (which would go through the EXTI module, so be edge-triggered as expected).

One big downside of this approach is that it actually configures PB3 as an output pin and puts the DIO signal on that pin, which prevents using this pin for other purposes (which is particular problematic when using Arduino shields that use it - it is connected to pin D4 on the Arduino headers on the Nucleo board).

vtjballeng commented 1 year ago

I am very interested and will put them in production @matthijskooijman .

jgromes commented 1 year ago

@matthijskooijman that's a lotta text - seems like you have this pretty well thought out. Here's a couple of thoughts:

New subclass

Agreed, that's probably the most straight-forward solution. Obviously, this class should only be accessible when the STM32 Arduino core is used.

Board-specific vs chip-specific

I don't think it's necessary to detect which type of board is used. As far as I can tell, there is only one Nucleo board with the MCU at the moment, the NUCLEO-WL55JC1. In that case, all default parameters of any STM32WL methods can be chosen to match the Nucleo board.

RF switch control

The last approach suggested (generalized RF switch mapping in Module class) is probably the cleanest, since the current approach assumes two pins are used for the switch control. To be fair, that seems to the case for most off-the-shelf modules.

Virtual GPIO pins

I can't really evaluate the first approach, I don't know what the implications for STM32duino core are. I would actually prefer option 2 by replacing the callbacks. This is the exact reason why the Arduino-level functions like digitalRead are interfaced in such a convoluted way - to allow them to be overriden. The issue with approach 3 is that it cannot be reused in case a chip like STM32WL (i.e. MCU with integrated RF transceiver).

Interrupt attachment

Again, I would prefere the second option for reasons described previously. If the attachInterrupt does not match the current platform implementation for whatever reason, it should be overriden. Though it could be argued it's up to STM32duino to be Arduino-compatible.

Level vs edge interrupts

Let's start from the end - I don't really like the idea of routing the internal interrupt to an external pin and then using that as an external interrupt routed via EXTI (seems like something I would come up with at 2 AM out of desperation). Why on Earth would they not route the DIO signals through the EXTI module baffles my mind (it is an external interrupt after all, the SX126x devie is external to the MCU as far as I'm concerned). The only solution I see is to disable the interrupt in NVIC from the ISR, to prevent it from triggering further interrupts. Since this is a completely new behavior, it's not covered in the Module class, and so I'm fine with any overrides and/or custom methods in STM32WL class.

Good luck with the implementation, when it's done and tested I will happily accept the PR. I had my eye on the STM32WL series for a while now. Hadn't had the chance to use them yet unfortunately.

matthijskooijman commented 1 year ago

Board-specific vs chip-specific

I don't think it's necessary to detect which type of board is used. As far as I can tell, there is only one Nucleo board with the MCU at the moment, the NUCLEO-WL55JC1. In that case, all default parameters of any STM32WL methods can be chosen to match the Nucleo board.

Setting defaults to the current nucleo board indeed makes sense, yeah. But the board- vs chip-specific distinction is still relevant, since board-specific values should ideally be changeable by the sketch to allow supporting other boards in the future without needing changes in the library.

The last approach suggested (generalized RF switch mapping in Module class) is probably the cleanest, since the current approach assumes two pins are used for the switch control. To be fair, that seems to the case for most off-the-shelf modules.

Yeah, this is the first board I've seen that uses three pins. I'll see if I can come up with something generic and clean.

The issue with approach 3 is that it cannot be reused in case a chip like STM32WL (i.e. MCU with integrated RF transceiver).

I'm not sure I understand this, the sentence looks unfinished?

Again, I would prefere the second option for reasons described previously. If the attachInterrupt does not match the current platform implementation for whatever reason, it should be overriden. Though it could be argued it's up to STM32duino to be Arduino-compatible.

The issue is not that STM32duino's attachInterrupt is Arduino-incompatible, but it simply does not apply to the radio IRQ at all (since that is an internal IRQ, not routed through the GPIO module). But maybe we can add an attachInterrupt-like function to the STM32duino core for this, I'll see.

Let's start from the end - I don't really like the idea of routing the internal interrupt to an external pin and then using that as an external interrupt routed via EXTI (seems like something I would come up with at 2 AM out of desperation). Why on Earth would they not route the DIO signals through the EXTI module baffles my mind (it is an external interrupt after all, the SX126x devie is external to the MCU as far as I'm concerned).

Yeah, agreed.

The only solution I see is to disable the interrupt in NVIC from the ISR, to prevent it from triggering further interrupts. Since this is a completely new behavior, it's not covered in the Module class, and so I'm fine with any overrides and/or custom methods in STM32WL class.

Ok, I'll do some tests to see how this pans out :-)

matthijskooijman commented 1 year ago

TL;DR: I think that fixing the ISR issue is possible by disabling the IRQ in the NVIC as previously suggested, it just needs that clearIrqStatus() is made virtual. Would that be acceptable?

I did a bit more testing, and the behavior is indeed as previously discussed: The radio IRQ is level-triggered and triggers indefinitely when not cleared. I've tested disabling it in the NVIC (and re-enabling it in clearIrqStatus(), which seems to work as expected in a quick test). In addition to this approach, I've found (but not tested) two other (probably less ideal) approaches, which I'll detail below (feel free to ignore this, but I was writing this down anyway for process documentation, so I thought to also include it here).

All of these would be triggered from the ISR itself:

  1. Clear the pending IRQ flag in the radio.

    This option clears away the IRQ status, which is later read by other code that wants to know the status. To keep such code working, the IRQ status should be queried and saved before clearing it, and getIrqStatus() should be modified to look at the saved value. In addition, clearIrqStatus() should be modified to clear the saved status.

  2. Disable the IRQ mask in the radio.

    This option preservers the IRQ status register, just disconnects it from the DIO signal (and thus interrupt signal), preventing the IRQ from triggering again. However, after this the pending status can no longer be read within the MCU (from the NVIC pending bit), so any "wait until IRQ line set" loops in the code will have to poll the IRQ status register in the radio instead. This can be problematic, since it might expose/increase race conditions (when the sketch IRQ handler calls into the library and starts SPI transfers while the loop is already in a polling SPI transfer - could be fixed by disabling interrupts during polls), and is also slower than polling a GPIO pin (less time for yield() to do something useful).

    One alternative is to set a flag when the mask is cleared by the ISR, and have the IRQ-pin-polling-loop look at that flag and the IRQ pin (or NVIC pending flag on STM32WL). This is reasonable, but would require modifying clearIrqStatus() to also clear this flag.

  3. Disable the IRQ in the MCU NVIC.

    This option keep the IRQ status register and keeps the DIO line active as normal (so it can still be polled), just suppresses the actual IRQ in the MCU NVIC. This is probably the least invasive option, except that it does require modifying clearIrqStatus() (or some other function to be called at the start of an operation) to re-enable the interrupt.

Option 2 and 3 have the upside that no SPI transfers are needed in the ISR. They also have the downside that they need to modify clearIrqStatus() (probably by making it virtual) to clear a flag or re-enable an IRQ mask that was set or disabled by STM32WL-specific code, so this probably requires some open-ended hook in the SX126x class (virtual clearIrqStatus() method or other callback) to be used by the STM32WL class.

Note that implementing this clearing in SX126x unconditionally (as opposed to in the STM32WL subclass) is probably not viable, since not all platforms have attachInterrupt implementations that take additional context with the callback function (e.g. often only static functions that cannot access the SX126x state unless it is made a singleton). Also, temporarily disabling an interrupt using detachInterrupt and attachInterrupt is probably not portable (some platforms clear pending interrupts on attach, which might cause interrupts to be missed (see also this and linked issues).

IRQ handling in radio

I also investigated some more details on the IRQ handling in the radio itself.

The radio has 10 IRQ sources, a global IRQ mask (one bit per source), an IRQ mask for each DIO (one bit per source) and an IRQ status register (one bit per source).

Documentation is a bit vague on this, but testing shows that:

In other words: The value of the status register is set independently of the DIO masks, and the DIO masks can be cleared without affecting the status register (only the DIO signal changes).

jgromes commented 1 year ago

The issue with approach 3 is that it cannot be reused in case a chip like STM32WL (i.e. MCU with integrated RF transceiver). I'm not sure I understand this, the sentence looks unfinished?

What I meant is that this approach is not easily reusable for other chips like STM32WL that may appear in the future. Probably doesn't require huge consideration, just a thought though.

I think that fixing the ISR issue is possible by disabling the IRQ in the NVIC as previously suggested, it just needs that clearIrqStatus() is made virtual. Would that be acceptable?

I don't see nay issues that could be caused by this, so it should be fine. Definitely better than SPI transactions from within ISR.

matthijskooijman commented 1 year ago

RF switch control

I had a closer look at this, and am considering to implement the following.

Defining the table

In the sketch, a board-specific table can be defined as follows:

  // This is the RF switch table that specifies the value for the listed pins
  // to get on-board circuitry in the right state for each mode listed.
  // This table is for the Nucleo-WL55JC1 board.
  //
  // Either MODE_TX_LP or MODE_TX_HP can be omitted if that mode is not
  // supported by the board.
  const uint32_t RFSWITCH_TABLE[][4] = {
    {0,                    PC3,  PC4,  PC5},
    {STM32WLx::MODE_IDLE,  LOW,  LOW,  LOW},
    {STM32WLx::MODE_RX,    HIGH, HIGH, LOW},
    {STM32WLx::MODE_TX_LP, HIGH, HIGH, HIGH},
    {STM32WLx::MODE_TX_HP, HIGH, LOW,  HIGH},
  };

The table has the pin numbers to control on the first row, and then each subsequent row maps some mode to the values to be set on each pin. This produces a compact and, I think, easy to read table. However, there might be some typing issues here. I'll discuss those below, after showing the rest of the plan.

Then, to pass the table to RadioLib:

radio.setRfSwitchTable(RFSWITCH_TABLE);

This method could use template autodeduction to extract the dimensions of the table (i.e. number of defined modes and pins) and store those in separate variables, along with a pointer to the table, to be used later. This method would just forward the data to a method by the same name in Module that would actually store the data.

Mode constants

As for the mode constants, I can imagine that Module defines an enum with MODE_IDLE, MODE_RX and MODE_TX that can be used by all current radios. Radios that need more modes can define their own constants (optionally copying the values for the default modes from Module, though they are free to use whatever values they like).

Simpler API for simple radios

For simple radios, radio.SetRfSwitchPins(rx_pin, tx_pin) can be used as before. For this, Module could get a private member like:

  uint32_t RFSWITCH_TABLE[][3] = {
    {0,                    /* rx_pin */ 0, /* tx_pin */ 0},
    {MODE_IDLE,  LOW,  LOW},
    {MODE_RX,    HIGH, LOW},
    {MODE_TX,    LOW,  HIGH},
  };

When SetRfSwitchPins() is called, the pin numbers passed are put into this table, and the table is passed to SetRfSwitchTable() so new table-based code can be used here as well (without requiring all sketches to spell out the complete table for simple cases).

Applying modes

Then, to actually uses these modes, where the radio modules now call e.g. _mod->setRfSwitchState(LOW, HIGH); they would call _mod->setRfSwitchState(Module::MODE_TX); (no need to keep the old method for compatibility, I think, we can just do a bulk replace in all radio modules.

More complex radios can pass their own mode constants instead of the module constants.

For STM32WL, this is a bit more complicated, because it needs to split the TX state into a HP and LP state, but the code that actually calls setRfSwitchState() is all in the SX126x superclass that does not know anything about this. The easiest way to solve this would be to give SX126x a protected tx_mode variable that it initializes to Module::MODE_TX and passes to setRfSwitchState, but which can be modified by STM32WL::setOutputPower().

Better table definition

It is a bit weird that the table stores three types of data (pin numbers, modes and pin values) in the same array, but using uint32_t should ensure they all fit, so this is just a pragmatic approach (we could also use RADIOLIB_PIN_TYPE which should be big enough to fit values and modes).

While writing this, I realize that this approach might not work on some architectures, like megaavr and others that use ArduinoCore-API, because LOW/HIGH are now their own PinStatus type, and pin numbers might also have their own type in some architectures maybe...

Some alternatives:

Even though I often resort to class templates to create a nice and compact API, I was hoping to just use a plain array here to keep things simple, but I'm afraid that architectures with different pin number and value types might mean that this will be required...

What's your stance on using templates? And do you happen to know if any of the platforms you (want to) support still lack C++11?

jgromes commented 1 year ago

I like the overall approach, especially the fact that the API remains the same for RF switches with just two pins, which covers most of the use-cases. Mode constants and

Regarding templates, I'm not entirely sure if any of the platforms use something older than C++11. However, so far I managed to keep everything template-free. RadioLib provides macros RADIOLIB_PIN_TYPE and RADIOLIB_PIN_STATUS. I would stick to a more C approach, i.e. defining pun numbers separately, and then have a RFSwitchMode_t struct that would contain the structure definition for a table entry.

matthijskooijman commented 1 year ago

I like the overall approach, especially the fact that the API remains the same for RF switches with just two pins, which covers most of the use-cases. Mode constants and

Cool, thanks. Your sentence seems unfinished, though.

I would stick to a more C approach, i.e. defining pun numbers separately, and then have a RFSwitchMode_t struct that would contain the structure definition for a table entry.

Yeah, I was trying something like that, but that is tricky to get right for an arbitrary amount of pins and modes, because then you get an array of some variable-sized struct (since that contains an array of one element per pin) which you can really only do with templates (so the number of pins is a template argument and the size of the inner struct then becomes known at compile time again). But even then, you have to somehow generalize this when you pass the table to the STM32WL/Module class (you lose the templated type then).

Anyway, that was probably a bit hard to understand, but I guess that an arbitrary number of pins and modes is not really needed anyway, we can just hardcode e.g. up to 3 pins and 4 modes. The downside of that is that you always end up allocating the maximum size, even when you do not use it, but it's not that much memory. Also, I think you could hardcode 3 pins and still leave the number of modes free, saving just a little memory maybe.

Writing this, I think things became a bit more clear in my head :-) I'll see if I can convert that into code in the coming days, and if it really works as I think it should. Thanks for your thoughts!

acceleroto commented 1 year ago

Sorry this isn’t a helpful post, but I’m really glad to see your work on this. Thanks for making this happen. I’m looking forward to trying this out on a LoRa-E5/Wio-E5 amateur rocketry project of mine.

matthijskooijman commented 1 year ago

I've created a PR with an implementation for the STM32WL, see #649. Feedback is welcome.

@acceleroto, if you want to test this PR on your E5 module, that would be awesome. Looking at the seed wiki for the LoRa-E5 it looks like this would be a very interesting testcase:

Let me know if you want to try testing this on your module, I'll be happy to talk you through what's needed in terms of PRs to apply and changes to the sketch for the rfswitch (but if you can figure that out yourself, that could be a nice test of the documentation I wrote ;-p).

Also, you might be interested in the https://github.com/stm32duino/STM32LoRaWAN library that offfers LoRaWAN connectivity on the STM32WL platform, and was made public two weeks ago :-)

acceleroto commented 1 year ago

@matthijskooijman Excellent, thanks. I'm busy with family & Christmas over the next couple weeks, but I'll try to sneak in some dev time with it. I just finished the hardware testing for my current prototype PCB with the Wio-E5, so it's ready for some firmware work.

Yeah, I'd guess 1.6 or 1.7V settings would work fine. The datasheet for the STM32WLE5JC says it's happy from 1.6 to 3.3 with 1.7 as typical. I haven't had any problems with the STM32CubeWL examples I've tried.

My project with the Wio-E5 just uses point-to-point LoRa, not full LoRaWAN. Thanks for pointing out the STM32LoRaWAN project - I didn't know about that. Looking forward to trying everything out!

matthijskooijman commented 1 year ago

@matthijskooijman Excellent, thanks. I'm busy with family & Christmas over the next couple weeks, but I'll try to sneak in some dev time with it. I just finished the hardware testing for my current prototype PCB with the Wio-E5, so it's ready for some firmware work.

Cool!

Yeah, I'd guess 1.6 or 1.7V settings would work fine. The datasheet for the STM32WLE5JC says it's happy from 1.6 to 3.3 with 1.7 as typical. I haven't had any problems with the STM32CubeWL examples I've tried.

Uh, I think that datasheet only tells you that the STM32 can output 1.6V up to 3.3V, which is used to power an external TCXO. The STM32 datasheet cannot really tell you what the specs of that TCXO are. The one on the nucleo board, for example, is NT2016SF-32M-END5875A which IIRC supports 1.7V-3.3V (which is why I changed the voltage in the RadioLib example for this board to 1.7V, matching the cube examples, even though it seems to work at 1.6V as well).

Note that the changes needed in the STM32 Arduino core have been merged and released, so if you install version 2.4.0 of that and apply the RadioLib PR, you should be able to make it work :-)

acceleroto commented 1 year ago

Note that the changes needed in the STM32 Arduino core have been merged and released, so if you install version 2.4.0 of that and apply the RadioLib PR, you should be able to make it work :-)

Good point on the TCXO. I just got the latest set of my prototype PCBs in today. I'll start digging into this soon for my Wio-E5 project.

nmaas87 commented 9 months ago

Dear @acceleroto - does the STM32WL work with TCXO? I was thinking of getting a RAK3172-T which uses a STM32WLE5CC with TCXO and was wondering if this would work? :)