jgromes / RadioLib

Universal wireless communication library for embedded devices
https://jgromes.github.io/RadioLib/
MIT License
1.55k stars 387 forks source link

intermittent failure to initialise cubecell #709

Closed gasagna closed 1 year ago

gasagna commented 1 year ago

Hi,

quite often (~75% of the time), i get a -2 return code when initialising my CubeCell V2. I have enabled the debug and verbose outputs and this is the results in one case where the module errors on begin:

Copyright @2019-2020 Heltec Automation.All rights reserved.
CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       A2      0       22

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32

SX126x not found! (1 of 10 tries) RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 a2 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 | .SX1261 V2D 2D02
Expected string: SX1261
CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       A2      0       22

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32

SX126x not found! (2 of 10 tries) RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 a2 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 | .SX1261 V2D 2D02
Expected string: SX1261
CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       A2      0       22

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32

SX126x not found! (3 of 10 tries) RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 a2 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 | .SX1261 V2D 2D02
Expected string: SX1261
CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       A2      0       22

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32

SX126x not found! (4 of 10 tries) RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 a2 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 | .SX1261 V2D 2D02
Expected string: SX1261
CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       A2      0       22

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32

SX126x not found! (5 of 10 tries) RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 a2 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 | .SX1261 V2D 2D02
Expected string: SX1261
CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       A2      0       22

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32

SX126x not found! (6 of 10 tries) RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 a2 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 | .SX1261 V2D 2D02
Expected string: SX1261
CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       A2      0       22

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32

SX126x not found! (7 of 10 tries) RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 a2 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 | .SX1261 V2D 2D02
Expected string: SX1261
CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       A2      0       22

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32

SX126x not found! (8 of 10 tries) RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 a2 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 | .SX1261 V2D 2D02
Expected string: SX1261
CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       A2      0       22

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32

SX126x not found! (9 of 10 tries) RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 a2 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 | .SX1261 V2D 2D02
Expected string: SX1261
CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       A2      0       22

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32

SX126x not found! (10 of 10 tries) RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 a2 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 | .SX1261 V2D 2D02
Expected string: SX1261
No SX126x found!

It looks like the SPIreadRegisterBurst in findChip produces one extra character at the beginning of the string, so the strncmp fails.

This is on the latest master.

gasagna commented 1 year ago

And this is the first bit of the output in one of the successful initialisation cases:

CMD     80
DATW    0       AA

CMD     80
DATW    0       A2

CMD     C0
DATR    0       22      0       22

CMD     1D      3       20
DATR    0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      0       32      0       44      0       20  032      0       44      0       30      0       32      0       0

Found SX126x: RADIOLIB_SX126X_REG_VERSION_STRING:
0000320 53 58 31 32 36 31 20 56 32 44 20 32 44 30 32 00 | SX1261 V2D 2D02.
jgromes commented 1 year ago

It looks like the SPIreadRegisterBurst in findChip produces one extra character at the beginning of the string, so the strncmp fails.

You're right, however, the origin of that character is a bit strange:

CMD     1D      3       20
DATR    0       A2      0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      032      0       44      0    20      0       32      0       44      0       30      0       32
(...)
CMD     1D      3       20
DATR    0       A2      0       53      0       58      0       31      0       32      0       36      0       31      0       20      0       56      0       32      0       44      0       20  032      0       44      0       30      0       32      0       0

The first exchange is the one that fails, the first three bytes that come from the device are 0xA2 0xA2 0x53. In the second one, the first three are 0xA2 0x53 0x58. So it seems like the extra byte is a repeat of the status byte. Why is this byte there twice I have no idea, but it looks like it's coming directly from the device, or at least the CubeCell's SPI peripheral.

gasagna commented 1 year ago

Luckily sometimes the initialisation works. One ugly solution is to also do a strcmp with the expected string from the second bytes onwards, to catch the cases that would fail otherwise.

jgromes commented 1 year ago

One ugly solution is to also do a strcmp with the expected string from the second bytes onwards, to catch the cases that would fail otherwise.

Truly ugly, even by my own standards. The check could be reversed so that the string "SX1261" at any point in the registers is considered valid. But if it happened here I would assume it can happen for other registers as well, so failing would be the safer option.

I'd rather investigate where is this second byte coming from before we jump to ugly patches.

gasagna commented 1 year ago

How would you debug this issue? I am happy to help.

jgromes commented 1 year ago

I would start by putting together a minimum example. No RadioLib, just SPI register reading. If the issue persists, then it's something in the CubeCell core, or in hardware. If not, it's a problem in RadioLib.

gasagna commented 1 year ago

Hi, here is some comprehensive tests that I have done to understand what's wrong. No RadioLib, only SPI.

I have defined two versions of a function get_version_string where I have swapped the order or setting the CS pin low and beginTransaction. I have constructed multiple setup functions that show how complex and non-intuitive the behaviour is.
Comment/uncomment to test them, but I have summarised the behaviour at the top, at least for the cubecell board v2 i have. I believe there is something buggy in the CubeCell arduino core (strange?), but I can't pinpoint where the problem is.

#include "SPI.h"
#include "Arduino.h"
#include "board-config.h"

#define SX1262_REG_VERSION_STRING 0x0320
#define SX1262_CMD_READ           0x1D
#define SX1262_NO_OP              0x00

uint8_t cmd_version[3] = {SX1262_CMD_READ,
                          (uint8_t)((SX1262_REG_VERSION_STRING >> 8) & 0xFF),
                          (uint8_t)( SX1262_REG_VERSION_STRING       & 0xFF)};

char version[16];

void get_version_string_v1(char* version) {
    digitalWrite(RADIO_NSS, LOW); 
    while (digitalRead(RADIO_BUSY) == HIGH) {
        delay(10);
    }

    SPI.beginTransaction(SPISettings(20000000, MSBFIRST, SPI_MODE0));

    _inner(version);
}

// just swapped the order of the first bits of code
void get_version_string_v2(char* version) {
    SPI.beginTransaction(SPISettings(20000000, MSBFIRST, SPI_MODE0));

    digitalWrite(RADIO_NSS, LOW); 
    while (digitalRead(RADIO_BUSY) == HIGH) {
        delay(10);
    }

    _inner(version);
}

void _inner(char* version) {
    // send command
    for (uint8_t i = 0; i < sizeof(cmd_version); i++) {
        uint8_t val = SPI.transfer(cmd_version[i]);
    }

    // status byte
    uint8_t val = SPI.transfer(SX1262_NO_OP);

    // read
    for (uint8_t i = 0; i < 16; i++) {
        uint8_t val = SPI.transfer(SX1262_NO_OP);
        *version++ = val;
    }

    SPI.endTransaction();
    digitalWrite(RADIO_NSS, HIGH); 
}

// this always works and returns
// SX1261 V2D 2D02
void setup() {
    Serial.begin(115200); delay(100);
    SPI.begin();

    get_version_string_v2((char*)version);
    Serial.println(version);
}

// // this never works and returns
// // "SX1261 V2D 2D02
// void setup() {
//     Serial.begin(115200); delay(100);
//     SPI.begin();

//     get_version_string_v1((char*)version);
//     Serial.println(version);
// }

// // but this always works and returns
// // SX1261 V2D 2D02
// // SX1261 V2D 2D02
// void setup() {
//     Serial.begin(115200); delay(100);
//     SPI.begin();

//     get_version_string_v1((char*)version);
//     Serial.println(version);

//     get_version_string_v2((char*)version);
//     Serial.println(version);
// }

// // this also always works
// // SX1261 V2D 2D02
// // SX1261 V2D 2D02
// void setup() {
//     Serial.begin(115200); delay(100);
//     SPI.begin();

//     get_version_string_v1((char*)version);
//     Serial.println(version);

//     get_version_string_v1((char*)version);
//     Serial.println(version);
// }

// // but this never does and returns
// // "SX1261 V2D 2D02
// // �SX1261 V2D 2D02
// void setup() {
//     Serial.begin(115200); delay(100);
//     SPI.begin();

//     char local_version[16];

//     get_version_string_v1((char*)local_version);
//     Serial.println(local_version);

//     get_version_string_v1((char*)local_version);
//     Serial.println(local_version);
// }

// // but this always does and returns
// // SX1261 V2D 2D02
// // SX1261 V2D 2D02
// void setup() {
//     Serial.begin(115200); delay(100);
//     SPI.begin();

//     char local_version[16];

//     get_version_string_v2((char*)local_version);
//     Serial.println(local_version);

//     get_version_string_v2((char*)local_version);
//     Serial.println(local_version);
// }

void loop() {}
gasagna commented 1 year ago

I have also tested with a different CubeCell Board Plus based on the same chip, and the results are predictable but different. In order:

  1. always works
  2. always works
  3. never works
  4. never works
  5. never works
  6. always works

To recap the results in the previous post for the CubeCell board V2

  1. always works
  2. never works
  3. always works
  4. always works
  5. never works
  6. always works
jgromes commented 1 year ago

Wow, thanks for digging into this, you're right, this is very strange. Unfortunately I don't have any CubeCell board, so I cannot verify your findings.

To be perfectly honest, I wouldn't be surprised if there was some strange bug in the CubeCell SPI, as I've seen some very strange design decisions in their Arduino implementation. Although the 10ms delay wou have when checking BUSY line state is quite long.

Looking into beginTransaction implementation on CubeCell, it only sets the SPISettings and then pulls an _ss pin LOW. The _ss pin should be set to -1 as far as I can tell, unfortunately the implementation of digitalWrite on CubeCell seems to be hidden in .a library, so there's no way to tell if -1 is actually a valid "do not use" pin number. endTransaction seems to be doing nothing.

A theory - did you check whether the BUSY line is ever high when the program enters the _v1/_v2 functions? Maybe resetting the CubeCell when the program starts also resets the SX126x. Looking at the datasheet, the BUSY line is high during startup, and while the datasheet doesn't seem to specify this, it seems reasonable to assume that while the BUSY line is high, we should not be pulling NSS low. This is actually in a weird order in RadioLib.

Could you try first waiting for BUSY to go low, and only then doing the NSS/beginTransaction?

gasagna commented 1 year ago

Hi,

I have checked and BUSY is always HIGH when the program enters the _v1/_v2 functions and remains HIGH forever. Hence, i do a reset upon entering these functions

void reset() {
    pinMode(RADIO_RESET, OUTPUT);
    digitalWrite(RADIO_RESET, LOW);
    delay(1);
    digitalWrite(RADIO_RESET, HIGH);
}

After the reset, BUSY is high but goes low quite quickly.

Then, the previous cases have these outcomes for my cubecell board plus:

  1. works sometimes
  2. never works
  3. almost never works
  4. never works
  5. never works
  6. works sometimes

So, resetting explicitly is generally worse and leads to unpredictable behaviour in some cases.

jgromes commented 1 year ago

Are you waiting for BUSY after the reset? The BUSY line should be held high during startup. In any case, it's strange BUSY is held high forever without the reset, and seems to be pointing to something weird directly in CubeCell.

gasagna commented 1 year ago

This is the _v1 function I used.

void get_version_string_v1(char* version) {
    reset();

    while (digitalRead(RADIO_BUSY) == HIGH) {
        Serial.print("*");
        delay(10);
    }

    pinMode(RADIO_NSS, OUTPUT); 
    digitalWrite(RADIO_NSS, LOW); 
    while (digitalRead(RADIO_BUSY) == HIGH) {
        delay(10);
    }

    SPI.beginTransaction(SPISettings(20000000, MSBFIRST, SPI_MODE0));

    _inner(version);
}

Yes, BUSY remains high forever, unless I call reset at the beginning of this function, in which case it goes low quite quickly.

Could be something wrong with the cubecell. I will also try posting a message on Heltec's support discourse site.

In the interest or resolving this bug, would it make sense to check whether sx1261 appears anywhere in the string returned from the registers? Thanks.

gasagna commented 1 year ago

Cross-linking here for posterity...

jgromes commented 1 year ago

In the interest or resolving this bug, would it make sense to check whether sx1261 appears anywhere in the string returned from the registers? Thanks.

I'm wary of this for the simple reason that version check is not the only register read that's done. Adding this workaround would only mask the issue, not resolve it.

gasagna commented 1 year ago

I did a last test, and .... figured out a simple, simple way to fix this issue. In short, the clock speed set by RadioLib to 2000000 is too high. With the default clock used in the cubecell core (here) of 1000000 I get reliable results all the time.

This change fixes the examples I showed in this thread. Now, i just need to check if it also fixes the issue when using RadioLib.

jgromes commented 1 year ago

That is very interesting, so far I have not seen the default 2 MHz frequency being too fast - quite the contrary, it always seemed very conservative to me.

We can set the default speed to 1 MHz only on CubeCell, or even just on some of the CubeCell boards if this can be detected by some macro. Definitely smells of hardware issue now.