markqvist / RNode_Firmware

RNode is an open, free and flexible digital radio interface with many uses
https://unsigned.io/rnode
GNU General Public License v3.0
173 stars 62 forks source link

Add SX1280 driver and split LoRa driver in 3 parts #64

Closed jacobeva closed 7 months ago

jacobeva commented 7 months ago

Does what it says on the tin.

This one took bloody ages, far longer than I expected! Currently I've been testing the SX1280 functionality with the LoRa1280F27 board which I've turned into a WisBlock module to be used with the RAK4630.

I have not managed to get higher than 60 ish kbps on this modem, reason being that there seems to be an issue with the hardware. It has not achieved the 203kbps bitrate in the datasheet, but drops to around 10 on the highest bandwidth and lowest spreading factor. Very strange. Unfortunately also, the reps from NiceRF are currently on holiday for Chinese New Year, so I can't get hold of them to enquire what's wrong with it. So, the higher data rates are untested, and there is no fix for the LoRa1280F27 currently, but the driver works, and well at that!

I have split the LoRa driver into three parts. Things were becoming a nightmare with all the different preprocessor blocks, etc. This also opens the door to having dual modem functionality in the future. I will open an issue to ask for your input on how that problem should be approached Mark.

Everything has been tested on my NRF52 boards as per usual, but not ESP32 boards. Additional testing to ensure I haven't broken anything would be appreciated.

Any issues or anything that needs clarifying, please let me know.

markqvist commented 7 months ago

This is friggin' great. Awesome work man. I just verified all the work until now compiles and runs on the old boards, which is perfect, since we then have a working base to compare from if anything has gone amiss here. I'm checking out for today, but I can hopefully start testing this tomorrow at some point.

Strange with the speed issues, but no worries there, just the fact that it works is brilliant.

markqvist commented 7 months ago

The speed issues could potentially be packet collisions due to mistuned CSMA parameters at those high speeds. Or could be something completely different. Either way, we'll get it sorted :) Most important is to get it all incorporated and working now, and we can probably debug that easily.

jacobeva commented 7 months ago

mistuned CSMA parameters

I'm not so sure. I think it's actually a hardware thing. From my limited testing yesterday, I couldn't get a higher data rate out of the modem than the following config: SF7, 1625kHz BW, CR 5. With SF6, the bit rate tanked to like 1/4 of the value as compared to SF7 (!). This was in a separate arduino sketch I made just for testing this functionality. I will try the FLRC mode and see if it has the same issue. Maybe I've messed something up somewhere, but all the other bandwidths seem to work ok, which is suspicious. I've also tested it on two modems, so either there's a design issue from factory, or I've made mistake somewhere which is probably some kind of edge case, causing this behaviour on the SX1280.

jacobeva commented 7 months ago

I just verified all the work until now compiles and runs on the old boards, which is perfect, since we then have a working base to compare from if anything has gone amiss here. I'm checking out for today, but I can hopefully start testing this tomorrow at some point.

Great! Hopefully I didn't introduce too many bugs...

markqvist commented 7 months ago

Okay, I had a run around with this, and so far it's looking good. With a few minor modifications, all the previous ESP32-based boards compile and operate correctly, so that's really good :)

Unfortunately, I'm having a hard time getting the SX1262 driver running on an SX1262-based T-Beam device.

I've pushed an update just now with the initial support for SX1262-based T-Beam devices, and it compiles and runs fine. It also transmits something, visible on SDR and on another SX1276-based ESP32 RNode (as a signal on the waterfall display and RX LED), but nothing can actually decode the transmission. Not even another SX1262-based T-Beam running the same firmware ;)

Currently I'm not sure whether the problem is due to:

I'd really appreciate your input on this :)

I'm also a bit confused on which of the RX_ENABLE, BUSY and DIO pins are actually needed by the SX1262 driver? All of them, or will it adapt to whichever are connected on the specific module? Looks like it may use quite a different pin configuration than your RAK boards according to:

jacobeva commented 7 months ago

Thanks for the update Mark! A few notes: If you are seeing a waterfall, then the driver is probably working in some capacity. This rules out your worry about the rxen pin. There seems to be no antenna switch on that device, so it is not needed. Perhaps the name is misleading, the SX1262 model I had came with an antenna switch which only allowed for the total switch on or off of the antenna. I don't think the sync word should matter, but maybe it's possible (unlikely IMO). I don't think that device has a TCXO. The DIO pin is of course needed to trigger the packet received interrupt. The BUSY pin is used to check when the modem is ready to communicate, else errors will probably occur in SPI comms. rxen is not needed if there is no antenna switch on the device.

I hope that helps. You may want to try getting the device to print the packet it receives if possible, maybe check all the modem settings are correct as well (I know, had to say it...).

jacobeva commented 7 months ago

I will do some testing tomorrow to check if this all still functions on my hardware with the merge of the BLE branch too, I should hope it will.

markqvist commented 7 months ago

Thanks for the suggestions. I'm not getting any output on the waterfall - just completely empty.

Having a closer look at this without being way too tired definitely helps ;) Had the interrupt pin confused, but still no go. I'm beginning to think this is actually a TCXO issue, since the board does seem to have a TCXO that needs 1.8v supply. We'll probably need to add support for setting this in the config, which I'll have a go at now.

By the off chance you're reading this now, and know the correct register values for 1.8v TCXO setup, any input is much appreciated. Otherwise, off to the datasheet with me it is ;)

markqvist commented 7 months ago

I found it. I'm adding TCXO voltage selection to the config and will push something initial shortly.

markqvist commented 7 months ago

Just FYI before you start testing, I'm moving the board-specific config to a new file and doing some cleanup, will have this up shortly.

markqvist commented 7 months ago

Ok, quick update:

markqvist commented 7 months ago

Logic statements like these can be quite confusing, since they don't mean what you might think because of right-hand primacy:

    if (buf[1] & IRQ_PREAMBLE_DET_MASK_6X != 0) {
        byte = byte | 0x01 | 0x04;
        // clear register after reading
        clearbuf[1] = IRQ_PREAMBLE_DET_MASK_6X;
    }

To match the intention, it needs to be:

if ( (buf[1] & IRQ_PREAMBLE_DET_MASK_6X) != 0 )

I've fixed it, just wanted to let you know :)

jacobeva commented 7 months ago

Ah, apologies for structuring the if statement that way, I'll keep that in mind in the future. Glad you've gotten SX1262 to SX1262 working! Do the sync words actually matter? I didn't think they actually had to match? If that's the case, the functionality to support them will have to be added, but it's not rocket science, just writing to another register :)

markqvist commented 7 months ago

No worries. The sync words definitely do matter, and have to match, but it turns out it probably had nothing to do with the core problem.

I had to fix a few minor bugs to get to this point, but I don't think any of them had anything to do with the core of the problem, mostly just small things that made it hard to debug, like the waterfall not working and those logic errors that broke channel activity (and therefore receive LED) functionality.

Also, I removed your csma_r = random(20) hack. It has to be 255 for the CSMA algo to even make sense. There's other ways to increase throughput, but let's take that when things are working.

Anyway, after having looked some more at this, something very weird is going on.

Here's an SDR capture of an SX1276 and an SX1262 sending a packet of 28 bytes of 0x88 with BW 62.5KHz, SF8 and CR6:

packets_88

All channel and modulation settings match. The slight frequency mismatch is to be expected from the SX1276 not having a TCXO. The packet transmitted by the SX1262 is significantly longer for some reason, but the preambles seem to match. The SX1276 does not decode this, but does detect the preamble and header.

Going in the other direction, I've manually verified that when sending from an SX1276, the SX1262 will also detect the preamble and header, and try to decode the packet. The decode will fail however, and the IRQ register will show a CRC error. If I just ignore that CRC error and retrieve the packet anyways, this is what I get, over four different tries:

8b 8e d0 92 6d 51 7a d1 23 de 31 f2 66 05 13 ef c5 02 d6 24 40 89 a3 bf f3 de 43 39
88 8e d0 92 6d 51 7a d1 23 de 31 f2 66 05 13 ef c5 02 d6 24 50 4b a3 6f 23 c3 4c 00
8b 88 d0 ff bd 51 7a d7 23 de 31 f2 66 05 13 ef c5 02 d6 24 40 0c ab d7 02 2c fe ed
89 8e d0 fe 68 51 7a d7 23 de b1 f2 66 05 13 ef c5 03 d6 2c 40 83 ef bf 71 de 48 43

Instead of:

88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88

To me, this looks very much like a loss of receiver PLL sync or something like that, since data starts out almost correct, and then slides into semi-consistent gibberish.

Please note, that between two different SX1262 chips, everything works great.

I'm a bit stumped at this point. Have you got any clues? I've pushed my current fixes to the repo, and here's a few things I noticed, that may or may not have relevance:

jacobeva commented 7 months ago

Also, I removed your csma_r = random(20) hack. It has to be 255 for the CSMA algo to even make sense.

Sorry! I was messing about and forgot I even changed that lol! I should really use comments more...

That length is set to zero because we do not know the length of the packet at that point. It is probably unnecessary, and could be removed and I don't think it would affect functionality. The packet params being set once in beginPacket() and once again in endPacket() is pointless actually. I did not think of this until now. So lines 438 and 436 could probably be removed, but I personally think it unlikely to affect your issue.

It being a PLL issue is entirely possible indeed. What of the other way around? What does the SX1276 see from the SX1262?

markqvist commented 7 months ago

It gets a CRC error too, but I didn't read out the actual received data that way around. I'd expect it to be a similar story though.

Something is definitely very strange though, and why the two devices/drivers would exhibit so different behavior while transmitting is currently beyond me. I hope you can investigate it further.

It'd be really helpful to know if it's just an issue between the SX1262-based T-Beams I have been testing on, or whether the same thing happens on your RAK boards.

markqvist commented 7 months ago

The datasheet mentions that PLL, ADC, image an RC oscillator calibration must be done in RC_STDBY mode, so I modified the initialization procedure to do this, but still no success - the SX1262 driver still modulates and demodulates differently than the SX1276 one.

Unfortunately, I'm out of ideas at this point.

jacobeva commented 7 months ago

It'd be really helpful to know if it's just an issue between the SX1262-based T-Beams I have been testing on, or whether the same thing happens on your RAK boards.

Unfortunately I don't have any SX1276 based boards mate, so I wouldn't know :(

I would revert those changes you made for the calibration, idle() puts it in STANDBY mode anyway and the calibration was done thereafter, this wouldn't affect the issue :)

Have you set the sync words on both sides correctly? I thought just now that this may have caused the issue perhaps?

jacobeva commented 7 months ago

I can do some investigation tomorrow on this issue and see if I can figure out what may cause it through reading the datasheet again, but in terms of practical testing, I am unable to do so since my lack of an SX1276 board!

markqvist commented 7 months ago

I would revert those changes you made for the calibration, idle() puts it in STANDBY mode anyway

No, idle() puts the modem into STDBY_XOSC mode, not STDBY_RC (difference being the oscillator source). That's why I added it. See line 679.

Have you set the sync words on both sides correctly? I thought just now that this may have caused the issue perhaps?

Yes. The sync words are currently set to 0x12 for SX1276 and 0x1424 for SX1262, which is how it needs to be for interoperability.

I can do some investigation tomorrow on this issue and see if I can figure out what may cause it through reading the datasheet again

Awesome. You know the chip and datasheet a lot better than me at this point, so hopefully you can spot something.

We can't really release this until we can confirm interoperability between the SX1262 and SX1276 drivers. Even if it requires changing something in the SX1276 driver, that's totally acceptable, but it does need to interoperate.

My best bet bet currently is that there is indeed something weird going on in PLL calibration or something else oscillator, timing or crystal related, due to the very different packet timings. Could even be some sort of fault in the low data-rate optimisation configuration (but I don't know the SX1262 well enough to really speak on that).

jacobeva commented 7 months ago

My understanding is that when the modem enters STDBY_XOSC it has to be recalibrated anyway, given the oscillator source has changed. Perhaps I'm wrong, I'll check that.

The LDRO config could be the issue, I will check that. Thanks for the suggestions, I will get on this tomorrow mate!

On 11 February 2024 21:02:42 GMT, markqvist @.***> wrote:

I would revert those changes you made for the calibration, idle() puts it in STANDBY mode anyway

No, idle() puts the modem into STDBY_XOSC mode, not STDBY_RC (difference being the oscillator source). That's why I added it. See line 679.

Have you set the sync words on both sides correctly? I thought just now that this may have caused the issue perhaps?

Yes. The sync words are currently set to 0x12 for SX1276 and 0x1424 for SX1262, which is how it needs to be for interoperability.

I can do some investigation tomorrow on this issue and see if I can figure out what may cause it through reading the datasheet again

Awesome. You know the chip and datasheet a lot better than me at this point, so hopefully you can spot something.

We can't really release this until we can confirm interoperability between the SX1262 and SX1276 drivers. Even if it requires changing something in the SX1276 driver, that's totally acceptable, but it does need to interoperate.

My best bet bet currently is that there is indeed something weird going on in PLL calibration or something else oscillator, timing or crystal related, due to the very different packet timings. Could even be some sort of fault in the low data-rate optimisation configuration (but I don't know the SX1262 well enough to really speak on that).

-- Reply to this email directly or view it on GitHub: https://github.com/markqvist/RNode_Firmware/pull/64#issuecomment-1937868602 You are receiving this because you authored the thread.

Message ID: @.***>

jacobeva commented 7 months ago

Oh sorry, I just read your second to last reply again. I'll go over the data sheet once more, you may be right about it needing to be in STDBY_RC mode, I will check.

On 11 February 2024 21:02:42 GMT, markqvist @.***> wrote:

I would revert those changes you made for the calibration, idle() puts it in STANDBY mode anyway

No, idle() puts the modem into STDBY_XOSC mode, not STDBY_RC (difference being the oscillator source). That's why I added it. See line 679.

Have you set the sync words on both sides correctly? I thought just now that this may have caused the issue perhaps?

Yes. The sync words are currently set to 0x12 for SX1276 and 0x1424 for SX1262, which is how it needs to be for interoperability.

I can do some investigation tomorrow on this issue and see if I can figure out what may cause it through reading the datasheet again

Awesome. You know the chip and datasheet a lot better than me at this point, so hopefully you can spot something.

We can't really release this until we can confirm interoperability between the SX1262 and SX1276 drivers. Even if it requires changing something in the SX1276 driver, that's totally acceptable, but it does need to interoperate.

My best bet bet currently is that there is indeed something weird going on in PLL calibration or something else oscillator, timing or crystal related, due to the very different packet timings. Could even be some sort of fault in the low data-rate optimisation configuration (but I don't know the SX1262 well enough to really speak on that).

-- Reply to this email directly or view it on GitHub: https://github.com/markqvist/RNode_Firmware/pull/64#issuecomment-1937868602 You are receiving this because you authored the thread.

Message ID: @.***>

markqvist commented 7 months ago

To offer whatever help and insight I can here, I'v conducted a little further investigation, just to verify board integrity and cross-compatibility between the old SX1276 driver and SX1262/8 chips:

I tried using the RadioLib-based example found here to:

There is definitely something going sideways with the SX1262 driver :(

markqvist commented 7 months ago

Another idea that might help in debugging this would be to use this example with the RadioLib SX1262 driver, and see whether your driver can decode packets sent from that and also the other way around.

I'm fairly certain it will show the same behaviour as on ESP32, but even in the odd case that it does work when running on the nRF board, we'd have it pinned down further.

My best bet at this point though (although I'm definitely not at all certain), would be that the drivers startup (and calibration) sequence and/or timing is not right, leading to problems with RC-oscillator and PLL calibration. From the symptoms, it certainly looks like that could be the problem.

jacobeva commented 7 months ago

Thanks for the extra suggestions Mark. I'll take a look at this now. I'll try and fiddle around and see if I can figure out what the issue is :)

markqvist commented 7 months ago

Awesome! Really hope it's just a traditional case of something relatively minor that, once discovered turns out to be completely logical and easy to rectify :)

jacobeva commented 7 months ago

I have made some changes and will be pushing to my master branch in the next 15 mins once I can verify SX1262 -> SX1262 still works. Please bear with :)

jacobeva commented 7 months ago

Okay. Please clone my repo and try running that build on the SX1262 :+1:

Side note: when I was debugging the modem I didn't realise the RX LED was meant to be coming on from modemStatus, I thought it just never came on! So I was turning it on myself in the code (which I managed to remember to remove before pushing). Funny how right-hand primacy can mess with you huh!

That still needs to be fixed in the SX128x driver, along with a couple other changes I have stored locally. So, let me know if this build fixes anything (I pray it does!!), then I can put it all in a PR.

jacobeva commented 7 months ago

Additionally, if that build does not fix the issue, please try removing lines 379-381 and putting a call to idle(); in its place.

markqvist commented 7 months ago

Unfortunately, that didn't change anything. But...

Oh my dear flying spaghetti monster. I found the bug.

void sx126x::handleLowDataRate(){
    _ldro = 1;
    setModulationParams(_sf, _bw, _cr, _ldro);
}

That was naughty ;) Looks like you forgot to evaluate when the LDRO should actually be enabled ;)

It needs to be turned on only when symbol time is above 16ms. Don't worry about this one, I know the calculations for it, so I can implement that quickly, and I have a few other cleanups I did while hunting for this, so let me just push those as well.

jacobeva commented 7 months ago

Oops! Let me know if that fixes it :)

On 13 February 2024 11:05:44 GMT, markqvist @.***> wrote:

Unfortunately, that didn't change anything. But...

Oh my dear flying spaghetti monster. I found the bug.

void sx126x::handleLowDataRate(){
   _ldro = 1;
   setModulationParams(_sf, _bw, _cr, _ldro);
}

That was naughty ;) Looks like you forgot to evaluate when the LDRO should actually be enabled ;)

It needs to be turned on only when symbol time is above 16ms. Don't worry about this one, I know the calculations for it, so I can implement that quickly, and I have a few other cleanups I did while hunting for this, so let me just push those as well.

-- Reply to this email directly or view it on GitHub: https://github.com/markqvist/RNode_Firmware/pull/64#issuecomment-1941230224 You are receiving this because you authored the thread.

Message ID: @.***>

markqvist commented 7 months ago

Indeed it does, and everything seems to be working correctly now, across both old and new firmware versions, SX1276/8's and SX1262/8's. All is joy :)

I'm just running some final tests now, and will push the updates in a moment. I took the liberty of including your image calibration function too, since I wanted to include that in these tests as well.

jacobeva commented 7 months ago

Great! Sorry for letting that bug slip in there, glad everything is all working well now though 👍

On 13 February 2024 13:22:43 GMT, markqvist @.***> wrote:

Indeed it does, and everything seems to be working correctly now, across both old and new firmware versions, SX1276/8's and SX1262/8's. All is joy :)

I'm just running some final tests now, and will push the updates in a moment. I took the liberty of including your image calibration function too, since I wanted to include that in these tests as well.

-- Reply to this email directly or view it on GitHub: https://github.com/markqvist/RNode_Firmware/pull/64#issuecomment-1941506124 You are receiving this because you authored the thread.

Message ID: @.***>

markqvist commented 7 months ago

No worries, just nice that it was indeed something very simple, just a bit well hidden ;)

I've pushed the changes. Could you please verify that everything still works on nRF? And if I managed to break something, please PR it ;)

jacobeva commented 7 months ago

LGTM!