jgromes / RadioLib

Universal wireless communication library for embedded devices
https://jgromes.github.io/RadioLib/
MIT License
1.55k stars 387 forks source link

RADIOLIB_LORAWAN_NO_DOWNLINK when calling activateOTAA() with Tock setup #1140

Closed alistair23 closed 3 months ago

alistair23 commented 4 months ago

Describe the bug

When trying to activate a LoRaWAN device I see the following failure

[2024-06-26 22:25:59.867] [SX1261] Initialising Radio ... 
[2024-06-26 22:25:59.995] RLB_PRO: Setting up fixed channels (subband 2)
[2024-06-26 22:25:59.998] RLB_PRO: mask[7] = 0x0002
[2024-06-26 22:26:00.001] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-26 22:26:00.007] RLB_PRO: mask[0] = 0xff00
[2024-06-26 22:26:00.009] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-26 22:26:00.013] RLB_PRO: UL:   8 1  (1610612736 - 1082959462) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.021] RLB_PRO: UL:   9 1  (0 - 1082959872) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.025] RLB_PRO: UL:  10 1  (-1610612736 - 1082960281) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.034] RLB_PRO: UL:  11 1  (1073741824 - 1082960691) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.040] RLB_PRO: UL:  12 1  (-536870912 - 1082961100) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.046] RLB_PRO: UL:  13 1  (1610612736 - 1082961510) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.052] RLB_PRO: UL:  14 1  (0 - 1082961920) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.058] RLB_PRO: UL:  15 1  (-1610612736 - 1082962329) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.374] RLB_PRO: [MAC] 0x03
[2024-06-26 22:26:00.374] RLB_PRO: 0000000 00 00 00 80                                     | ....              
[2024-06-26 22:26:00.383] RLB_PRO: LinkADRReq: dataRate = 0, txSteps = 0, chMask = 0x0000, chMaskCntl = 0, nbTrans = 0
[2024-06-26 22:26:00.395] RLB_PRO: LinkADRAns: status = 0x07
[2024-06-26 22:26:00.397] RLB_PRO: [MAC] 0x04
[2024-06-26 22:26:00.397] RLB_PRO: 0000000 00                                              | .                 
[2024-06-26 22:26:00.406] RLB_PRO: DutyCycleReq: max duty cycle = 1/2^0
[2024-06-26 22:26:00.409] RLB_PRO: [MAC] 0x05
[2024-06-26 22:26:00.412] RLB_PRO: 0000000 08 68 e2 8c                                     | .h..              
[2024-06-26 22:26:00.420] RLB_PRO: RXParamSetupReq: rx1DrOffset = 0, rx2DataRate = 8, freq = 
[2024-06-26 22:26:00.433] RLB_PRO: RXParamSetupAns: status = 0x07
[2024-06-26 22:26:00.436] RLB_PRO: [MAC] 0x08
[2024-06-26 22:26:00.436] RLB_PRO: 0000000 01                                              | .                 
[2024-06-26 22:26:00.444] RLB_PRO: RXTimingSetupReq: delay = 1 sec
[2024-06-26 22:26:00.447] RLB_PRO: [MAC] 0x09
[2024-06-26 22:26:00.450] RLB_PRO: 0000000 0d                                              | .                 
[2024-06-26 22:26:00.459] RLB_PRO: TxParamSetupReq: dlDwell = 0, ulDwell = 0, maxEirp = 30 dBm
[2024-06-26 22:26:00.464] RLB_PRO: [MAC] 0x0C
[2024-06-26 22:26:00.467] RLB_PRO: 0000000 65                                              | e                 
[2024-06-26 22:26:00.473] RLB_PRO: ADRParamSetupReq: limitExp = 6, delayExp = 5
[2024-06-26 22:26:00.479] RLB_PRO: [MAC] 0x0F
[2024-06-26 22:26:00.481] RLB_PRO: 0000000 fa                                              | .                 
[2024-06-26 22:26:00.490] RLB_PRO: RejoinParamSetupReq: maxTime = 15, maxCount = 10
[2024-06-26 22:26:00.493] RLB_PRO: RejoinParamSetupAns: status = 0x03
[2024-06-26 22:26:00.809] RLB_PRO: 
[2024-06-26 22:26:00.809] RLB_PRO: PHY: Frequency UL =  MHz
[2024-06-26 22:26:00.846] RLB_PRO: PHY: SF = 12, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-06-26 22:26:02.403] RLB_PRO: JoinRequest sent (DevNonce = 0) <-- Rx Delay start
[2024-06-26 22:26:02.406] RLB_PRO: 
[2024-06-26 22:26:02.409] RLB_PRO: PHY: Frequency DL =  MHz
[2024-06-26 22:26:02.447] RLB_PRO: PHY: SF = 12, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-06-26 22:26:07.376] RLB_PRO: Opening Rx1 window (190 ms timeout)... <-- Rx Delay end 
[2024-06-26 22:26:07.570] RLB_PRO: Closing Rx1 window
[2024-06-26 22:26:07.573] RLB_PRO: PHY: Frequency DL =  MHz
[2024-06-26 22:26:13.471] RLB_PRO: Opening Rx2 window (190 ms timeout)... <-- Rx Delay end 
[2024-06-26 22:26:13.664] RLB_PRO: Closing Rx2 window
[2024-06-26 22:26:13.680] activateOTAA failed, code -1116

[OP edited this log to add the timestamps as requested below]

To Reproduce The full code is here: https://github.com/alistair23/libtock-c/blob/alistair/lora-wan/examples/lora/sensor-lorawan/main.cc

The main snippet is below

  // create a new instance of the HAL class
  TockHal* hal = new TockHal();
  int state;

  // now we can create the radio module
  // pinout corresponds to the SparkFun LoRa Thing Plus - expLoRaBLE
  // NSS pin:   0
  // DIO1 pin:  2
  // NRST pin:  4
  // BUSY pin:  1
  SX1262 tock_module = new Module(hal, RADIO_NSS, RADIO_DIO_1, RADIO_RESET, RADIO_BUSY);
  LoRaWANNode node(&tock_module, Region, subBand);

  // Setup the radio
  // The settings here work for the SparkFun LoRa Thing Plus - expLoRaBLE
  tock_module.XTAL = true;
  state = tock_module.begin(915.0);

  if (state != RADIOLIB_ERR_NONE) {
    printf("begin failed, code %d\r\n", state);
    return 1;
  }

  node.beginOTAA(joinEUI, devEUI, nwkKey, appKey);

  state = node.activateOTAA();

  if (state != RADIOLIB_LORAWAN_NEW_SESSION) {
    printf("activateOTAA failed, code %d\r\n", state);
    return 1;
  }

Expected behavior https://github.com/jgromes/RadioLib/wiki/Troubleshooting-Guide seems to indicate that no downlink (-1116) is ok to ignore, but the actual examples check for it when running activateOTAA().

I would expect activateOTAA() to return RADIOLIB_LORAWAN_NEW_SESSION

Screenshots

Additional info (please complete):

StevenCellist commented 4 months ago

Hi @alistair23, thanks for including logs, but in LW, timestamps on logs are almost even more crucial than the log contents itself. So please, retry with millisecond-precision timestamps enabled. What network server are you using? Are you sure there is a gateway within reach of the device? Can you show what is happening on the server's console? LW is (unfortunately, although it's the charm of the tech) a heck of a lot more complicated than just having a correct configuration on the device, so please be as complete as possible!

StevenCellist commented 4 months ago

By the way, I will provide no help whatsoever as long as you don't decrease your uplink interval by orders of magnitude - you are breaching laws, and let's not even start about FUP if you are using TTN.

alistair23 commented 4 months ago

Does LoRaWAN require more precise timing? It's possible that's an issue there. I'll get some more logs and update the PR.

How do I get ms level precision logs?

I'm connecting to a TTN network. The gatway is in the same room and I see events like this in the TTN console, so it is able to communicate with the gateway

image

alistair23 commented 4 months ago

By the way, I will provide no help whatsoever as long as you don't decrease your uplink interval by orders of magnitude - you are breaching laws, and let's not even start about FUP if you are using TTN.

I don't follow. I am not changing anything related to uplink intervals

HeadBoffin commented 4 months ago

Does LoRaWAN require more precise timing?

Yes, LW requires that the receive is turned on a few milliseconds before the Rx1 window which is 5 seconds from the end of the uplink. The receiver will listen for the preamble, if it misses it, game over.

But as @StevenCellist mentions, you are hammering a shared community resource provided free of charge once every 1 second which is not only illegal in most jurisdictions (because the law limits use of the shared spectrum) but is also in breach of the TTN Fair Use Policy. The ISM band is generally very busy - you are currently pretty much jamming parts of the airwaves so other people will likely see a degradation in their devices connectivity. No where in our examples do we send more than once every few minutes and we reference the FUP to boot.

And there is the no small matter that LoRaWAN works on the basis of an uplink then opening a window 5 seconds later and then again 1 second after that for a response. So apart from the law and FUP, LW can't work at the rate you are using.

StevenCellist commented 4 months ago

^^ thank you @HeadBoffin

How do I get ms level precision logs?

That depends on what terminal you are using. ArduinoIDE has a clock button, PlatformIO has a monitor_filter = time option, other terminals usually have some hidden option too.

alistair23 commented 4 months ago

Yes, LW requires that the receive is turned on a few milliseconds before the Rx1 window which is 5 seconds from the end of the uplink. The receiver will listen for the preamble, if it misses it, game over.

Hmmm... I'm probably missing that window.

Ah! For the delay in the loop I just checked in something to open this issue. It's just copied from something else.

The loop isn't currently being run, but I will be sure to increase the delay considerably

alistair23 commented 4 months ago

I have updated the original PR description to include timing information. The timing information comes from minicom and as the printing is async it isn't going to be 100% accurate.

I have also updated the WIP code at https://github.com/alistair23/libtock-c/blob/alistair/lora-wan/examples/lora/sensor-lorawan/main.cc to use a long delay. So it won't spam any thing

And there is the no small matter that LoRaWAN works on the basis of an uplink then opening a window 5 seconds later and then again 1 second after that for a response. So apart from the law and FUP, LW can't work at the rate you are usin

From a quick look I think I'm too slow for this window

StevenCellist commented 4 months ago

A few observations.

  1. You missed the details on sendReceive(): it is likely to return NO_DOWNLINK_RECEIVED as you observed from the troubleshooting guide, which is not ERR_NONE and accordingly your sketch still spams the airwaves at illegal intervals. "it is just copied from something else" - well, but as soon as it works and we don't say anything, it will end up like this in the Tock examples and we have folks all over the globe killing the free band as @HeadBoffin said.
  2. The Rx1 window is ~a bit early~ right on time, but something is up with Rx2 window being scheduled an additional six seconds later instead of after six seconds. Don't see why that would be the case.
  3. A mistake on my side is in the released 6.6.0 but solved in upstream: the JoinRequest datarate should be 2, not 0 (here). Won't solve the Rx2 window being scheduled incorrectly, but may result in a JoinAccept being transmitted which isn't clear from your screenshot which is cropped to unusable size. Anything visible on the TTN Messages tab can be safely shared - it shows none of the keys required to decrypt your messages. The other stuff is transmitted unencrypted over the air and visible to everyone anyway.
StevenCellist commented 4 months ago

Oh and by the way, please fix the printf not being able to handle floating values - it might not be the problem but definitely won't hurt.

HeadBoffin commented 4 months ago

Ah! For the delay in the loop I just checked in something to open this issue. It's just copied from something else.

Before TTI lose the will to live with this sort of stuff, can you tell us where you copied this from so we can try & purge it from the internet - it won't work with any LNS but it definitely hurts TTN. There's also trying to track down the how & why of having the Rx2 window extended by 5 seconds.

What is the RSSI & SNR of the Join Request? One of the standard gotchas is the gateway hearing the JR but the device not hearing the JA due to being too far away, or more often, too close!

Side note, you are mixing an issue with a PR - this is an issue, a PR would be you providing code to make a change - different tab on the main page!

alistair23 commented 4 months ago

I think there has been some confusion, so I'll try to clarify.

This is an issue. I am getting a -1116 error from activateOTAA(), which I shouldn't be getting, hence the issue. I suspect the problem is on my side, not a problem in RadioLib, but I was hoping to get some help.

The code at https://github.com/alistair23/libtock-c/blob/alistair/lora-wan/examples/lora/sensor-lorawan/main.cc is just my local testing changes. I only checked it in to create this issue. I appreciate you pointing out the excess communication. I have fixed the delay to ensure it is run on every iteration of the loop. That code has never been run though

The code is based on the LoRa example (not LoRaWAN) from here: https://github.com/jgromes/RadioLib/blob/master/examples/NonArduino/Tock/main.cpp#L61

As for the Rx1 window, the Tock syscall interface usually results in delays, as the process of acquiring the current time and delays can take a few extra ms to occur. So I suspect we miss a window somehow. The printing is async so the times might not be correct. I'll try and dig into what is going on.

I'll try the latest from upstream to see if that helps, this is the full console output from TTN

image

StevenCellist commented 4 months ago

I think there has been some confusion, so I'll try to clarify.

We understand perfectly fine what is going on, but ...

That code has never been run though

Yeah, sure, I believe that for now, but that says nothing about what you will be doing with it once it does work. And since this is all about a shared resource being provided free of service, we care about these details. We don't solve problems here, we educate on apparently lots of missing knowledge. TTN provides a great Learn section which might prove very useful...

The code is based on the LoRa example

Very interesting that this provided code also breaks laws. How's your knowledge on dutycycles?

So I suspect we miss a window somehow.

An easy way to check this is by widening the scanGuard in downlinkCommon to e.g. 50ms - effectively meaning the device will start listening 50ms earlier as well as 100ms longer in total.

this is the full console output from TTN

There is no way for us to confirm whether this is the full output - what are the next lines? Does TTN even schedule a JoinAccept? We can't see that... what is it that you are so actively trying to hide from us? A naughty DeviceID? And it also wouldn't hurt sending updated logs from the device after switching to upstream, just to make sure we're all on board.

Come on man - a little less coffee, a little more effort to work together and I'm sure we can get it working!

alistair23 commented 4 months ago

I tried a scanGuard of 50 and I get this (plus I think I fixed the delay). Note that I have added a few extra prints.

[2024-06-27 19:31:23.624] RLB_PRO: mask[7] = 0x0002
[2024-06-27 19:31:23.627] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-27 19:31:23.633] RLB_PRO: mask[0] = 0xff00
[2024-06-27 19:31:23.635] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-27 19:31:23.639] RLB_PRO: UL:   8 1  (1610612736 - 1082959462) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.647] RLB_PRO: UL:   9 1  (0 - 1082959872) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.651] RLB_PRO: UL:  10 1  (-1610612736 - 1082960281) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.660] RLB_PRO: UL:  11 1  (1073741824 - 1082960691) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.665] RLB_PRO: UL:  12 1  (-536870912 - 1082961100) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.671] RLB_PRO: UL:  13 1  (1610612736 - 1082961510) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.677] RLB_PRO: UL:  14 1  (0 - 1082961920) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.683] RLB_PRO: UL:  15 1  (-1610612736 - 1082962329) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.982] RLB_PRO: [MAC] 0x03
[2024-06-27 19:31:23.982] RLB_PRO: 0000000 00 00 00 80                                     | ....              
[2024-06-27 19:31:23.991] RLB_PRO: LinkADRReq: dataRate = 0, txSteps = 0, chMask = 0x0000, chMaskCntl = 0, nbTrans = 0
[2024-06-27 19:31:24.002] RLB_PRO: LinkADRAns: status = 0x07
[2024-06-27 19:31:24.005] RLB_PRO: [MAC] 0x04
[2024-06-27 19:31:24.005] RLB_PRO: 0000000 00                                              | .                 
[2024-06-27 19:31:24.013] RLB_PRO: DutyCycleReq: max duty cycle = 1/2^0
[2024-06-27 19:31:24.016] RLB_PRO: [MAC] 0x05
[2024-06-27 19:31:24.019] RLB_PRO: 0000000 08 68 e2 8c                                     | .h..              
[2024-06-27 19:31:24.027] RLB_PRO: RXParamSetupReq: rx1DrOffset = 0, rx2DataRate = 8, freq = 
[2024-06-27 19:31:24.039] RLB_PRO: RXParamSetupAns: status = 0x07
[2024-06-27 19:31:24.042] RLB_PRO: [MAC] 0x08
[2024-06-27 19:31:24.042] RLB_PRO: 0000000 01                                              | .                 
[2024-06-27 19:31:24.051] RLB_PRO: RXTimingSetupReq: delay = 1 sec
[2024-06-27 19:31:24.054] RLB_PRO: [MAC] 0x09
[2024-06-27 19:31:24.056] RLB_PRO: 0000000 0d                                              | .                 
[2024-06-27 19:31:24.065] RLB_PRO: TxParamSetupReq: dlDwell = 0, ulDwell = 0, maxEirp = 30 dBm
[2024-06-27 19:31:24.071] RLB_PRO: [MAC] 0x0C
[2024-06-27 19:31:24.073] RLB_PRO: 0000000 65                                              | e                 
[2024-06-27 19:31:24.080] RLB_PRO: ADRParamSetupReq: limitExp = 6, delayExp = 5
[2024-06-27 19:31:24.085] RLB_PRO: [MAC] 0x0F
[2024-06-27 19:31:24.088] RLB_PRO: 0000000 fa                                              | .                 
[2024-06-27 19:31:24.096] RLB_PRO: RejoinParamSetupReq: maxTime = 15, maxCount = 10
[2024-06-27 19:31:24.100] RLB_PRO: RejoinParamSetupAns: status = 0x03
[2024-06-27 19:31:24.397] RLB_PRO: 
[2024-06-27 19:31:24.397] RLB_PRO: PHY: Frequency UL =  MHz
[2024-06-27 19:31:24.432] RLB_PRO: PHY: SF = 12, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-06-27 19:31:25.985] RLB_PRO: JoinRequest sent (DevNonce = 0) <-- Rx Delay start
[2024-06-27 19:31:25.988] RLB_PRO: this->rxDelayStart: 18999544 ms
[2024-06-27 19:31:25.994] RLB_PRO: now: 18999553 ms
[2024-06-27 19:31:25.997] RLB_PRO: this->rxDelays[0]: 5000 ms
[2024-06-27 19:31:26.000] RLB_PRO: 
[2024-06-27 19:31:26.000] RLB_PRO: PHY: Frequency DL =  MHz
[2024-06-27 19:31:26.036] RLB_PRO: PHY: SF = 12, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-06-27 19:31:26.070] RLB_PRO: this->rxDelays[i]: 5000 ms
[2024-06-27 19:31:26.073] RLB_PRO: mod->hal->millis(): 18999634 ms
[2024-06-27 19:31:26.076] RLB_PRO: waitLen 4856 ms
[2024-06-27 19:31:30.839] RLB_PRO: Opening Rx1 window (290 ms timeout)... <-- Rx Delay end 
[2024-06-27 19:31:31.130] RLB_PRO: Closing Rx1 window
[2024-06-27 19:31:31.133] RLB_PRO: PHY: Frequency DL =  MHz
[2024-06-27 19:31:31.159] RLB_PRO: this->rxDelays[i]: 6000 ms
[2024-06-27 19:31:31.162] RLB_PRO: mod->hal->millis(): 19004845 ms
[2024-06-27 19:31:31.165] RLB_PRO: waitLen 645 ms
[2024-06-27 19:31:31.817] RLB_PRO: Opening Rx2 window (290 ms timeout)... <-- Rx Delay end 
[2024-06-27 19:31:32.108] RLB_PRO: Closing Rx2 window
[2024-06-27 19:31:32.123] activateOTAA failed, code -1116

I tried the latest mainline and that doesn't even get this far. Something has changed recently that causes me to get RADIOLIB_ERR_DWELL_TIME_EXCEEDED instead. I haven't had a chance to debug that yet.

TTN forwards the join-accept, as you can see in the log. That's all of the lines on the TTN side. The next line is the next attempt at a different time. I'm not hiding anything

image

StevenCellist commented 4 months ago

I haven't had a chance to debug that yet.

Please do so, as this is something crucial. The point is that you are not allowed to join at SF12 in AU915 region due to the dwell time limitations of 400ms that I did not see earlier. I added the dwell time checks but also corrected the join-request to be sent at SF10 instead, which should work. I might have made a mistake there in some way, but we do need to fix that. Sending the JR at an appropriate SF may just as well be a reason why this is not working correctly. I am not sure how TTN handles this as Australia is pretty much other side of the world!

That's all of the lines on the TTN side

Sorry, I was mistaken on that one, I thought there was an extra line to show that a downlink occurred with a downward arrow. I stand corrected.

I think I fixed the delay

Definitely looks much better now, but the Rx1 and Rx2 windows are shifting forwards; how sure are you that your clock is stable and not drifting? CubeCells for example drift by 16 ms every 1000 ms, and the logs indicate something similar here. There are some instructions in BuildOpt here to figure this out. Please let me know what you find! And I like Big Logs, I cannot lie...

HeadBoffin commented 4 months ago

I am not sure how TTN handles this as Australia is pretty much other side of the world!

But you have access to the location warper 915 TTIG - so let the testing commence.

From the logs, and yes, precise timing issues etc etc, the Rx1 window was opened 4.854s - with a scan guard of 50ms that means it closed at 4.954s (50ms listening for preamble, 50ms early as per new setting) - you need to hit 5s. It may be time for an oscilloscope or digital signal logger to be used.

It would be prudent to get something working on a known good device / platform so you can see RadioLib in action with the logs showing what is good. Along the way it may fill in some of the details you need to wrestle with the whole LoRaWAN thing - it's like ~Shrek~ an onion, soooo many layers. The TTN Learn section is pretty much mandatory reading to make good progress - that's why they are mentioned so early in the most amazing starter notes ever.

StevenCellist commented 4 months ago

50ms listening for preamble

That's not exactly the case ;) it listens for as long as a preamble can take given the SF and BW.

[2024-06-27 19:31:30.839] RLB_PRO: Opening Rx1 window (290 ms timeout)... <-- Rx Delay end 
[2024-06-27 19:31:31.130] RLB_PRO: Closing Rx1 window

^ that's a 291 ms window of which 100 ms is due to scanGuard but 191 ms due to preamble.

alistair23 commented 4 months ago

For context this is using the latest (commit ce8e6fdfb031b92acf020cef201a9607fc56be31)

[2024-06-27 20:06:58.216] RLB_PRO: mask[7] = 0x0002
[2024-06-27 20:06:58.219] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-27 20:06:58.224] RLB_PRO: mask[0] = 0xff00
[2024-06-27 20:06:58.227] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-27 20:06:58.230] RLB_PRO: UL:   8 1  (1610612736 - 1082959462) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.239] RLB_PRO: UL:   9 1  (0 - 1082959872) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.243] RLB_PRO: UL:  10 1  (-1610612736 - 1082960281) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.251] RLB_PRO: UL:  11 1  (1073741824 - 1082960691) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.258] RLB_PRO: UL:  12 1  (-536870912 - 1082961100) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.264] RLB_PRO: UL:  13 1  (1610612736 - 1082961510) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.270] RLB_PRO: UL:  14 1  (0 - 1082961920) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.276] RLB_PRO: UL:  15 1  (-1610612736 - 1082962329) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.574] RLB_PRO: [MAC] 0x03
[2024-06-27 20:06:58.574] RLB_PRO: 0000000 20 00 00 80                                     |  ...              
[2024-06-27 20:06:58.582] RLB_PRO: LinkADRReq: dataRate = 2, txSteps = 0, chMask = 0x0000, chMaskCntl = 0, nbTrans = 0
[2024-06-27 20:06:58.593] RLB_PRO: LinkADRAns: status = 0x07
[2024-06-27 20:06:58.596] RLB_PRO: [MAC] 0x04
[2024-06-27 20:06:58.596] RLB_PRO: 0000000 00                                              | .                 
[2024-06-27 20:06:58.605] RLB_PRO: DutyCycleReq: max duty cycle = 1/2^0
[2024-06-27 20:06:58.608] RLB_PRO: [MAC] 0x05
[2024-06-27 20:06:58.611] RLB_PRO: 0000000 08 68 e2 8c                                     | .h..              
[2024-06-27 20:06:58.619] RLB_PRO: RXParamSetupReq: rx1DrOffset = 0, rx2DataRate = 8, freq = 
[2024-06-27 20:06:58.631] RLB_PRO: RXParamSetupAns: status = 0x07
[2024-06-27 20:06:58.635] RLB_PRO: [MAC] 0x08
[2024-06-27 20:06:58.635] RLB_PRO: 0000000 01                                              | .                 
[2024-06-27 20:06:58.643] RLB_PRO: RXTimingSetupReq: delay = 1 sec
[2024-06-27 20:06:58.646] RLB_PRO: [MAC] 0x09
[2024-06-27 20:06:58.649] RLB_PRO: 0000000 1d                                              | .                 
[2024-06-27 20:06:58.658] RLB_PRO: TxParamSetupReq: dlDwell = 0, ulDwell = 1, maxEirp = 30 dBm
[2024-06-27 20:06:58.663] RLB_PRO: [MAC] 0x0C
[2024-06-27 20:06:58.666] RLB_PRO: 0000000 65                                              | e                 
[2024-06-27 20:06:58.671] RLB_PRO: ADRParamSetupReq: limitExp = 6, delayExp = 5
[2024-06-27 20:06:58.677] RLB_PRO: [MAC] 0x0F
[2024-06-27 20:06:58.680] RLB_PRO: 0000000 fa                                              | .                 
[2024-06-27 20:06:58.689] RLB_PRO: RejoinParamSetupReq: maxTime = 15, maxCount = 10
[2024-06-27 20:06:58.691] RLB_PRO: RejoinParamSetupAns: status = 0x03
[2024-06-27 20:06:58.988] RLB_PRO: 
[2024-06-27 20:06:58.988] RLB_PRO: PHY: Frequency UL =  MHz
[2024-06-27 20:06:59.024] RLB_PRO: PHY: SF = 10, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-06-27 20:06:59.047] activateOTAA failed, code -1114
StevenCellist commented 4 months ago

Thanks for the log, will have to figure out why that is the case... because the ToA is calculated for 23 bytes, which as per the air-time calculator (set at 13+10=23 bytes) should not exceed 400ms.. but that will require to dig deeper and see what values are being calculated.

If you can figure out what the stability of your clock is, I will check what is up with the dwell time calculation tomorrow.

HeadBoffin commented 4 months ago

What is the RSSI and SNR of the Join Request?

StevenCellist commented 4 months ago

Please pull from upstream - latest commit is tested on AU915/2 and joins perfectly fine on SF10 under dwell time limitations (forgot to convert microseconds to milliseconds).

alistair23 commented 4 months ago

Please pull from upstream - latest commit is tested on AU915/2 and joins perfectly fine on SF10 under dwell time limitations (forgot to convert microseconds to milliseconds).

Great! Thanks. That fixes the dwell time issue. Now back to the activateOTAA failed, code -1116 problem

I think the next step is to try and hookup a logic analyser and see if that provides any hints. I'll let you know when I have my breakout board setup

StevenCellist commented 4 months ago

The experts think there are two open questions awaiting your response, namely:

What is the RSSI and SNR of the Join Request?

... and ...

If you can figure out what the stability of your clock is

... which are both questions that easily supersede connecting a logic analyzer.

alistair23 commented 4 months ago

What is the RSSI and SNR of the Join Request?

I see this in the TTN logs

    "settings": {
      "data_rate": {
        "lora": {
          "bandwidth": 125000,
          "spreading_factor": 10,
          "coding_rate": "4/5"
        }
      },
      "frequency": "918200000",
      "timestamp": 605731571,
      "time": "2024-06-28T12:13:26.750392913Z"
...
    "rx_metadata": [
      {
...
        "time": "2024-06-28T12:13:26.750392913Z",
        "timestamp": 605731571,
        "rssi": -49,
        "channel_rssi": -49,
        "snr": 13.5,
   ...
        "received_at": "2024-06-28T12:13:26.768107401Z"
      }
    ],
...
    "device_channel_index": 15,
    "consumed_airtime": "0.370688s"

and this in the gateway logs

Fri Jun 28 12:13:26 2024 daemon.info station[24842]: [S2E:VERB] RX 918.2MHz DR2 SF10/BW125 snr=13.5 rssi=-49 xtime=0xDB0006241ABAF3 - jreq MHdr=00 JoinEui=::0 DevEui=70b3:d57e:d006:8a59 DevNonce=0 MIC=860356858
Fri Jun 28 12:13:28 2024 daemon.info station[24842]: [S2E:DEBU] ::1 diid=35076 [ant#0] - next TX start ahead by 3s155ms (12:13:31.742494)
Fri Jun 28 12:13:32 2024 daemon.info station[24842]: [S2E:VERB] ::1 diid=35076 [ant#0] - starting TX in 19ms835us: 927.5MHz 30.0dBm ant#0(0) DR10 SF10/BW500 frame=20ABFD19E6BFC963DBEC895D..0842BEE4 (33 bytes)
Fri Jun 28 12:13:32 2024 daemon.info station[24842]: [S2E:INFO] TX ::1 diid=35076 [ant#0] - dntxed: 927.5MHz 30.0dBm ant#0(0) DR10 SF10/BW500 frame=20ABFD19E6BFC963DBEC895D..0842BEE4 (33 bytes)
Fri Jun 28 12:13:32 2024 daemon.info station[24842]: [S2E:DEBU] Tx done diid=35076

If you can figure out what the stability of your clock is

The hardware clock is stable, at least as accurately that I can measure. I suspect there are timing errors though as the print statements are async. I'm hoping I can get a better idea of the timing via a LA to see if I'm missing the window

StevenCellist commented 4 months ago
 "rssi": -49,
 "snr": 13.5,

A wall in between the device & gateway wouldn't hurt (and actually be better), but not completely catastrophical.

Regarding the async: yes I got that. But what I see in your logs, is that your clock is running too fast, because the Rx1 and Rx2 window are opening and closing increasingly earlier. And given the timeout that is printed being 290 ms and the prints occurring with 291 ms between them, that async stuff doesn't appear to impact timing that much. Really, I recommend investing whether your clock is running at the same speed as the rest of the world. Ideally, as instructed in BuildOpt.h, let your device print a character at a 1000ms interval for like a minute long, and report back what you see with timestamped logs.

alistair23 commented 4 months ago

A wall in between the device & gateway wouldn't hurt (and actually be better), but not completely catastrophical.

Yeah they are in the same room.

Regarding the async: yes I got that. But what I see in your logs, is that your clock is running too fast, because the Rx1 and Rx2 window are opening and closing increasingly earlier. And given the timeout that is printed being 290 ms and the prints occurring with 291 ms between them, that async stuff doesn't appear to impact timing that much. Really, I recommend investing whether your clock is running at the same speed as the rest of the world. Ideally, as instructed in BuildOpt.h, let your device print a character at a 1000ms interval for like a minute long, and report back what you see with timestamped logs.

Good point. I'll test and let you know

alistair23 commented 3 months ago

This is what I am seeing

[2024-07-01 09:42:21.547] Starting
[2024-07-01 09:42:22.525] One second
[2024-07-01 09:42:23.502] One second
[2024-07-01 09:42:24.480] One second
[2024-07-01 09:42:25.457] One second
[2024-07-01 09:42:26.435] One second
[2024-07-01 09:42:27.412] One second
[2024-07-01 09:42:28.390] One second
[2024-07-01 09:42:29.367] One second
[2024-07-01 09:42:30.346] One second
[2024-07-01 09:42:31.323] One second
[2024-07-01 09:42:32.301] One second
[2024-07-01 09:42:33.278] One second
[2024-07-01 09:42:34.256] One second
[2024-07-01 09:42:35.234] One second
[2024-07-01 09:42:36.211] One second
[2024-07-01 09:42:37.189] One second
[2024-07-01 09:42:38.166] One second
[2024-07-01 09:42:39.144] One second
[2024-07-01 09:42:40.121] One second
[2024-07-01 09:42:41.099] One second
[2024-07-01 09:42:42.077] One second
[2024-07-01 09:42:43.054] One second
[2024-07-01 09:42:44.032] One second
[2024-07-01 09:42:45.009] One second
[2024-07-01 09:42:45.987] One second
[2024-07-01 09:42:46.964] One second
[2024-07-01 09:42:47.942] One second
[2024-07-01 09:42:48.919] One second
[2024-07-01 09:42:49.897] One second
[2024-07-01 09:42:50.874] One second
[2024-07-01 09:42:51.852] One second
[2024-07-01 09:42:52.830] One second
[2024-07-01 09:42:53.808] One second
[2024-07-01 09:42:54.785] One second
[2024-07-01 09:42:55.762] One second
[2024-07-01 09:42:56.740] One second
[2024-07-01 09:42:57.717] One second
[2024-07-01 09:42:58.695] One second
[2024-07-01 09:42:59.673] One second
[2024-07-01 09:43:00.650] One second
[2024-07-01 09:43:01.628] One second
[2024-07-01 09:43:02.605] One second
[2024-07-01 09:43:03.583] One second
[2024-07-01 09:43:04.560] One second
[2024-07-01 09:43:05.538] One second
[2024-07-01 09:43:06.516] One second
[2024-07-01 09:43:07.493] One second
[2024-07-01 09:43:08.471] One second
[2024-07-01 09:43:09.449] One second
[2024-07-01 09:43:10.427] One second
[2024-07-01 09:43:11.404] One second
[2024-07-01 09:43:12.382] One second
[2024-07-01 09:43:13.359] One second
[2024-07-01 09:43:14.337] One second
[2024-07-01 09:43:15.314] One second
[2024-07-01 09:43:16.292] One second
[2024-07-01 09:43:17.270] One second
[2024-07-01 09:43:18.247] One second
[2024-07-01 09:43:19.224] One second
[2024-07-01 09:43:20.202] One second
[2024-07-01 09:43:21.179] One second
[2024-07-01 09:43:22.157] One second

So that's a clock drift of about -23. I surprised that it's negative!

Setting RADIOLIB_CLOCK_DRIFT_MS to (-23) still gives me this

[2024-07-01 09:52:15.857] RLB_PRO: PHY: Frequency DL =  MHz
[2024-07-01 09:52:15.893] RLB_PRO: PHY: SF = 10, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-07-01 09:52:15.927] RLB_PRO: this->rxDelays[i]: 5000 ms
[2024-07-01 09:52:15.930] RLB_PRO: mod->hal->millis(): 164857 ms
[2024-07-01 09:52:15.933] RLB_PRO: waitLen 4857 ms
[2024-07-01 09:52:20.697] RLB_PRO: Opening Rx1 window (176 ms timeout)... <-- Rx Delay end 
[2024-07-01 09:52:20.877] RLB_PRO: Closing Rx1 window
[2024-07-01 09:52:20.880] RLB_PRO: PHY: Frequency DL =  MHz
[2024-07-01 09:52:20.906] RLB_PRO: this->rxDelays[i]: 6000 ms
[2024-07-01 09:52:20.909] RLB_PRO: mod->hal->millis(): 169956 ms
[2024-07-01 09:52:20.912] RLB_PRO: waitLen 758 ms
[2024-07-01 09:52:21.674] RLB_PRO: Opening Rx2 window (290 ms timeout)... <-- Rx Delay end 
[2024-07-01 09:52:21.965] RLB_PRO: Closing Rx2 window
[2024-07-01 09:52:21.979] activateOTAA failed, code -1116
StevenCellist commented 3 months ago

Ahw, you're missing one line in the final log: the Rx Delay Start! Meanwhile, I'll have to check whether it should be positive or negative.. given that Rx2 now starts 25ms earlier than Rx1 so might have meant for the value to be positive this way round...

alistair23 commented 3 months ago

Sorry, full log is

[2024-07-01 21:17:43.386] RLB_PRO: JoinRequest sent (DevNonce = 0) <-- Rx Delay start
[2024-07-01 21:17:43.390] RLB_PRO: this->rxDelayStart: 120917 ms
[2024-07-01 21:17:43.395] RLB_PRO: now: 120927 ms
[2024-07-01 21:17:43.395] RLB_PRO: this->rxDelays[0]: 5000 ms
[2024-07-01 21:17:43.401] RLB_PRO: 
[2024-07-01 21:17:43.401] RLB_PRO: PHY: Frequency DL =  MHz
[2024-07-01 21:17:43.436] RLB_PRO: PHY: SF = 7, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-07-01 21:17:43.470] RLB_PRO: this->rxDelays[i]: 5000 ms
[2024-07-01 21:17:43.473] RLB_PRO: mod->hal->millis(): 121007 ms
[2024-07-01 21:17:43.476] RLB_PRO: waitLen 4856 ms
[2024-07-01 21:17:48.239] RLB_PRO: Opening Rx1 window (131 ms timeout)... <-- Rx Delay end 
[2024-07-01 21:17:48.376] RLB_PRO: Closing Rx1 window
[2024-07-01 21:17:48.378] RLB_PRO: PHY: Frequency DL =  MHz
[2024-07-01 21:17:48.404] RLB_PRO: this->rxDelays[i]: 6000 ms
[2024-07-01 21:17:48.407] RLB_PRO: mod->hal->millis(): 126059 ms
[2024-07-01 21:17:48.410] RLB_PRO: waitLen 804 ms
[2024-07-01 21:17:49.216] RLB_PRO: Opening Rx2 window (290 ms timeout)... <-- Rx Delay end 
[2024-07-01 21:17:49.508] RLB_PRO: Closing Rx2 window
[2024-07-01 21:17:49.523] activateOTAA failed, code -1116
alistair23 commented 3 months ago

This is the capture from a Saleae

LoRaWAN-Capture.zip

You should be able to open it with https://www.saleae.com/pages/downloads if you are interested

StevenCellist commented 3 months ago

I see that the clock drift compensation is not used, as you are using a different HAL. Please refer to the ArduinoHAL and implement these calculations in your own HAL (all four of them). Hopefully then the clock drift compensation works nicely.

I downloaded Saleae but it says I need a .sal file which is not in your zip.

alistair23 commented 3 months ago

Aw, sorry. Just rename the .zip to .sal. GitHub limits the extensions of uploads

alistair23 commented 3 months ago

Woo! Now getting so much further with the drift compensation working and a larger scanGuard

[2024-07-02 16:32:08.219] RLB_PRO: LinkADRAns: status = 0x07
[2024-07-02 16:32:08.241] success!
[2024-07-02 16:32:08.241] [SX1261] Transmitting 'Temp: 0, Hum: 0' 
[2024-07-02 16:32:08.455] RLB_PRO: 
[2024-07-02 16:32:08.455] RLB_PRO: PHY: Frequency UL =  MHz
[2024-07-02 16:32:08.482] RLB_PRO: PHY: SF = 10, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-07-02 16:32:08.501] failed, code -1114
StevenCellist commented 3 months ago

Congrats, good to see you got it working. Please try and see how tight you can get the scanGuard, as smaller is better. 50ms is quite large already, <=10ms would be great.

Also, now you must learn about not sending strings, but just the values in binary, i.e. like in the LW examples. That'll make you a much friendlier neighbour and help you not hitting dwell time limitations.