lorabasics / basicstation

LoRa Basics™ Station - The LoRaWAN Gateway Software
https://doc.sm.tc/station
Other
357 stars 183 forks source link

Confirmed Downlink Failure with Basic Station Setup Using corecell with SX1302 concentrator #194

Closed kroghm96 closed 4 months ago

kroghm96 commented 4 months ago

The problem

I've been debugging this issue for 2 weeks without a solution. I belive there must be a problem with basic station. The problem occurs when sending confirmed downlinks.

  1. I send a confirmed downlink with random data and port.
  2. It gets queued.
  3. On the first uplink, the downlink is sent as expected.
  4. On the next uplink, the LNS fails the confirmed downlink with the error: "Downlink was not acknowledged on the next uplink". IF the downlink was confirmed correctly it would not send this error.

Initially, I thought the end device was not confirming the downlink. I tested another end device that I know confirms downlinks, but the problem persisted. Tested with a Legacy Packet Forwarder and confirmed that the confirmed downlinks works without issues. Thus, the problem appears to be with the Basic Station setup.

The end device can join normally without failure so downlinks must work in some way.

Hardware setup

Basic Station setup

Using the corecell example station.conf for basic station configuration:

Log file

Logfile with full XDEBUG log info: station.log

Example of the error

Here’s an example from the LNS log:

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

Time | Gateway Time | Gateway | Origin | Type | Join Id | Seq No | RSSI | SNR | Data Rate | Frequency | Channel | Port | Bytes (PHY/APP) | App Payload | MIC | Too Late | Details -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- 05/22/2024 11:50:54.492 AM | N/A |   |   | Downlink Failed |   |   |   |   |   | -1 |   |   |   |   |   |   | DnMsgID=138984032, FailureReason=Downlink was not acknowledged on the next uplink, DnMsgID=138984032, DnSeqNo=1, UpSeqNo=7 05/22/2024 11:50:54.491 AM | 05/22/24 11:43:08.644.1841 AM | B827EBFFFE6835B7 |   | Uplink | 95 | 7 | -36 | 12 | 5 | 868,3 | 1 | 1 | 26 (13) | 48656C6C6F2C20776F726C6421 | A3549279 | false | FCtrl [ADR=True ADRAckReq=False ACK=False] 05/22/2024 11:50:22.297 AM | 05/22/24 11:42:36.266.7898 AM | B827EBFFFE6835B7 |   | Downlink Tx | 95 | 1 |   |   | 5 | 868,1 | 0 | 12 |   |   |   |   | DnMsgID=138984032, Status=Successful 05/22/2024 11:50:21.303 AM | 05/22/24 11:42:36.266.7898 AM | B827EBFFFE6835B7 |   | Conf Downlink | 95 | 1 |   |   | 5 | 868,1 | 0 | 12 | 14 (1) | 32 | ECE485B9 |   | [UpSeqNo=5], DnMsgID=138984032, Confirmed=True, AppPldLen=1 05/22/2024 11:50:21.048 AM | 05/22/24 11:42:35.266.7898 AM | B827EBFFFE6835B7 |   | Uplink | 95 | 5 | -37 | 13 | 5 | 868,1 | 0 | 1 | 26 (13) | 48656C6C6F2C20776F726C6421 | 73BDA67F | false | FCtrl [ADR=True ADRAckReq=False ACK=False] 05/22/2024 11:48:09.869 AM | N/A |   |   | Downlink Queued |   |   |   |   |   | -1 |   |   | 1 (0) | 32 |   |   | Timeout=5 Min DnMsgID=138984032, FPort=12, Confirmed=True, AppPldLen=1

An error I see in the log file sometimes is: [S2E:ERRO] ::0 diid=0 [ant#0] - radio is not emitting frame - abandoning TX, trying alternative

Don't know if this could be related

Debugging attempts

I'm in a loss for more testing scenarios. What do you think could be wrong? Could it be SPI speed? Timing issues? If you need more information let me know

beitler commented 4 months ago

Hi @kroghm96 ,

thank you for your diligence in drafting this complete report.

Based on your description I would be surprised if LoRa Basics Station systematically fails to deliver Confirmed Downlinks while Unconfirmed Downlinks are delivered correctly.

In short, it looks like the acknowledgement to the confirmed downlink is in Uplink with FCnt=6. It is received by the gateway. Why does it not appear in your LNS log?

I analyzed the gateway logs. The example log of your LNS is reflected in the gateway logs like this:

2024-05-22 09:43:12.266 - Uplink Devaddr=02000250 Fcnt=5

2024-05-22 09:43:12.264 [RAL:XDEB] [CRC OK] 868.100MHz 13.00/-37.2 SF7/BW125 (mod=16/dr=7/bw=4) xtick=01b9693c (28928316) 26 bytes: 405002000280050001C27FA29EF6D746C2142F65C8A873BDA67F
2024-05-22 09:43:12.266 [S2E:VERB] RX 868.1MHz DR5 SF7/BW125 snr=13.0 rssi=-37 xtime=0xBD000001B9693C - updf mhdr=40 DevAddr=02000250 FCtrl=80 FCnt=5 FOpts=[] 01C27FA2..C8A8 mic=2141633907 (26 bytes)
2024-05-22 09:43:12.266 [AIO:XDEB] [4|WS] > {"msgtype":"updf","MHdr":64,"DevAddr":33555024,"FCtrl":128,"FCnt":5,"FOpts":"","FPort":1,"FRMPayload":"C27FA29EF6D746C2142F65C8A8","MIC":2141633907,"RefTime":0.000000,"DR":5,"Freq":868100000,"upinfo":{"rctx":0,"xtime":53198770627242300,"gpstime":0,"fts":-1,"rssi":-37,"snr":13,"rxtime":1716370992.266790}}

Regular uplink from device DevAddr=02000250 with Fcnt=5

2024-05-22 09:43:13.257 - Downlink DevAddr=02000250

2024-05-22 09:43:13.141 [AIO:XDEB] [4|WS] < {"dC":0,"DevEui":"70-B3-D5-7E-D0-06-79-2F","diid":2824,"msgtype":"dnmsg","MuxTime":1.716371421303E9,"pdu":"A0500200028001000CB7ECE485B9","priority":0,"rctx":0,"RX1DR":5,"RX1Freq":868100000,"RX2DR":0,"RX2Freq":869525000,"RxDelay":1,"xtime":53198770627242300}
2024-05-22 09:43:13.142 [S2E:DEBU] 70b3:d57e:d006:792f diid=2824 [ant#0] - next TX start ahead by 109ms789us (09:43:13.252083)
2024-05-22 09:43:13.232 [S2E:VERB] 70b3:d57e:d006:792f diid=2824 [ant#0] - starting TX in 19ms880us: 868.1MHz 16.0dBm ant#0(0) DR5 SF7/BW125 frame=A0500200028001000CB7ECE485B9 (14 bytes)
2024-05-22 09:43:13.257 [S2E:XDEB] DC EU band 100 blocked until 09:43:17.373 (txtime=09:43:13.252 airtime=41ms216us)
2024-05-22 09:43:13.257 [S2E:XDEB] DC dnchnl 0 blocked until 09:43:14.735 (txtime=09:43:13.252 airtime=41ms216us)
2024-05-22 09:43:13.257 [AIO:XDEB] [4|WS] > {"msgtype":"dntxed","seqno":2824,"diid":2824,"DR":5,"Freq":868100000,"DevEui":"70-B3-D5-7E-D0-06-79-2F","rctx":0,"xtime":53198770628242300,"txtime":335.060154,"gpstime":0}
2024-05-22 09:43:13.257 [S2E:INFO] TX 70b3:d57e:d006:792f diid=2824 [ant#0] - dntxed: 868.1MHz 16.0dBm ant#0(0) DR5 SF7/BW125 frame=A0500200028001000CB7ECE485B9 (14 bytes)
2024-05-22 09:43:13.258 [AIO:XDEB] [4] socket write bytes=179
2024-05-22 09:43:13.293 [S2E:DEBU] Tx done diid=2824

Downlink sent to device DevAddr=02000250 in response to uplink Fcnt=5. The frame is A0500200028001000CB7ECE485B9. The header xA0 indicates that this is a confirmed downlink.

image

LoRaWAN L2 1.04

2024-05-22 09:43:13.344 - Uplink DevAddr=02000250 FCnt=6

2024-05-22 09:43:13.343 [RAL:XDEB] [CRC OK] 867.500MHz 11.50/-40.2 SF7/BW125 (mod=16/dr=7/bw=4) xtick=01c9f621 (30012961) 12 bytes: 4050020002A0060024F71294
2024-05-22 09:43:13.344 [S2E:VERB] RX 867.5MHz DR5 SF7/BW125 snr=11.5 rssi=-40 xtime=0xBD000001C9F621 - updf mhdr=40 DevAddr=02000250 FCtrl=A0 FCnt=6 FOpts=[]  mic=-1810696412 (12 bytes)
2024-05-22 09:43:13.344 [AIO:XDEB] [4|WS] > {"msgtype":"updf","MHdr":64,"DevAddr":33555024,"FCtrl":160,"FCnt":6,"FOpts":"","FPort":-1,"FRMPayload":"","MIC":-1810696412,"RefTime":1716371421.505621,"DR":5,"Freq":867500000,"upinfo":{"rctx":0,"xtime":53198770628326945,"gpstime":0,"fts":-1,"rssi":-40,"snr":11.5,"rxtime":1716370993.344707}}
2024-05-22 09:43:13.345 [AIO:XDEB] [4] socket write bytes=300

Uplink from device DevAddr=02000250 with Fcnt=6 sent only ~90ms after the downlink. This is likely direct response to the confirmed downlink. The FCtrl=A0 has the ACK bit set. So this is the acknowledgement to the confirmed downlink (LoRaWAN L2 1.04 Section 4.3.1.2)

image

LoRaWAN L2 1.04 Section 4.3.1

Why does this uplink not appear in your LNS log?

2024-05-22 09:43:45.643 - Uplink Devaddr=02000250 Fcnt=7

2024-05-22 09:43:45.643 [RAL:XDEB] [CRC OK] 868.300MHz 11.50/-36.2 SF7/BW125 (mod=16/dr=7/bw=4) xtick=03b6b624 (62305828) 26 bytes: 405002000280070001F18667503CFDE125707BB9CD19A3549279
2024-05-22 09:43:45.643 [S2E:VERB] RX 868.3MHz DR5 SF7/BW125 snr=11.5 rssi=-36 xtime=0xBD000003B6B624 - updf mhdr=40 DevAddr=02000250 FCtrl=80 FCnt=7 FOpts=[] 01F18667..CD19 mic=2039633059 (26 bytes)
2024-05-22 09:43:45.644 [AIO:XDEB] [4|WS] > {"msgtype":"updf","MHdr":64,"DevAddr":33555024,"FCtrl":128,"FCnt":7,"FOpts":"","FPort":1,"FRMPayload":"F18667503CFDE125707BB9CD19","MIC":2039633059,"RefTime":1716371453.801327,"DR":5,"Freq":868300000,"upinfo":{"rctx":0,"xtime":53198770660619812,"gpstime":0,"fts":-1,"rssi":-36,"snr":11.5,"rxtime":1716371025.644184}}
2024-05-22 09:43:45.644 [AIO:XDEB] [4] socket write bytes=324

Uplink from device DevAddr=02000250 with Fcnt=7. The ACK bit is not set. Mistakenly, this seems to be interpreted by your LNS as a missing acknowledgement to the confirmed downlink, because it has not considered Uplink with FCnt=6.

kroghm96 commented 4 months ago

Hi @beitler,

Thanks a lot for looking into my issue, greatly appreciated!

Just for your information, the end device is running LoRaWAN 1.0.3. I'm not sure if this makes a difference compared to version 1.0.4.

Update

I have tried to add a new end device which I found to work with the Basic Station. This end device acknowledges the second uplink as expected. This end device only sends an uplink when manually moving a magnet towards it.

Here's what I observed:

  1. We send a confirmed downlink.
  2. Manually trigger the uplink.
  3. The device acknowledges the downlink.
  4. Manually trigger the next uplink, FCtrl is equal to A0, and it is accepted and seen in the LNS.

You can see this in the log: good_log.txt

It corresponds with the LNS log: <html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

Time | Gateway Time | Gateway | Origin | Type | Join Id | Seq No | RSSI | SNR | Data Rate | Frequency | Channel | Port | Bytes (PHY/APP) | App Payload | MIC | Too Late | Details -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- 05/23/2024 02:22:41.288 PM | 05/23/24 02:22:41.083.7319 PM | B827EBFFFE6835B7 |   | Uplink | 5 | 13 | -52 | 14 | 2 | 868,1 | 0 | 1 | 17 (4) | FFFF0900 | 7C62DC38 | false | DownlinkACK [DnSeqNo=6 DnMsgID=139328346], FCtrl [ADR=True ADRAckReq=False ACK=True] 05/23/2024 02:22:39.250 PM | 05/23/24 02:22:39.083.9412 PM | B827EBFFFE6835B7 |   | Downlink Tx | 5 | 6 |   |   | 2 | 867,9 | 7 | 14 |   |   |   |   | DnMsgID=139328346, Status=Successful 05/23/2024 02:22:38.343 PM | 05/23/24 02:22:39.083.9412 PM | B827EBFFFE6835B7 |   | Conf Downlink | 5 | 6 |   |   | 2 | 867,9 | 7 | 14 | 14 (1) | 67 | 9EBB8D05 |   | [UpSeqNo=12], DnMsgID=139328346, Confirmed=True, AppPldLen=1 05/23/2024 02:22:38.087 PM | 05/23/24 02:22:38.083.9412 PM | B827EBFFFE6835B7 |   | Uplink | 5 | 12 | -53 | 14 | 2 | 867,9 | 7 | 1 | 17 (4) | FFFF0901 | 8F84DC5D | false | FCtrl [ADR=True ADRAckReq=False ACK=False] 05/23/2024 02:22:24.957 PM | N/A |   |   | Downlink Queued |   |   |   |   |   | -1 |   |   | 1 (0) | 67 |   |   | Timeout=5 Min DnMsgID=139328346, FPort=14, Confirmed=True, AppPldLen=1

So we know it works now. But what could be wrong with the other end device? The main difference is that it seems the end device, which does not work, sends an automatic acknowledged uplink right after it has received the information that it has to confirm the downlink, whereas the one that works just sends it up with the next manual uplink.

Observations

Successful Log

In a successful log: in the acknowledged uplink 2024-05-23 12:22:40.962 [AIO:XDEB] [4|WS] > {"msgtype":"updf","MHdr":64,"DevAddr":33555046,"FCtrl":160,"FCnt":13,"FOpts":"","FPort":1,"FRMPayload":"39A1FB70","MIC":953967228,"RefTime":1716466960.797669,"DR":2,"Freq":868100000,"upinfo":{"rctx":0,"xtime":40250923014845522,"gpstime":1400502179083732,"fts":-1,"rssi":-52,"snr":13.75,"rxtime":1716466960.962827}} We can see that FPort=1.

Unsuccessful Log

Whereas in the unsuccessful log: 2024-05-22 09:43:13.344 [AIO:XDEB] [4|WS] > {"msgtype":"updf","MHdr":64,"DevAddr":33555024,"FCtrl":160,"FCnt":6,"FOpts":"","FPort":-1,"FRMPayload":"","MIC":-1810696412,"RefTime":1716371421.505621,"DR":5,"Freq":867500000,"upinfo":{"rctx":0,"xtime":53198770628326945,"gpstime":0,"fts":-1,"rssi":-40,"snr":11.5,"rxtime":1716370993.344707}}

Questions and Hypotheses

What does it mean when FPort is equal to -1? It does not seem to be documented in the LoRaWAN 1.0.3 Specification. Could it be a problem that the end-device automatically sends an acknowledged uplink right after the first uplink? While debugging the end-device (a simple LilyGo T-Beam board running original T-Beam code from the LMIC library), I observed that the uplink package from the end-device has FPort=1 and not -1.

Does Basic Station read it wrong? It might be so, since we know this problem does not occur in Legacy Packet Forwarder.

Thanks again for your help!

Best regards, kroghm96

beitler commented 4 months ago

Hi @kroghm96 ,

thanks for doing additional tests with another device. This could be a decisive observation.

The 'working' PHYPayload on air is: 40 | 66020002 | A0 | 0D00 | [01 39A1FB70] | 7C62DC38 The 'failing' PHYPayload on air is: 40 | 50020002 | A0 | 0600 | 24F71294

I broke up the PHYPayload into MHDR | DevAddr | FCtrl | [FPort FRMPayload] | MIC according to LoRaWAN L2 1.0.4 Section 4.

As you can see, the 'failing' frame has no FPort | FRMPayload. This is why for that frame, the Basics Station LNS protocol populates FPort=-1 to encode the missing FPort into an INT32, as described in the LoRa Basics Station LNS Protocol Spec:

image

⚠ I have the suspicion that your LNS provider might have overlooked this part of the Basics Station LNS Protocol, and fails to interpret FPort=-1 correctly.

kroghm96 commented 4 months ago

Hi @beitler,

Very interesting!

Update

This morning, I tried to send a confirmed downlink using a TTN-LNS, and it worked as expected. TTN accepted the PHYPayload, and in that message, it was acknowledged. It seems like an LNS problem.

Before I contact the provider with all the new information, do you know if there is a way to simulate the PHYPayload to have a port and FRMPayload to see if the LNS accepts it if it looks like a normal payload with data and port?

Additionally, do you know if there is a difference between the Legacy Packet Forwarder and Basic Station in how they parse the PHYPayload? I'm confused why the Legacy Packet Forwarder works with the provider while Basic Station does not. Basic Station is, of course, newer and might not have been fully supported by the provider. But it would be nice to know if there is a key difference in how the port is parsed between the two systems.

Thanks again for your help!

Best regards,
@kroghm96

beitler commented 4 months ago

Hi @kroghm96 ,

your observations with TTN are consistent with the analysis so far. So, indeed I would encourage you to reach out the LNS provider where you see the unexpected behavior.

For testing purposes, 'faking' FPort and FRMPayload for messages where they don't exist requires changes in lora.c probably around here.

The difference in your LNS's behavior compared to the Legacy Packet Forwarder is likely due to the fact that the affected code path is specific to the LoRa Basics Station protocol. The Legacy Packet Forwarder simply forwards the whole PHYPayload to the LNS. In Basics Station, we do additional sanity checks, filtering, etc.. Hence, we lift the FPort and FRMPayload out of the PHYPayload and encode the FHDR content as explicit fields with LoRaWAN semantics. This means that on the LNS side, the server-side Basics Station protocol implementation requires dedicated code paths to interpret those FHDR fields when parsing Basics Station LNS protocol messages. In these places, I assume, the LNS needs some fixing.

kroghm96 commented 4 months ago

Hi @beitler,

Okay will contact them and give an update later then.

I tried to fake the FPort and FRMPayload but the function you reffered to seems to only parse an rx payload and print it(At least it doesn't help with making the LNS understand the payload). Maybe I'm wrong?

Thanks for clearing up Basic Station vs. Legacy.

kroghm96 commented 4 months ago

Hi @beitler,

It turns out the issue was on the LNS side due to a mismatch in the calculation of the MIC.

Thank you very much for all your help, Anton. Your assistance has been greatly appreciated.

Best regards, Mikkel

beitler commented 4 months ago

Thanks for coming back with the confirmation. Glad I could help.