helium / router

router combines a LoRaWAN Network Server with an API for console, and provides a proxy to the Helium blockchain
Apache License 2.0
69 stars 32 forks source link

Several users report many many link_adr_req down link messages and slow joins #437

Closed leroyle closed 3 years ago

leroyle commented 3 years ago

Using US915 Hotspot { "channel": 114, "frequency": 925.1, "id": "112DnKkS5qVvRoK1TFcbhT7ReezwtNTVmE2zTB2ZnAa3UMZYJ8kN", "lat": 32.9550615760182, "long": -96.75233778326856, "name": "harsh-sepia-gerbil", "rssi": 27, "snr": 0, "spreading": "SF10BW500" }

We are seeing very slow join acceptance at the device even though I can see multiple JOIN_REC/JOIN_ACCEPT messages within the console device debug view We are seeing almost 100% consistently unconfirmed uplink messages being followed by downlink messages. The down links are the link_adr_req MAC command message.

Debugging a WisBlock device shows the device runtime never does see/process the link_adr_req MAC command. I'm seeing RX1 timeouts reported at the device when I believe it should be receiving/processing the downlink MAC command. I very seldom with get a call to the OnRadioRxDone() call back. But will to the timeout callback instead. This even though the console is showing many downlink messages sent.

I have been anywhere from 3 feet to 30 feet from my local Helium hot spot.

I cannot say when this first appeared as a problem as I've only had very few/simple interactions with my device and the network since just prior to the 2.0 cut over. Others are thinking this has been somewhat of a problem after the 2.0 cut over.

I do not know why the RX1 timeout is occurring, though I do see slow updates within my browser console window but have chalked some of that up to a slow PC machine at my end. I've also seen the occasional late packet messages as well.

mfalkvidd commented 3 years ago

3 feet is usually too close, there is a high risk the hotspot saturates the radio on your device. 30 feet should work whough, especially if there is a wall or two between the hotspot and the device.

thedyerman commented 3 years ago

We are seeing very slow join acceptance at the device even though I can see multiple JOIN_REC/JOIN_ACCEPT messages within the console device debug view We are seeing almost 100% consistently unconfirmed uplink messages being followed by downlink messages. The down links are the link_adr_req MAC command message.

I can confirm the same behaviour - So many uplinks missing/ignored image

The downlinks are using the "router" integration which I don't have but I assume is the console image

And the downlinks are MAC command messages (link_adr_req) image

pdvaneijk commented 3 years ago

I am seeing the same thing, lots of LINK_ADR_REQ MAC commands even though I have ADR turned off. I recently started testing a tracker again that I had not used in 2 months. I turned ADR OFF and have CF List ON. I should not be seeing any LINK_ADR_REQ MAC command frames at all!

Oliv4945 commented 3 years ago

Hi,

This behavior seems to be related to ADR:

  1. Patrick's device has ADR deactivated (FCtrl ADR bit = 0): NS still tries to downlink a LINK_ADR_REQ
  2. The devices drops the request as ADR is deactivated
  3. The NS is unhappy to not receive the ACK, so tries again
  4. Loop on 2->3
  5. I remotely disabled ADR on the device
  6. Device ACK the LINK_ADR_REQ with a LINK_ADR_ANS
  7. NS is now happy and does not send any new LINK_ADR_REQ

Prior to LoRaWAN v1.0.4, LinkADRReq behavior if device's ADR is disabled is not defined, so Helium NS should not rely on receiving LinkADRAns if ADR bit in FCtrl is disabled. For LoRaWAN v1.0.4, it is specified that the device shall answer even if ADR is disabled

I wonder if it is related to 0c6e1bbda84a785d8a1d01e60e1c7cecaaf21645 where we can read:

They will still get channel mask updates in the first packet after a successful join. [...]

  • Make sure channel correction is coming down in first downlink
leroyle commented 3 years ago

I'm seeing interesting behavior, noticed today. After starting my device I am still getting the repeating LINK_ADR_REQ sent from the console for each uplink but am not seeing the MAC command processing at the device.

Unless: If I send a user initiated downlink message via the console device view, the LINK_ADR_REQ is delivered to the device, handled, (presumably ACK'd but I did not confirm) from then on I do not see "any" LINK_ADR_REQ downlink message following an uplink message, ( as seen within the console device debug view).

I guess, but have not confirmed, the MAC command rides along with the user initiated downlink message. Apparently the MAC command gets lost if there is not a user packet along with it ????

lthiery commented 3 years ago

@leroyle that's a very interesting insight. A LoRaWAN frame has two areas where a MAC Command might be and I believe our network server only uses the area that allows it to piggy-back with a downlink (FHDR). When there's no application payload, the MAC command is allowed to be in the area where the application payload normally is. My best guess, is that your device stack really wants it to be there and not in the FHDR when there's no application payload. Unsure if it's technically out of spec, I can review that later.

leroyle commented 3 years ago

I am using the Wisblock LoRaWan implementation, while there have been some updates since before all this I really doubt it would change to that extent.

I would expect my radio layer to at least try to process the message regardless of where the MAC command might be, I'm getting receive timeouts where I would expect the MAC downlink to be processed, rather than trying to process anything received.

This was all working fine up to about the time scaling issue was mentioned, don't know if that related or not.

Perhaps the MAC command is being lost when no user payload is present??

macpie commented 3 years ago

The team has identified the issue and is working on a fix.

macpie commented 3 years ago

This should be fix by https://github.com/helium/router/pull/446 and has been merged into master so that it will make it on next release

hkicko commented 2 years ago

Issue seems to be back image

macpie commented 2 years ago

This is not release yet on Production, it will be on test for a week.

Oliv4945 commented 2 years ago

Thanks @macpie . Could you please confirm if it is related to dowlinks sent too late to the gateway or I should open a new issue?

JSON up: {"rxpk":[{"jver":1,"tmst":70374622,"chan":3,"rfch":0,"freq":904.500000,"mid": 8,"stat":1,"modu":"LORA","datr":"SF10BW125","codr":"4/5","rssis":-77,"lsnr":12.5,"foff":-1354,"rssi":-77,"size":17,"data":"QC4AAEiAPgACjDd8uhRABIw="}]}

JSON down: {"txpk":{"imme":false,"tmst":71374622,"freq":925.0999755859375,"rfch":0,"powe":27,"modu":"LORA","datr":"SF10BW500","codr":"4/5","ipol":true,"prea":null,"size":22,"data":"YC4AAEgKKgADNAAAcAM0AP8A+H8fww=="}}
src/jitqueue.c:249:jit_enqueue(): ERROR: Packet REJECTED, timestamp seems wrong, too much in advance (current=73129156, packet=71374622, type=0)
ERROR: Packet REJECTED (jit error=2)
hkicko commented 2 years ago

Must have been different issue then, because today it is gone - no extra Downlinks.

thedyerman commented 2 years ago

I tried to confirm your findings yesterday but was not receiving the same constant downlink as before. I checked again today and all is good. Must have been a node with old code somewhere ¯_(ツ)_/¯