kersing / packet_forwarder

Multi protocol packet forwarder supporting the TTN gateway-connector protocol.
Other
86 stars 56 forks source link

Reconnection issue #33

Open etix opened 4 years ago

etix commented 4 years ago

Hi,

My gateway (raspberry pi zero + rak831) seems to have issues with reconnections to the TTN bridge.

At first, everything works perfectly well for weeks usually.

Aug 26 19:59:10 loragw loragw[468]: ### [ CONNECTIONS ] ###
Aug 26 19:59:10 loragw loragw[468]: # bridge.eu.thethings.network: Connected
Aug 26 19:59:10 loragw loragw[468]: # Semtech status report sent.
Aug 26 19:59:10 loragw loragw[468]: ##### END #####

Then suddenly, the connection breaks for some reason:

Aug 26 19:59:10 loragw loragw[468]: 19:59:10  INFO: [TTN] bridge.eu.thethings.network RTT 37
Aug 26 19:59:12 loragw loragw[468]: 19:59:12  ERROR: [TTN] send status failed for bridge.eu.thethings.network
Aug 26 19:59:12 loragw loragw[468]: 19:59:12  INFO: [TTN] Reconnecting bridge.eu.thethings.network
Aug 26 19:59:12 loragw loragw[468]: 19:59:12  ERROR: [up] TTN lora send to server "bridge.eu.thethings.network" failed
Aug 26 19:59:12 loragw loragw[468]: 19:59:12  INFO: [TTN] reconnect called while reconnecting
Aug 26 19:59:12 loragw loragw[468]: 19:59:12  INFO: [TTN] server "bridge.eu.thethings.network" connected
Aug 26 19:59:12 loragw loragw[468]: 19:59:12  INFO: [up] TTN lora packet sent to server "bridge.eu.thethings.network"
Aug 26 19:59:12 loragw loragw[468]: 19:59:12  INFO: [up] TTN lora packet sent to server "bridge.eu.thethings.network"
Aug 26 19:59:14 loragw loragw[468]: 19:59:14  ERROR: [up] TTN lora send to server "bridge.eu.thethings.network" failed
Aug 26 19:59:14 loragw loragw[468]: 19:59:14  INFO: [TTN] Reconnecting bridge.eu.thethings.network

From now on, the gateway is stuck and won't send any more messages to TTN until I restart the packet forwarder. The connection status is always "Connecting" and there's no other log related to the connection again. Only the usual reports are printed.

Aug 26 20:00:09 loragw loragw[468]: ### [ CONNECTIONS ] ###
Aug 26 20:00:09 loragw loragw[468]: # bridge.eu.thethings.network: Connecting
Aug 26 20:00:09 loragw loragw[468]: # Semtech status report sent.
Aug 26 20:00:09 loragw loragw[468]: ##### END #####

It feels like there's a deadlock somewhere in the reconnection process but so far I was unable to pinpoint it.

etix commented 4 years ago

I should also mention that during these events sending SIGTERM doesn't exit the process, the only option is to send a SIGKILL.

etix commented 4 years ago

Happened again today.

Aug 29 23:02:34 loragw loragw[17580]: ### [ CONNECTIONS ] ###
Aug 29 23:02:34 loragw loragw[17580]: # bridge.eu.thethings.network: Connected
Aug 29 23:02:34 loragw loragw[17580]: # Semtech status report sent.
Aug 29 23:02:34 loragw loragw[17580]: ##### END #####
Aug 29 23:02:34 loragw loragw[17580]: 23:02:34  INFO: [TTN] bridge.eu.thethings.network RTT 28
Aug 29 23:02:36 loragw loragw[17580]: 23:02:36  ERROR: [TTN] send status failed for bridge.eu.thethings.network
Aug 29 23:02:36 loragw loragw[17580]: 23:02:36  INFO: [TTN] Reconnecting bridge.eu.thethings.network
Aug 29 23:02:36 loragw loragw[17580]: 23:02:36  ERROR: [up] TTN lora send to server "bridge.eu.thethings.network" failed
Aug 29 23:02:36 loragw loragw[17580]: 23:02:36  INFO: [TTN] reconnect called while reconnecting
Aug 29 23:02:36 loragw loragw[17580]: 23:02:36  INFO: [TTN] server "bridge.eu.thethings.network" connected
Aug 29 23:02:36 loragw loragw[17580]: 23:02:36  INFO: [up] TTN lora packet sent to server "bridge.eu.thethings.network"
Aug 29 23:02:36 loragw loragw[17580]: 23:02:36  INFO: [up] TTN lora packet sent to server "bridge.eu.thethings.network"
Aug 29 23:02:36 loragw loragw[17580]: 23:02:36  INFO: [up] TTN lora packet sent to server "bridge.eu.thethings.network"
Aug 29 23:02:36 loragw loragw[17580]: 23:02:36  INFO: [up] TTN lora packet sent to server "bridge.eu.thethings.network"
Aug 29 23:02:36 loragw loragw[17580]: 23:02:36  INFO: [up] TTN lora packet sent to server "bridge.eu.thethings.network"
Aug 29 23:02:37 loragw loragw[17580]: 23:02:37  INFO: [up] TTN lora packet sent to server "bridge.eu.thethings.network"
Aug 29 23:02:38 loragw loragw[17580]: 23:02:38  ERROR: [up] TTN lora send to server "bridge.eu.thethings.network" failed
Aug 29 23:02:38 loragw loragw[17580]: 23:02:38  INFO: [TTN] Reconnecting bridge.eu.thethings.network
Aug 29 23:02:59 loragw loragw[17580]: 23:02:59  INFO: Disabling GPS mode for concentrator's counter...
Aug 29 23:02:59 loragw loragw[17580]: 23:02:59  INFO: host/sx1301 time offset=(1598733930s:161786µs) - drift=-13µs
Aug 29 23:02:59 loragw loragw[17580]: 23:02:59  INFO: Enabling GPS mode for concentrator's counter.
Aug 29 23:03:59 loragw loragw[17580]: ##### 2020-08-29 21:03:08 GMT #####
Aug 29 23:03:59 loragw loragw[17580]: ### [UPSTREAM] ###
Aug 29 23:03:59 loragw loragw[17580]: # RF packets received by concentrator: 92
Aug 29 23:03:59 loragw loragw[17580]: # CRC_OK: 80.43%, CRC_FAIL: 19.57%, NO_CRC: 0.00%
Aug 29 23:03:59 loragw loragw[17580]: # RF packets forwarded: 74 (1255 bytes)
Aug 29 23:03:59 loragw loragw[17580]: # PUSH_DATA datagrams sent: 0 (0 bytes)
Aug 29 23:03:59 loragw loragw[17580]: # PUSH_DATA acknowledged: 0.00%
Aug 29 23:03:59 loragw loragw[17580]: ### [DOWNSTREAM] ###
Aug 29 23:03:59 loragw loragw[17580]: # PULL_DATA sent: 0 (0.00% acknowledged)
Aug 29 23:03:59 loragw loragw[17580]: # PULL_RESP(onse) datagrams received: 0 (0 bytes)
Aug 29 23:03:59 loragw loragw[17580]: # RF packets sent to concentrator: 0 (0 bytes)
Aug 29 23:03:59 loragw loragw[17580]: # TX errors: 0
Aug 29 23:03:59 loragw loragw[17580]: ### BEACON IS DISABLED!
Aug 29 23:03:59 loragw loragw[17580]: ### [JIT] ###
Aug 29 23:03:59 loragw loragw[17580]: # INFO: JIT queue contains 0 packets.
Aug 29 23:03:59 loragw loragw[17580]: # INFO: JIT queue contains 0 beacons.
Aug 29 23:03:59 loragw loragw[17580]: ### [GPS] ###
Aug 29 23:03:59 loragw loragw[17580]: # No time keeping possible due to fake gps.
Aug 29 23:03:59 loragw loragw[17580]: # Manual GPS coordinates: latitude <redacted>, longitude <redacted>, altitude 66 m
Aug 29 23:03:59 loragw loragw[17580]: ### [PERFORMANCE] ###
Aug 29 23:03:59 loragw loragw[17580]: # Upstream radio packet quality: 80.43%.
Aug 29 23:03:59 loragw loragw[17580]: ### [ CONNECTIONS ] ###
Aug 29 23:03:59 loragw loragw[17580]: # bridge.eu.thethings.network: Connecting
Aug 29 23:03:59 loragw loragw[17580]: # Semtech status report sent.
Aug 29 23:03:59 loragw loragw[17580]: ##### END #####