Closed 1082008 closed 5 years ago
Thanks @Kodest!
But I have a question: do you know how this impacts the systemd service restart behavior?
The TTN Gateway systemd service is defined as start.sh
with Restart=on-failure
; and I have the feeling this change will break that since now the start.sh
process will be terminated immediately and then, if poly_pkt_fwd
fail, it will just die alone.
Without my change there are two processes: start.sh
and poly_pkt_fwd
. systemd
should handle them together and I think it does that with process groups.
With my change there will be only one process, poly_pkt_fwd
so there is no ambiguity from now on, systemd will have to handle only this single process.
I have just tested restart behavior:
pi@ttngw-bp16-szenas:~ $ sudo systemctl status ttn-gateway
● ttn-gateway.service - The Things Network Gateway
Loaded: loaded (/lib/systemd/system/ttn-gateway.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2019-05-05 18:18:42 CEST; 17h ago
Main PID: 276 (poly_pkt_fwd)
CGroup: /system.slice/ttn-gateway.service
└─276 ./poly_pkt_fwd
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: ##### END #####
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: INFO: [up] PUSH_ACK for server router.eu.thethings.network received in 43 ms
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: WARNING: [gps] GPS out of sync, keeping previous time reference
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: INFO: [up] PUSH_ACK for server router.eu.thethings.network received in 47 ms
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: INFO: [down] for server router.eu.thethings.network PULL_ACK received in 45 ms
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: WARNING: [gps] GPS out of sync, keeping previous time reference
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: WARNING: [gps] GPS out of sync, keeping previous time reference
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: ##### 2019-05-06 10:17:08 GMT #####
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: ### [UPSTREAM] ###
pi@ttngw-bp16-szenas:~ $ ps axu | grep poly
root 276 1.4 0.3 37340 1772 ? Ssl May05 15:07 ./poly_pkt_fwd
pi 3573 0.0 0.4 4336 1944 pts/1 S+ 12:18 0:00 grep --color=auto poly
pi@ttngw-bp16-szenas:~ $ sudo systemctl restart ttn-gateway
pi@ttngw-bp16-szenas:~ $ ps axu | grep poly
root 3583 2.3 0.3 2508 1748 ? Ss 12:18 0:00 ./poly_pkt_fwd
pi 3607 0.0 0.4 4336 1876 pts/1 S+ 12:18 0:00 grep --color=auto poly
pi@ttngw-bp16-szenas:~ $ sudo systemctl status ttn-gateway
● ttn-gateway.service - The Things Network Gateway
Loaded: loaded (/lib/systemd/system/ttn-gateway.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2019-05-06 12:18:54 CEST; 17s ago
Main PID: 3583 (poly_pkt_fwd)
CGroup: /system.slice/ttn-gateway.service
└─3583 ./poly_pkt_fwd
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: redefined parameters will overwrite global parameters
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: local_conf.json does not contain a JSON object named SX1301_conf
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: local_conf.json does contain a JSON object named gateway_conf, parsing gateway parameters
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: gateway MAC address is configured to B827EBFFFE133337
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: Found 1 servers in array.
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: Server 0 configured to "router.eu.thethings.network", with port up "1700" and port down "1700"
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: packets received with a valid CRC will be forwarded
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: packets received with a CRC error will NOT be forwarded
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: packets received with no CRC will NOT be forwarded
You can see that the PID has changed and ps
shows that the old process is not there.
Ah, that is nice 👍
The restart I meant is "on failure", i.e when the poly_pkt_fwd
dies, not when the service is restarted manually. Could you please do one test where the poly_pkt_fwd fails to start, for instance, if you boot without the concentrator board, and try to start, it should continuously re-try to start the service.
I am not there at the moment, the only error I could inject is killing the lora_pkt_fwd
process. Unfortunately systemd hasn't started it again.
The same happens if I revert my change though.
The service returns success exit code both ways so Restart=on-failure
is insufficient to actually keep the service alive. I think we should use Restart=always
instead of Restart=on-failure
in /lib/systemd/system/ttn-gateway.service
.
Do you think I should set Restart=always
in this pull request?
I issued a dedicated preq for Restart=always
, see #53.
The shell spawned for start.sh is not required to run after the packet forwared started running. This patch saves the resources consumed by this extra shell process.