ttn-zh / ic880a-gateway

Reference setup for iC880a gateways running The Things Network
GNU General Public License v3.0
456 stars 210 forks source link

Fix long running shell process #52

Closed 1082008 closed 5 years ago

1082008 commented 5 years ago

The shell spawned for start.sh is not required to run after the packet forwared started running. This patch saves the resources consumed by this extra shell process.

gonzalocasas commented 5 years ago

Thanks @Kodest!

But I have a question: do you know how this impacts the systemd service restart behavior?

The TTN Gateway systemd service is defined as start.sh with Restart=on-failure; and I have the feeling this change will break that since now the start.sh process will be terminated immediately and then, if poly_pkt_fwd fail, it will just die alone.

1082008 commented 5 years ago

Without my change there are two processes: start.sh and poly_pkt_fwd. systemd should handle them together and I think it does that with process groups.

With my change there will be only one process, poly_pkt_fwd so there is no ambiguity from now on, systemd will have to handle only this single process.

I have just tested restart behavior:

pi@ttngw-bp16-szenas:~ $ sudo systemctl status ttn-gateway
● ttn-gateway.service - The Things Network Gateway
   Loaded: loaded (/lib/systemd/system/ttn-gateway.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-05-05 18:18:42 CEST; 17h ago
 Main PID: 276 (poly_pkt_fwd)
   CGroup: /system.slice/ttn-gateway.service
           └─276 ./poly_pkt_fwd

May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: ##### END #####
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: INFO: [up] PUSH_ACK for server router.eu.thethings.network received in 43 ms
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: WARNING: [gps] GPS out of sync, keeping previous time reference
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: INFO: [up] PUSH_ACK for server router.eu.thethings.network received in 47 ms
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: INFO: [down] for server router.eu.thethings.network PULL_ACK received in 45 ms
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: WARNING: [gps] GPS out of sync, keeping previous time reference
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: WARNING: [gps] GPS out of sync, keeping previous time reference
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: ##### 2019-05-06 10:17:08 GMT #####
May 06 12:17:08 ttngw-bp16-szenas ttn-gateway[276]: ### [UPSTREAM] ###
pi@ttngw-bp16-szenas:~ $ ps axu | grep poly
root       276  1.4  0.3  37340  1772 ?        Ssl  May05  15:07 ./poly_pkt_fwd
pi        3573  0.0  0.4   4336  1944 pts/1    S+   12:18   0:00 grep --color=auto poly
pi@ttngw-bp16-szenas:~ $ sudo systemctl restart ttn-gateway
pi@ttngw-bp16-szenas:~ $ ps axu | grep poly
root      3583  2.3  0.3   2508  1748 ?        Ss   12:18   0:00 ./poly_pkt_fwd
pi        3607  0.0  0.4   4336  1876 pts/1    S+   12:18   0:00 grep --color=auto poly
pi@ttngw-bp16-szenas:~ $ sudo systemctl status ttn-gateway
● ttn-gateway.service - The Things Network Gateway
   Loaded: loaded (/lib/systemd/system/ttn-gateway.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-05-06 12:18:54 CEST; 17s ago
 Main PID: 3583 (poly_pkt_fwd)
   CGroup: /system.slice/ttn-gateway.service
           └─3583 ./poly_pkt_fwd

May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: redefined parameters will overwrite global parameters
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: local_conf.json does not contain a JSON object named SX1301_conf
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: local_conf.json does contain a JSON object named gateway_conf, parsing gateway parameters
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: gateway MAC address is configured to B827EBFFFE133337
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: Found 1 servers in array.
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: Server 0 configured to "router.eu.thethings.network", with port up "1700" and port down "1700"
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: packets received with a valid CRC will be forwarded
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: packets received with a CRC error will NOT be forwarded
May 06 12:18:54 ttngw-bp16-szenas ttn-gateway[3583]: INFO: packets received with no CRC will NOT be forwarded

You can see that the PID has changed and ps shows that the old process is not there.

gonzalocasas commented 5 years ago

Ah, that is nice 👍
The restart I meant is "on failure", i.e when the poly_pkt_fwd dies, not when the service is restarted manually. Could you please do one test where the poly_pkt_fwd fails to start, for instance, if you boot without the concentrator board, and try to start, it should continuously re-try to start the service.

1082008 commented 5 years ago

I am not there at the moment, the only error I could inject is killing the lora_pkt_fwd process. Unfortunately systemd hasn't started it again.

The same happens if I revert my change though.

The service returns success exit code both ways so Restart=on-failure is insufficient to actually keep the service alive. I think we should use Restart=always instead of Restart=on-failure in /lib/systemd/system/ttn-gateway.service.

1082008 commented 5 years ago

Do you think I should set Restart=always in this pull request?

1082008 commented 5 years ago

I issued a dedicated preq for Restart=always, see #53.