xoseperez / basicstation-docker

Basics™ Station Packet Forward protocol using Docker

Basic Station kills itself after a container restart #13

Open gilleswaeber opened 10 months ago

gilleswaeber commented 10 months ago

Hi, I had an issue where the Basic Station process would die immediately after a container restart with the message:

Killing process 29

This seems to be caused by killOldPid (https://github.com/lorabasics/basicstation/blob/master/src-linux/sys_linux.c#L366), which reads a PID from a file (/var/tmp/station.pid) and kills that process if it is still running, to avoid having multiple station processes running at the same time.

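Since start.sh always runs the same steps, the station process always gets the same PID after a restart. The PID left in the file is therefore the station's own PID, and the code does not check for that case, so the station ends up killing itself.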
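
To illustrate the kind of check that seems to be missing, here is a minimal shell sketch. It is not the Basic Station code (which is C); is_foreign_live_pid is a hypothetical helper that only reports a recorded PID as killable when it belongs to a different, still-running process.

```shell
#!/bin/sh
# Hypothetical helper, not part of Basic Station: succeed only if the PID
# recorded in a PID file belongs to a *different* process that is still alive.
# Usage: is_foreign_live_pid PIDFILE SELF_PID
is_foreign_live_pid() {
    pidfile=$1
    self_pid=$2
    # No readable PID file means there is nothing to clean up.
    [ -r "$pidfile" ] || return 1
    old_pid=$(cat "$pidfile")
    # Ignore empty or non-numeric content.
    case $old_pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # The guard discussed above: a recorded PID equal to our own is a
    # leftover from the previous run, not a second instance.
    [ "$old_pid" -ne "$self_pid" ] || return 1
    # kill -0 sends no signal; it only tests that the process exists
    # and that we are allowed to signal it.
    kill -0 "$old_pid" 2>/dev/null
}
```

A caller would pass its own PID, for example `is_foreign_live_pid /var/tmp/station.pid "$$"`, and only send a signal when the function succeeds.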

It might be better to fix it in the lorabasics repo somehow, but for now adding the following to the start script fixed it:

rm -r /var/tmp/ 2> /dev/null
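
A narrower variant that removes only the stale PID file rather than the whole directory might look like the sketch below. clear_stale_pid_file is a hypothetical helper name, and the default path /var/tmp/station.pid is an assumption based on the directory cleared above.

```shell
#!/bin/sh
# Hypothetical helper for start.sh: delete only the leftover PID file.
# The default path is an assumption; pass another path as the first argument.
clear_stale_pid_file() {
    # -f keeps rm silent and successful when the file does not exist.
    rm -f "${1:-/var/tmp/station.pid}"
}
```

It would be called once in start.sh before the station is launched, e.g. `clear_stale_pid_file`.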

xoseperez commented 10 months ago

Hi! Thank you for reporting the problem. I have not tested it yet, but I have a question: even though the PID of the process inside the container might always be the same, how does the PID file persist across reboots?

gilleswaeber commented 10 months ago

Thanks for your reply. Regarding the PID: Linux assigns PIDs in ascending order, starting with 1 for the first process. The exact PID may vary between configurations, but since the start.sh script is the only thing starting new processes, the station gets the same PID whenever the same number of processes are started before it. On another device, station consistently has PID 30.

As for the persisting data, this seems to be a quirk of docker compose when restarting (as far as I can tell it is not in the docs, but it is mentioned e.g. here: https://stackoverflow.com/questions/69369205/how-to-return-one-container-to-a-clean-state-in-docker-compose). Data seems to persist across a docker compose restart or a restart caused by a failure, but not across down and up or a full system reboot.

The easiest way I found to reproduce the issue is to observe the logs with docker logs basicstation --since 2m -f and then type docker exec basicstation pkill station in another tab (with restart: unless-stopped or always in docker-compose.yml).
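
The same reproduction steps, wrapped as a small sketch. The container name basicstation comes from the commands above; the DOCKER and CONTAINER variables and the function names are hypothetical conveniences, and DOCKER=echo gives a dry run that only prints the arguments.

```shell
#!/bin/sh
# Reproduction sketch. Assumes the container is managed by docker compose
# with restart: unless-stopped (or always).
DOCKER="${DOCKER:-docker}"
CONTAINER="${CONTAINER:-basicstation}"

# Terminal 1: follow the container logs, starting two minutes back.
watch_logs() {
    "$DOCKER" logs --since 2m -f "$CONTAINER"
}

# Terminal 2: kill the station process so the restart policy kicks in.
kill_station() {
    "$DOCKER" exec "$CONTAINER" pkill station
}
```

Run watch_logs in one tab and kill_station in another; on an affected setup the "Killing process" line should show up in the logs once the container restarts.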

xoseperez commented 10 months ago

I have tested the setup as you suggested and it worked as expected, with no error after the reboot. Could it be specific to a certain Docker version?

Also, I don't see how the issue you linked applies here: there is no persistent storage in the image except what is defined in the docker compose file (if any).

Anyway, adding the line you suggest will do no harm, so I'm OK with it. But I fear the issue might be due to something different...

gilleswaeber commented 10 months ago

That's weird. I checked the Debian Dockerfiles for balena and they don't seem to define a volume explicitly anywhere. It could be specific to the host OS then, or the Docker version, or something else... but it seems reasonable to assume that this is a Docker issue. I'll see if I can make a simpler repro and submit a report there.

My setup, for the record: Debian 11 (buster), Docker 24.0 (tested with both a normal setup and rootless), armhf architecture (tested with a RPi 3B and a BeagleBone Green)