Closed tgruenert closed 4 months ago
Testing in a shell inside the container:

timeout 2 sleep 3

also produces a zombie.
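To make the mechanism visible: a killed child stays around as a zombie until its parent reaps it with wait(). A minimal Python sketch, assuming a Linux system (the /proc check is Linux-specific) and that nothing else in the container reaps orphans:

```python
import subprocess
import time

# Spawn a child and kill it without reaping it. Until the parent
# calls wait(), the kernel keeps the exit status around and `ps`
# shows the process as <defunct> (a zombie).
proc = subprocess.Popen(["sleep", "30"])
proc.kill()
time.sleep(0.5)  # give the kernel time to deliver the signal

# On Linux, field 3 of /proc/<pid>/stat now reports state 'Z':
with open(f"/proc/{proc.pid}/stat") as f:
    state = f.read().split()[2]
print(state)

# Reaping the child removes the zombie:
proc.wait()
```

This is why `timeout 2 sleep 3` can leave a zombie in a container: if the process that ultimately owns the child never waits on it, and PID 1 in the container does not reap orphans either, the defunct entry stays.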
Just as a further observation: this happens not only with large datasets. A question that came up: which part of the process gets the timeout? Is our backup still complete?
The timeout is used for pruning: https://github.com/evermind/docker-restic-backupclient/blob/master/backup_client.py#L366
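For context, a prune timeout can be implemented without leaving zombies by making sure the child is waited on in every code path. A minimal sketch, not the repository's actual implementation (the function name `run_with_timeout` is hypothetical):

```python
import subprocess

def run_with_timeout(cmd, timeout_seconds):
    """Run cmd, killing it after timeout_seconds.

    subprocess.run() kills *and* waits on the child when the timeout
    expires, so the process is always reaped and cannot linger as a
    zombie.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_seconds)
        return result.returncode
    except subprocess.TimeoutExpired:
        return None  # caller decides how to report the timeout

# A command that finishes within its limit returns its exit code:
print(run_with_timeout(["sleep", "0"], 5))
```

By contrast, wrapping the command in an external `timeout ...` binary adds an extra process in the chain, and each link must reap its child for no zombie to remain.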
Can you check if the latest build (master) at https://github.com/micw/docker-restic-backupclient/pkgs/container/restic-backupclient solves the issue for you?
Independent of the zombies, a timeout should not occur there:
2024-05-13 22:31:55,752 INFO: Cleanup finished.
2024-05-13 22:31:55,752 INFO: Using extra config from /config/backup.yaml
2024-05-13 22:31:55,753 INFO: Initializing repository
2024-05-13 22:31:56,001 INFO: Repository was already initialized.
2024-05-13 22:31:56,001 INFO: Unlocking repository
2024-05-13 22:31:56,626 INFO: Pruning repository (timeout 12h)
loading indexes...
loading all snapshots...
finding data that is still in use for 16 snapshots
[0:00] 100.00% 16 / 16 snapshots
searching used packs...
collecting packs for deletion and repacking
[0:00] 100.00% 1350 / 1350 packs processed
to repack: 769 blobs / 572.866 MiB
this removes: 694 blobs / 514.378 MiB
to delete: 799 blobs / 558.760 MiB
total prune: 1493 blobs / 1.048 GiB
remaining: 28525 blobs / 21.095 GiB
unused size after prune: 1.049 GiB (4.97% of remaining size)
repacking packs
[0:04] 100.00% 35 / 35 packs repacked
rebuilding index
[0:01] 100.00% 1287 / 1287 packs processed
deleting obsolete index files
[0:00] 100.00% 3 / 3 files deleted
removing 68 old packs
[0:03] 100.00% 68 / 68 files deleted
done
2024-05-13 22:32:09,407 INFO: Prune finished.
2024-05-13 22:32:09,407 INFO: Scheduling next backup at 2024-05-14 22:00:00
and
/usr/bin# printenv | grep TIMEOUT
RESTIC_PRUNE_TIMEOUT=12h
Correct. If a timeout occurs, you'll see "Terminated" in the logs.
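For reference, coreutils `timeout(1)` itself signals a hit limit through its exit status: it terminates the child with SIGTERM and exits with status 124. A quick check:

```shell
# timeout(1) exits with status 124 when the time limit is reached
timeout 1 sleep 30
echo "exit code: $?"
```

So a prune run that was cut off by the timeout is distinguishable from one that merely failed, which is why a completed log like the one above indicates no timeout fired.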
Testing your solution was successful: no more zombies after the backup. Would you open a PR please?
Thank you! The PR is merged.
After some days of running (huge dataset to back up) there are zombie processes.

ps faxu in the container:

ps faxu on the host:

I have absolutely no idea what is happening there or how to solve it. Anybody else?