TimJones closed this issue 3 years ago
I believe the problem is that, with Docker, the /system volume survives a "reboot" (container restart), so apid can no longer bind its socket.
For anyone who needs it before the fix is released, a workaround is:
# rm $(docker inspect talos-docker-cluster-restart-test-master-1 --format '{{ range .Mounts }}{{ if eq .Destination "/system" }}{{ .Source }}{{ end }}{{ end }}')/run/apid/{apid,runtime}.sock
Remember to change the container name talos-docker-cluster-restart-test-master-1 to the name of the Talos container(s) you want or need to restart.
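If several containers need the same treatment, the one-liner above can be wrapped in a couple of small shell functions. This is only a sketch of the same workaround, not an official fix: the function names (system_mount_source, remove_stale_apid_sockets, fix_container) are made up for illustration, and it assumes the two socket files are the only thing blocking apid. The docker inspect template is the one from the command above.

```shell
#!/bin/sh
# Sketch of the workaround above; all function names are hypothetical.

# Print the host path that backs the container's /system mount.
system_mount_source() {
    docker inspect "$1" --format '{{ range .Mounts }}{{ if eq .Destination "/system" }}{{ .Source }}{{ end }}{{ end }}'
}

# Remove the two stale apid sockets below the given /system source directory.
# -f keeps this quiet if a socket is already gone.
remove_stale_apid_sockets() {
    system_dir="$1"
    rm -f -- "$system_dir/run/apid/apid.sock" "$system_dir/run/apid/runtime.sock"
}

# Look up the /system mount for one container and clean its sockets.
fix_container() {
    src="$(system_mount_source "$1")"
    if [ -z "$src" ]; then
        echo "no /system mount found for $1" >&2
        return 1
    fi
    remove_stale_apid_sockets "$src"
}
```

After sourcing the file, something like `fix_container talos-docker-cluster-restart-test-master-1 && docker restart talos-docker-cluster-restart-test-master-1` would clean one node and restart it. As with the original command (note its root prompt), this usually has to run as root, because Docker keeps volume data in a directory that normal users cannot write to.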
Bug Report
Description
When a local Talos cluster is created with Docker, restarting the host or the containers leaves the containers unable to start.
Logs
Create cluster:
Everything working & correct:
Restart the container
Note that we never reach the "boot sequence: done" status after stop/start.
After the timeout (approx. 15 mins), the cluster is no longer viable.
Full logs of the container after restart
It looks as if apid is failing to start because the socket wasn't cleaned up on stop.
Environment
v0.13.0