moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

[BUG] ALL containers which use NFS volumes do not start after reboot #47153

Open the-hotmann opened 10 months ago

the-hotmann commented 10 months ago

Description

This is the bug report ported over from: https://github.com/docker/compose/issues/11354

I am currently facing an issue with my Intel NUC running Debian SID that has persisted for about a year. Despite trying various solutions, I have been unable to resolve it satisfactorily.

My configuration is as follows:

NUC ==(NFS - docker-compose volume)==> SYNO

I run numerous containers within my docker-compose stack, all of which use the restart: unless-stopped policy. However, upon system reboot, all containers with NFS volumes mapped fail to start automatically. They remain inactive unless started manually. Interestingly, a manual start or restart at any other time works seamlessly, and everything functions as expected.

I expect all containers to start after a system reboot, and suspect there is an underlying, hidden race condition that eludes my detection.

It also seems this is the very same issue as the ones mentioned here:

  1. LINK1
  2. LINK2
  3. LINK3
  4. LINK4

Reproduce

  1. use docker-compose (or the docker CLI, as shown in the old bug report https://github.com/docker/compose/issues/11354#issuecomment-1902235822)
  2. configure any container
  3. configure an NFS volume like this:

    volumes:
      share:
        name: share
        driver_opts:
          type: "nfs"
          o: "addr=192.168.178.2,nfsvers=4"
          device: ":/volume1/NFS_SHARE/"
  4. use the named NFS volume in the configured container (see the minimal compose sketch after this list)
  5. do the restart test: docker restart container_name, then check with docker ps
  6. do the reboot test: reboot, then check with docker ps
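
For reference, a minimal compose file that wires these steps together could look roughly like the sketch below (the service name, image, and container mount path are illustrative assumptions; the volume definition is the one from step 3):

    services:
      app:
        image: alpine             # placeholder image, assumption for illustration
        command: sleep infinity   # keep the container running
        restart: unless-stopped
        volumes:
          - share:/data           # mount the named NFS volume into the container

    volumes:
      share:
        name: share
        driver_opts:
          type: "nfs"
          o: "addr=192.168.178.2,nfsvers=4"
          device: ":/volume1/NFS_SHARE/"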

Expected behavior

There should be no race condition, and all containers (including the ones with NFS-based volumes mounted) should restart.

docker version

Client: Docker Engine - Community
 Version:           25.0.0
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        e758fe5
 Built:             Thu Jan 18 17:09:59 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.0
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       615dfdf
  Built:            Thu Jan 18 17:09:59 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.27
  GitCommit:        a1496014c916f9e62104b33d1bb5bd03b0858e59
 runc:
  Version:          1.1.11
  GitCommit:        v1.1.11-0-g4bccb38
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    25.0.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 5
  Running: 5
  Paused: 0
  Stopped: 0
 Images: 5
 Server Version: 25.0.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: a1496014c916f9e62104b33d1bb5bd03b0858e59
 runc version: v1.1.11-0-g4bccb38
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.6.11-amd64
 Operating System: Debian GNU/Linux trixie/sid
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 30.88GiB
 Name: h0tmann
 ID: 670157fc-30a9-4aa4-807c-4fa28aec7ec7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

  1. LINK1
  2. LINK2
  3. LINK3
  4. LINK4
thaJeztah commented 10 months ago

Are these NFS devices available before the docker service is started? I wonder if the systemd unit needs a custom "After" condition added πŸ€”

the-hotmann commented 10 months ago

Yes, the SYNO runs 24/7 and is not rebooted at all.

That's what I read somewhere else as well. It waits for the network, but somehow there is a race condition when it comes to mounting the NFS volumes.

But in this regard I am not an expert, so do not quote me :)

thaJeztah commented 10 months ago

Hm, right, so looking at the default systemd unit for the docker service; https://github.com/moby/moby/blob/5a3a101af2ff6fae24605107b1fbcf53fbb5c38e/contrib/init/systemd/docker.service#L4-L5

It currently waits for;
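
(the lines at that link - which also match the unit file pasted later in this thread - are roughly)

[Unit]
After=network-online.target docker.socket firewalld.service containerd.service time-set.target
Wants=network-online.target containerd.service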

The containerd service already has a local-fs.target to make sure local filesystems are mounted; https://github.com/containerd/containerd/blob/b66830ff6f8375ce8c7a583eaa03549eaa6707c4/containerd.service#L18

Which means that (because the docker service has After=containerd.service), local filesystems at least should be mounted.

I think what's needed in your setup is to have the remote-fs.target;

remote-fs.target

Similar to local-fs.target, but for remote mount points.

systemd automatically adds dependencies of type After= for this target unit to all SysV init script service units with an LSB header referring to the "$remote_fs" facility.

Given that remote filesystems are not something that's used by default by the Docker Engine, I don't think we should add this to the default systemd unit; doing so would likely delay startup of the service, which would be a regression for setups that don't use remote filesystems but are running on a system that does have them (but perhaps this can be discussed).

To add that target, you can use systemctl edit docker.service. This will create an override file that allows you to extend or override properties of the default systemd unit (Flatcar has a great page describing this in more depth);

sudo systemctl edit docker.service

That command creates a systemd "override" (or "drop-in") file and opens it in your default editor. You can add your overrides in the file and save it. By default, the After= you specify in your override file is appended to the existing After= list of the default systemd unit (which should not be edited).

## Editing /etc/systemd/system/docker.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file
[Unit]
After=remote-fs.target

### Lines below this comment will be discarded

### /lib/systemd/system/docker.service
# [Unit]
# ...

After you have edited and saved the file, you need to reload systemd to make it re-read the configuration;

sudo systemctl daemon-reload

You can check the new settings using systemctl show, which should now show the remote-fs.target included;

sudo systemctl show docker.service | grep ^After
After=containerd.service systemd-journald.socket docker.socket sysinit.target time-set.target network-online.target remote-fs.target system.slice firewalld.service basic.target
the-hotmann commented 10 months ago

Thanks for the detailed explanation.

I have followed your commands. Here is the check-command:

$ systemctl show docker.service | grep ^After
After=network-online.target basic.target sysinit.target firewalld.service docker.socket system.slice time-set.target containerd.service remote-fs.target systemd-journald.socket

I also reloaded the systemd daemon (this should happen automatically on reboot, but I did it manually anyway) and rebooted the server.

Again, all containers with NFS volumes are down. They do not start up again.

Given that remote filesystems are not something that's used by default by the Docker Engine, I don't think we should add this to the default systemd unit; doing so would likely delay startup of the service, which would be a regression for setups that don't use remote filesystems but are running on a system that does have them (but perhaps this can be discussed).

Is there a possibility to add this only if NFS (or any remote FS) is used anywhere in any Docker container?

But as mentioned above, this apparently did not fix the issue. Thanks for your help :)

thaJeztah commented 10 months ago

But as mentioned above, this apparently did not fix the issue.

😒 that's a shame; thanks for trying! I was hoping this would make sure that those remote filesystem mounts were up-and-running.

Possibly it requires a stronger dependency to be defined; more than After πŸ€”

Reading the documentation for After https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#Before=

Note that those settings are independent of and orthogonal to the requirement dependencies as configured by Requires=, Wants=, Requisite=, or BindsTo=.

It is a common pattern to include a unit name in both the After= and Wants= options, in which case the unit listed will be started before the unit that is configured with these options.

Perhaps that second line applies here; it might be worth trying whether adding remote-fs.target to Wants helps πŸ€”

the-hotmann commented 10 months ago

I did the following:

  1. systemctl edit docker.service
  2. added remote-fs.target also to Wants:
### Editing /etc/systemd/system/docker.service.d/override.conf
### Anything between here and the comment below will become the contents of the drop-in file

[Unit]
After=remote-fs.target
Wants=remote-fs.target

### Edits below this comment will be discarded
  3. systemctl daemon-reload
  4. systemctl show docker.service | grep ^After:
    After=docker.socket network-online.target system.slice sysinit.target containerd.service systemd-journald.socket remote-fs.target basic.target firewalld.service time-set.target
  5. systemctl show docker.service | grep ^Wants:
    Wants=network-online.target remote-fs.target containerd.service
  6. reboot

Still - the containers with NFS Volumes do not start automatically.

the-hotmann commented 10 months ago

@thaJeztah is there any news, or is there a specific user to tag on this one?

Thanks in advance! :)

vvoland commented 10 months ago

Is there any related error message in the dockerd log?

sudo journalctl -e --no-pager  -u docker -g ' error while mounting volume '
# or if the above doesn't yield anything useful, try searching for the NFS server address you use
sudo journalctl -e --no-pager  -u docker -g '192.168.178.2'
the-hotmann commented 10 months ago

@vvoland thanks - I will reply once I am home and have executed the commands.

the-hotmann commented 10 months ago

@vvoland thanks, the search for the IP itself returned something:

Jan 25 18:41:19 hostname dockerd[705]: time="2024-01-25T18:41:19.637798586+01:00" level=error msg="failed to start container" container=822210342a705a345accd6bfa16b69507b832bf01aec77ea4439f4b6d375c390 error="error while mounting volume '/var/lib/docker/volumes/share/_data': failed to mount local volume: mount :/volume1/NFS_SHARE/:/var/lib/docker/volumes/share/_data, data: addr=192.168.178.2,nfsvers=4,hard,timeo=600,retrans=3: network is unreachable"

Basically "network is unreachable" - and yet this is a standard Debian installation, following the official docs: https://docs.docker.com/engine/install/debian/

and this is my systemd unit file:

# /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service containerd.service time-set.target
Wants=network-online.target containerd.service
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutStartSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process
OOMScoreAdjust=-500

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/docker.service.d/override.conf
[Unit]
After=remote-fs.target
Wants=remote-fs.target

Hope this helps debugging it :)

the-hotmann commented 10 months ago

Just wanted to add:

the very same also happens when using SMB/CIFS mounts/volumes. I assume this applies to all network-based mounts/volumes.

I also confirmed the same on another server, just to make sure it is not caused by any special config on my end.

thaJeztah commented 10 months ago

Hm... so I just realised that my suggestion of using systemd for this would work if the host had NFS mounts for these filesystems; but if the host does not have those, systemd would not be aware of them, so it won't take them into account.

Does this work if your host has a mount from these server(s)? (also see https://geraldonit.com/2023/02/25/auto-mount-nfs-share-using-systemd/)

In that case, it's also worth considering setting up the NFS mount on the host and then, instead of using an NFS volume for the container like this;

    driver_opts:
      type: "nfs"
      o: "addr=192.168.178.2,nfsvers=4"
      device: ":/volume1/NFS_SHARE/"

To use a "bind" mount, using the NFS mounted path from the host. This could be a regular bind-mount, or a volume with the relevant mount-options set https://github.com/moby/moby/issues/19990#issuecomment-248955005, something like;

    driver_opts:
      type: "none"
      o: "bind"
      device: "/path/to/nfs-mount/on-host/"
the-hotmann commented 10 months ago

Does this work if your host has a mount from these server(s)?

Sorry I don't understand this question.

But here is something I tried before, and it worked:

  1. setting up an NFS mount at /mnt/NFS_MOUNT/ (via /etc/fstab)
  2. mapping it into the container just like any other folder.

This works, but it is not what I desire, since I want the mount and the whole connection to also be portable via docker compose etc.

I feel like these volumes in Docker have a general/structural problem of not waiting for the mount to be available. By the way, can any of you confirm AND replicate this bug on your side?

mblanco4x4 commented 10 months ago

Same problem here. In my case, Proxmox with a Debian VM (docker/portainer) connecting to a Synology NAS over NFSv4. After reboot, 8 of 30 containers fail to start and unsurprisingly they're all the ones with NFS mounts. The containers spin right up when I click Start in portainer though.

So far I've tried setting different restart: settings and depends_on:, neither of which works. I really don't want to touch the host; I'd much rather get it working in Docker alone.

Still scouring the internet for a solution :)

the-hotmann commented 9 months ago

@thaJeztah @vvoland is there any news on this, or is there something I can do to help?

the-hotmann commented 3 months ago

This problem still exists to this day.

vvoland commented 3 months ago

We're already depending on the network-online.target systemd target for the daemon to start.

If that doesn't work out of the box on your system, you might need to adjust your network-online conditions to suit your configuration: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

Note that:

network-online.target will ensure that all configured network devices are up and have an IP address assigned before the service is started. ... The right "wait" service must be enabled too (NetworkManager-wait-online.service if NetworkManager is used to configure the network, systemd-networkd-wait-online.service if systemd-networkd is used, etc.)
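
A quick way to check this (the first command is the same one suggested further down in this thread; enabling the networkd variant only applies if systemd-networkd actually manages the interfaces - an assumption here):

systemctl is-enabled NetworkManager-wait-online.service systemd-networkd-wait-online.service
# if systemd-networkd manages the network, make sure its wait service is enabled
sudo systemctl enable systemd-networkd-wait-online.service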

seamuslowry commented 3 months ago

I've also run into this issue, and I'm frustrated that a restart policy of always still does not cause the container to attempt a restart after this failure. I gave up on expecting a solution and added a @reboot entry to crontab.

@reboot cd /path/to/directory/ && sleep 120 && docker compose up -d

I'm running into this with compose, but you can replace that with whatever command you need to bring your container(s) up after a reboot. It's not an elegant solution but it does the trick.

mrstux commented 3 months ago

I have this issue too, and have had it for some time.

the-hotmann commented 2 months ago

It's not an elegant solution

Yes - this is not elegant, and for me it is also not a solution; it merely conceals the problem. I also work around the issue like this, but it is not solved. This happens on ANY standard Debian installation and yet it is not fixed. Instead you are required to modify your system before it supports NFS shares out of the box. For me, deviating from a clean, untouched system is not an option! I don't like having to modify much to get Docker running properly.

Also, I am astonished by how few people seem to have this problem - maybe they just don't use NFS shares from within a compose file, but work around it by using shares mounted on the host directly, which I don't like.

I don't remember exactly anymore, but I guess with Docker v18 this was no problem; now it is. Also, it seems no one who could be working on this can reproduce the issue (or they never tried themselves).

akerouanton commented 2 months ago

@the-hotmann "network is unreachable" means your server has no route available to reach your NFS server.

If you take a look at the link @vvoland sent above, this is how network-online.target is described:

LSB init scripts know the $network facility. As this facility is defined only very unprecisely people tend to have different ideas what it is supposed to mean. Here are a couple of ideas people came up with so far: ... All these are valid approaches to the question "When is the network up?", but none of them would be useful to be good as generic default.

I bet systemd's definition of "When is the network up?" doesn't match your expectations here.

network-online.target ... its primary purpose is network client software that cannot operate without network.

That's your case here. Your containers can't start if your server doesn't have proper connectivity (i.e. your dockerd host being able to talk to your NFS server). The daemon won't dare to restart your containers because the issue happens before the containerized process starts, and that could be due to an invalid container spec. Containers' restart property is only about failures of containerized processes (i.e. when we're sure the container spec is valid).

On that same docs page, if you look at the section Cut the crap! How do I make sure that my service starts after the network is really online?:

The right "wait" service must be enabled too (NetworkManager-wait-online.service if NetworkManager is used to configure the network, systemd-networkd-wait-online.service if systemd-networkd is used, etc.). ...

systemctl is-enabled NetworkManager-wait-online.service systemd-networkd-wait-online.service

Since you're running the Engine on a server, I guess you're using systemd-networkd. Did you check that /usr/lib/systemd/systemd-networkd-wait-online, the binary executed by systemd-networkd-wait-online.service, properly detects when the network is up? My hunch is that it's not the case.

I see this binary supports a few flags, like -4, etc...

$ /usr/lib/systemd/systemd-networkd-wait-online --help
...
  -4 --ipv4                 Requires at least one IPv4 address

Did you try to customize the definition of systemd-networkd-wait-online.service to add this flag?
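
For anyone who wants to try that, a minimal sketch of such an override could look like the following (the flag and binary path are taken from the help output above; whether -4 is the right condition for this particular setup is an assumption):

sudo systemctl edit systemd-networkd-wait-online.service

[Service]
# ExecStart must be cleared before it can be replaced in a drop-in
ExecStart=
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online -4

Then run systemctl daemon-reload and reboot to test.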

Also it seems no one that could be working on this can reproduce the issue (or they never tried themself).

Yeah, we have many, many things on our plates, and this bug is a non-obvious one to reproduce. We're all pretty convinced the issue is not on our side and is 'just' a matter of a few tweaks to systemd's service definitions. We're happy to provide help, but we need you (and others facing the same issue) to go the extra mile.

the-hotmann commented 2 months ago

@akerouanton

Thanks for your response here as well :)

Just to clear things up upfront: I have fixed the problem (without changing anything Docker-wise), but I still guess it is an incompatibility on Docker's side - let me elaborate :)

I had this setup:

I installed the latest Debian (from HERE), then changed the repos to "sid/unstable" and updated everything. I always do this; I prefer Debian over everything else and will most likely never use Ubuntu or anything else - I want a clean/minimal/slim host with all the other stuff dockerized.

Anyway - Debian even these days comes preinstalled with "networking" (ifupdown), while most other distros use Netplan.io. Debian claims to use it too, but in fact does not (at least not in the image I mentioned above).

My installation was done with the ISO debian-12.6.0-amd64-netinst.iso - not the very newest now, but it was at that time. Contrary to Debian's claim of using Netplan as the default network manager, it does not use it out of the box and does not even have it preinstalled!

It used ifupdown & networking (config files in /etc/network/). So I installed Netplan.io and converted to it, then disabled the old networking service, and now after a reboot everything works fine :)

I will leave a snippet of the commands I used to convert, to make it easier for others who find this issue.

(PLEASE MAKE BACKUPS/SNAPSHOTS!!!)

apt update && apt install netplan.io systemd-resolved -y
systemctl unmask systemd-networkd.service
systemctl unmask systemd-resolved.service
ENABLE_TEST_COMMANDS=1 netplan migrate && chmod 600 /etc/netplan/*
netplan generate

Please fix "gateway4" deprecation and other warnings! and continue...

netplan --debug apply
systemctl enable --now systemd-networkd
systemctl disable --now networking
systemctl mask networking

Now reboot - it will take about 3s longer (at least for me), but everything should come back up again. Then proceed with:

apt purge ifupdown
rm -r /etc/network/
ln -sfn /run/systemd/resolve/resolv.conf /etc/resolv.conf

to delete the old files, which are no longer needed. Now reboot again for good measure and check whether all containers come up by themselves.

Besides the fact that I had to change something NOT Docker-related to make it work, I guess this points out that Docker might not be fully compatible with ifupdown/networking. I also want to mention that the binary /usr/lib/systemd/systemd-networkd-wait-online NEVER returned before the migration to Netplan.io; now it returns immediately - so maybe the problem also originates from there.

Anyway, I want to thank you @akerouanton for your support and hints!

akerouanton commented 2 months ago

You're welcome! πŸ™‚

FWIW, we have a section in the Troubleshoot doc page about network managers (networkd, NetworkManager and Netplan) that can cause some Docker networks to disappear. That's not related to this issue, but since you mentioned Netplan... https://docs.docker.com/engine/daemon/troubleshoot/#docker-networks-disappearing

the-hotmann commented 1 month ago

Anyway - Debian even these days comes preinstalled with "networking" (ifupdown), while most other distros use Netplan.io. Debian claims to use it too, but in fact does not (at least not in the image I mentioned above).

@tillea, is there something I'm missing that's causing Netplan not to be installed as the default network manager, or am I using the wrong ISO? It would be great if you could shed some light on this :)

tillea commented 1 month ago

Anyway - Debian even these days comes preinstalled with "networking" (ifupdown), while most other distros use Netplan.io. Debian claims to use it too, but in fact does not (at least not in the image I mentioned above).

@tillea, is there something I'm missing that's causing Netplan not to be installed as the default network manager, or am I using the wrong ISO? It would be great if you could shed some light on this :)

Any reason you consider me competent to answer this question?

thaJeztah commented 1 month ago

Linking https://github.com/docker/for-linux/issues/293#issuecomment-2404723756, which is a similar topic, but related to filesystems being mounted;

Given that containers may use /var/lib/docker/containers as well, and that directory structure might remain empty if storage is not populated, adding RequiresMountsFor on top of ConditionDirectoryNotEmpty, as below, might resolve this issue.

[Unit]
RequiresMountsFor=/var/lib/docker
ConditionDirectoryNotEmpty=/var/lib/docker/containers /var/lib/docker/volumes/
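
For the NFS case in this thread, a comparable (hypothetical) drop-in could point RequiresMountsFor at the host-side NFS mount path instead - this assumes the share is actually mounted on the host, e.g. the /mnt/NFS_MOUNT/ from the fstab approach mentioned earlier:

# /etc/systemd/system/docker.service.d/override.conf (sketch)
[Unit]
RequiresMountsFor=/mnt/NFS_MOUNT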