sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

Docker in docker builds do not work in build container (Arch linux host) #9919

Open jboomer opened 2 years ago

jboomer commented 2 years ago

Description

Cannot build target/sonic-barefoot.bin. The build fails at the docker-start step with the error: "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"

If I enter the build container with make sonic-slave-bash and try to start Docker with sudo dockerd --experimental --storage-driver=vfs -D, I see the following errors:

WARN[2022-02-10T14:08:08.808357967Z] Your kernel does not support cgroup memory limit
WARN[2022-02-10T14:08:08.808371693Z] Unable to find cpu cgroup in mounts
WARN[2022-02-10T14:08:08.808380990Z] Unable to find blkio cgroup in mounts
WARN[2022-02-10T14:08:08.808389807Z] Unable to find cpuset cgroup in mounts
WARN[2022-02-10T14:08:08.808422528Z] mountpoint for pids not found
...
Error starting daemon: Devices cgroup isn't mounted

This happens in both the 'stretch' and 'buster' build containers. In the 'bullseye' build container, however, I can start Docker.

The host in this case is Arch, with docker version "20.10.12, build e91ed5707e".

Steps to reproduce the issue:

  1. make init
  2. make configure PLATFORM=barefoot
  3. make target/sonic-barefoot.bin

Describe the results you received:

+++ --- Making target/sonic-barefoot.bin --- +++
EXTRA_DOCKER_TARGETS=sonic-barefoot.bin BLDENV=buster make -f Makefile.work buster
make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
make[1]: Entering directory '/mnt/ssd1/home/jasper/git/sonic-buildimage'
~/git/sonic-buildimage/src/sonic-build-hooks ~/git/sonic-buildimage
make[2]: Entering directory '/mnt/ssd1/home/jasper/git/sonic-buildimage/src/sonic-build-hooks'
SSL_INIT
SSL_INIT
dpkg-deb: building package 'sonic-build-hooks' in 'buildinfo/sonic-build-hooks_1.0_all.deb'.
make[2]: Leaving directory '/mnt/ssd1/home/jasper/git/sonic-buildimage/src/sonic-build-hooks'
~/git/sonic-buildimage
SONiC Build System

Build Configuration
"CONFIGURED_PLATFORM"             : "barefoot"
"CONFIGURED_ARCH"                 : "amd64"
"SONIC_CONFIG_PRINT_DEPENDENCIES" : ""
"SONIC_BUILD_JOBS"                : "1"
"SONIC_CONFIG_MAKE_JOBS"          : "16"
"SONIC_USE_DOCKER_BUILDKIT"       : ""
"USERNAME"                        : "admin"
"PASSWORD"                        : "YourPaSsWoRd"
"ENABLE_DHCP_GRAPH_SERVICE"       : ""
"SHUTDOWN_BGP_ON_START"           : ""
"ENABLE_PFCWD_ON_START"           : ""
"SONIC_BUFFER_MODEL"              : ""
"INSTALL_DEBUG_TOOLS"             : ""
"ROUTING_STACK"                   : "frr"
"FRR_USER_UID"                    : "300"
"FRR_USER_GID"                    : "300"
"ENABLE_SYNCD_RPC"                : ""
"ENABLE_ORGANIZATION_EXTENSIONS"  : "y"
"HTTP_PROXY"                      : ""
"HTTPS_PROXY"                     : ""
"NO_PROXY"                        : ""
"ENABLE_ZTP"                      : ""
"INCLUDE_PDE"                     : ""
"SONIC_DEBUGGING_ON"              : ""
"SONIC_PROFILING_ON"              : ""
"KERNEL_PROCURE_METHOD"           : "build"
"BUILD_TIMESTAMP"                 : "20220204.161539"
"BUILD_LOG_TIMESTAMP"             : "none"
"SONIC_IMAGE_VERSION"             : "master.0-dirty-20220204.161539"
"BLDENV"                          : "buster"
"VS_PREPARE_MEM"                  : "yes"
"INCLUDE_MGMT_FRAMEWORK"          : "y"
"INCLUDE_ICCPD"                   : "n"
"INCLUDE_SYSTEM_TELEMETRY"        : "y"
"ENABLE_HOST_SERVICE_ON_START"    : "n"
"INCLUDE_RESTAPI"                 : "n"
"INCLUDE_SFLOW"                   : "y"
"INCLUDE_NAT"                     : "y"
"INCLUDE_DHCP_RELAY"              : "y"
"INCLUDE_P4RT"                    : "y"
"INCLUDE_KUBERNETES"              : "n"
"INCLUDE_MACSEC"                  : "y"
"INCLUDE_MUX"                     : "y"
"TELEMETRY_WRITABLE"              : ""
"ENABLE_AUTO_TECH_SUPPORT"        : "y"
"PDDF_SUPPORT"                    : "y"
"MULTIARCH_QEMU_ENVIRON"          : "n"
"SONIC_VERSION_CONTROL_COMPONENTS": "none"

"SONIC_DPKG_CACHE_METHOD"         : "none"

slave.mk:785: target 'target/docker-syncd-bfn.gz' given more than once in the same rule
slave.mk:929: target 'target/docker-syncd-bfn.gz-load' given more than once in the same rule
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

... repeated 60 times

make: *** [slave.mk:708: docker-start] Error 1
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
make[1]: *** [Makefile.work:311: buster] Error 2
make[1]: Leaving directory '/mnt/ssd1/home/jasper/git/sonic-buildimage'
make: *** [Makefile:31: target/sonic-barefoot.bin] Error 2

Describe the results you expected:

Image is built

Output of show version:

N.A.

Output of show techsupport:

N.A.

Additional information you deem important (e.g. issue happens only occasionally):

jboomer commented 2 years ago

Additional info: I've also tried with the option SONIC_CONFIG_USE_NATIVE_DOCKERD_FOR_BUILD set to y, which passes the host's /var/run/docker.sock through to the container. This doesn't work either, because the docker GID on the host is not the same as the docker GID in the container (999).
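
To see the mismatch described above, the docker GID can be read on each side and compared. This is a minimal sketch; the helper name gid_of_group is made up for illustration, and it reads a group(5)-style file (default /etc/group) instead of querying other name services.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric GID of a group from a group(5) file.
#   $1 = group name
#   $2 = group file to read (defaults to /etc/group)
gid_of_group() {
    awk -F: -v g="$1" '$1 == g { print $3; exit }' "${2:-/etc/group}"
}

# Run this once on the host and once inside the build container
# (entered with: make sonic-slave-bash), then compare the two numbers.
gid="$(gid_of_group docker)"
echo "docker GID on this system: ${gid:-<no docker group>}"
```

If the number printed on the host differs from the one printed in the container, a process in the container that is only a member of the container's docker group will not have group access to the socket passed through from the host.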

jboomer commented 2 years ago

I think I've finally found the root of the problem: Arch mounts the cgroups as cgroup v2, while Ubuntu 20.04 still mounts them as cgroup v1. This causes problems with the Docker version in stretch and buster, but cgroup v2 is supported by the Docker version supplied with bullseye.
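
A quick way to tell which hierarchy a host uses is to look at the filesystem type mounted at /sys/fs/cgroup: cgroup2fs means the unified v2 hierarchy, tmpfs means v1 (or the hybrid layout). The sketch below assumes GNU stat; the function name and the CGROUP_ROOT override are illustrative only.

```shell
#!/bin/sh
# Map the filesystem type of the cgroup mount point to a cgroup version.
cgroup_version_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy or hybrid hierarchy
        *)         echo "unknown" ;;
    esac
}

# CGROUP_ROOT is a hypothetical override; the real mount point is /sys/fs/cgroup.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# stat -f reports on the filesystem; %T prints its type in human-readable form.
fstype="$(stat -fc %T "$CGROUP_ROOT" 2>/dev/null || echo none)"
echo "cgroup hierarchy on this host: $(cgroup_version_from_fstype "$fstype")"
```

On a host that reports v2, the stretch and buster build containers would be expected to hit the "Devices cgroup isn't mounted" error shown in the description.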

The above workaround with SONIC_CONFIG_USE_NATIVE_DOCKERD_FOR_BUILD set to y works for me if I pass uid and gid of docker by adding:

SONIC_BUILDER_EXTRA_CMDLINE="-u $(id -u):$(getent group docker | cut -d: -f3)"

However, the build then still fails when creating the final installer image, because a chroot is done inside the buster environment and docker commands are called there.
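
For reference, the partial workaround above could be assembled roughly as follows. This is only a sketch: passing both options as variables on the make command line is an assumption (they could equally be set in the build configuration), and docker_user_arg is a made-up helper name. The script prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: build the "-u UID:GID" argument from two numeric IDs.
docker_user_arg() {
    printf '%s %s:%s' -u "$1" "$2"
}

uid="$(id -u)"
# Numeric GID of the host's docker group; empty if the group does not exist.
gid="$(getent group docker 2>/dev/null | cut -d: -f3)"

if [ -n "$gid" ]; then
    # Assumption: both options are given as make variables on the command line.
    echo "make SONIC_CONFIG_USE_NATIVE_DOCKERD_FOR_BUILD=y" \
         "SONIC_BUILDER_EXTRA_CMDLINE=\"$(docker_user_arg "$uid" "$gid")\"" \
         "target/sonic-barefoot.bin"
else
    echo "no docker group found on this host" >&2
fi
```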

So the only real way to make it work is to make the host mount the cgroup filesystem as cgroup v1 by passing the kernel parameter systemd.unified_cgroup_hierarchy=0 on the host. I don't know which other distros use cgroup v2 by default, but I think they will have the same problem.
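
Whether the parameter is active can be checked from the running kernel's command line. The sketch below only inspects /proc/cmdline; how the parameter gets added depends on the bootloader, and the GRUB hint in the comments is an assumption about the host setup. The function name is illustrative.

```shell
#!/bin/sh
# Return success if the given kernel command line contains the cgroup v1 switch
# as a whole, space-delimited word.
has_cgroup_v1_param() {
    case " $1 " in
        *" systemd.unified_cgroup_hierarchy=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_cgroup_v1_param "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "parameter present: host was booted with cgroup v1 requested"
else
    # Assumption: with GRUB, the parameter is appended to GRUB_CMDLINE_LINUX in
    # /etc/default/grub, the GRUB config is regenerated, and the host is rebooted.
    echo "parameter missing: add it to the kernel command line and reboot"
fi
```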

zhangyanzhao commented 2 years ago

@jboomer It looks like your investigation is going well. Do you need any help?

bluecmd commented 2 years ago

https://github.com/Azure/sonic-buildimage/issues/7354 might be related?

jboomer commented 2 years ago

@bluecmd Yes, that seems to be the same thing; Debian bullseye also mounts cgroups as v2 by default.

@zhangyanzhao For me the issue is solved, but maybe this info should be included in a README? Also, while the build works on Ubuntu 20.04, I think it will fail in the same way on Ubuntu starting from 21.10.

ashwin-h commented 1 year ago

I am facing the same issue. After adding "systemd.unified_cgroup_hierarchy=0" to the kernel command line, the build is successful.

whitej6 commented 5 months ago

An easy workaround is to add the following key to /etc/docker/daemon.json: {"default-cgroupns-mode": "host"}. This resolves most issues when running Docker in Docker.
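
As a sketch of that suggestion, the snippet below prints a minimal daemon.json containing only this key. It deliberately writes to standard output rather than to /etc/docker/daemon.json, since an existing file may already hold other settings that need to be merged by hand; the function name is illustrative.

```shell
#!/bin/sh
# Print a minimal daemon.json that contains only the key from the comment above.
daemon_json() {
    cat <<'EOF'
{
  "default-cgroupns-mode": "host"
}
EOF
}

# Review the output, merge it into /etc/docker/daemon.json, and restart the
# Docker daemon so that the setting takes effect.
daemon_json
```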