threefoldtech / cloud-container

A builder for a simple initramfs image to run container over virtiofs inside cloud-hypervisor
Apache License 2.0

docker fail-over to VFS storage driver which has very bad performance. #11

Open sameh-farouk opened 2 years ago

sameh-farouk commented 2 years ago

The slowness of building Docker images, pulling images, and creating containers on a grid3 VM was noticed recently. While debugging this, I can see that Docker falls back to the VFS storage driver, which has very bad performance.

The VFS storage driver is not a union filesystem; instead, each layer is a directory on disk, and there is no copy-on-write support. To create a new layer, a “deep copy” is done of the previous layer. This leads to lower performance and more space used on disk than other storage drivers.
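
To make the "deep copy" cost concrete, here is a rough sketch of the disk usage arithmetic. It assumes a simplified model where each layer only adds files and never deletes any; the function names and the layer sizes are hypothetical, not measurements from this VM.

```python
def vfs_disk_usage(layer_sizes_mb):
    """Disk usage when every layer is a full copy of the previous one plus its own files."""
    total = 0
    cumulative = 0
    for added in layer_sizes_mb:
        cumulative += added   # this layer holds everything below it plus the new files
        total += cumulative   # and that whole directory is stored again on disk
    return total


def union_disk_usage(layer_sizes_mb):
    """Disk usage when layers share unchanged files (union / copy-on-write drivers)."""
    return sum(layer_sizes_mb)


# Hypothetical image: a 100 MB base layer, then layers adding 20 MB and 5 MB.
layers = [100, 20, 5]
print("vfs:", vfs_disk_usage(layers), "MB")
print("union:", union_disk_usage(layers), "MB")
```

Under this model the vfs cost grows roughly with the square of the number of layers, which is why image builds with many steps suffer the most.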

from docker daemon logs

ERRO[2022-01-13T23:48:11.926502868Z] failed to mount overlay: invalid argument     storage-driver=overlay2
ERRO[2022-01-13T23:48:11.932707818Z] exec: "fuse-overlayfs": executable file not found in $PATH  storage-driver=fuse-overlayfs
ERRO[2022-01-13T23:48:11.940858396Z] AUFS was not found in /proc/filesystems       storage-driver=aufs
ERRO[2022-01-13T23:48:11.960695378Z] failed to mount overlay: invalid argument     storage-driver=overlay
ERRO[2022-01-13T23:48:11.961941321Z] Udev sync is not supported. This will lead to data loss and unexpected behavior. Install a more recent version of libdevmapper or select a different storage driver. For more information, see https://docs.docker.com/engine/reference/commandline/dockerd/#storage-driver-options  storage-driver=devicemapper
WARN[2022-01-13T23:48:12.047938770Z] Your kernel does not support CPU realtime scheduler 
WARN[2022-01-13T23:48:12.048006987Z] Your kernel does not support cgroup blkio weight 
WARN[2022-01-13T23:48:12.048037515Z] Your kernel does not support cgroup blkio weight_device 
INFO[2022-01-13T23:48:12.048787188Z] Loading containers: start.                   
INFO[2022-01-13T23:48:15.279372876Z] Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address 
INFO[2022-01-13T23:48:16.814482532Z] Loading containers: done.                    
INFO[2022-01-13T23:48:16.948672799Z] Docker daemon                                 commit=459d0df graphdriver(s)=vfs version=20.10.12
root@VM2824e02d:/var/lib/docker# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.7.1-docker)
  scan: Docker Scan (Docker Inc., v0.12.0)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 8
 Server Version: 20.10.12
 Storage Driver: vfs
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.12.9
 Operating System: Ubuntu 20.04 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.766GiB
 Name: VM2824e02d
 ID: WIFL:MDKL:LTHT:2D4V:OAYW:2YKU:NIFO:G2YW:BEWT:7FSP:MJXS:VRY5
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
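
A small sketch for summarizing a daemon log like the one above: it collects each storage driver that failed, with its error message, and the driver the daemon finally selected. The regular expressions are assumptions based only on the log lines pasted here, not on a documented dockerd log format, and the function name is made up for illustration.

```python
import re

# Error lines that end in a storage-driver=<name> field (format assumed from the log above).
_FAILED = re.compile(
    r"^ERRO\[[^\]]*\]\s+(?P<msg>.*?)\s+storage-driver=(?P<driver>\S+)\s*$"
)
# The graphdriver(s)=<name> field on the final "Docker daemon" line.
_SELECTED = re.compile(r"graphdriver\(s\)=(?P<driver>\S+)")


def summarize_driver_probe(log_text):
    """Return ({failed_driver: error_message}, selected_driver_or_None)."""
    failed = {}
    selected = None
    for line in log_text.splitlines():
        match = _FAILED.match(line.strip())
        if match:
            failed[match.group("driver")] = match.group("msg")
            continue
        match = _SELECTED.search(line)
        if match:
            selected = match.group("driver")
    return failed, selected
```

Feeding it the log above should report overlay2, fuse-overlayfs, aufs, overlay and devicemapper as failed, and vfs as the selected driver.
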
muhamadazmy commented 2 years ago

Can you try to install overlayfs? vfs was used as a fallback because the other drivers were not found.

muhamadazmy commented 2 years ago

The kernel might also be missing the driver for overlay2.

sameh-farouk commented 2 years ago

Can you try to install overlayfs? vfs was used as a fallback because the other drivers were not found.

Not sure how to install this. All I know is that it needs to be enabled in the kernel.
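
One way to check whether the running kernel has overlay registered is to look at /proc/filesystems. The sketch below parses the content of that file; the function names are hypothetical. Note the limits of this check: if overlay is built as a loadable module that is not loaded yet, it will not be listed, and even when it is listed this says nothing about whether overlay accepts virtiofs as its backing filesystem.

```python
def supported_filesystems(proc_filesystems_text):
    """Return the set of filesystem names listed in /proc/filesystems content."""
    names = set()
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if fields:
            # Each line is "[nodev]<TAB>name", so the name is always the last field.
            names.add(fields[-1])
    return names


def has_overlay(proc_filesystems_text):
    """True if the kernel currently has the overlay filesystem registered."""
    return "overlay" in supported_filesystems(proc_filesystems_text)
```

On the VM this could be called as `has_overlay(open("/proc/filesystems").read())`.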

sameh-farouk commented 2 years ago

Another update: I tried mounting a disk on /var/lib/docker/ instead of relying on a big root fs. With that, the Docker daemon errors are gone and it now picks the btrfs driver, which has excellent performance.

The mounted disk meets the prerequisites for the Docker btrfs storage driver:

  • btrfs requires a dedicated block storage device such as a physical disk. This block device must be formatted for Btrfs and mounted into /var/lib/docker/.
  • btrfs support must exist in your kernel.

I checked the types of the mounted filesystems. While the mounted disk is btrfs, the root fs is virtiofs, so maybe virtiofs has compatibility issues with overlayfs?

root@VM06866a79:~/docker-builder-flist# df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
dev            devtmpfs  3.9G     0  3.9G   0% /dev
run            tmpfs     3.9G   12K  3.9G   1% /run
tmpfs          tmpfs     3.9G     0  3.9G   0% /dev/shm
/dev/root      virtiofs  477G   91G  386G  19% /
/dev/vda       btrfs      50G  2.3G   46G   5% /var/lib/docker
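
The same check as the df output can be scripted: find which mount contains the Docker root dir and report its filesystem type. This is a minimal sketch that parses /proc/mounts style text (device, mount point, type, ...); the function name is hypothetical, and mount points containing spaces, which /proc/mounts escapes in octal, are not decoded here.

```python
import posixpath


def fstype_for_path(mounts_text, path):
    """Return the filesystem type of the deepest mount point that contains `path`."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # Prefer the longest mount point; on a tie, the later entry wins.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type
```

With the mounts shown above it should return btrfs for /var/lib/docker and virtiofs for anything else under /.
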
dockerd: time="2022-01-14T16:25:48.776617548Z" level=info msg="Docker daemon" commit=459d0df graphdriver(s)=btrfs version=20.10.12
root@VM06866a79:~/docker-builder-flist# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.7.1-docker)
  scan: Docker Scan (Docker Inc., v0.12.0)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 21
 Server Version: 20.10.12
 Storage Driver: btrfs
  Build Version: Btrfs v5.4.1 
  Library Version: 102
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.12.9
 Operating System: Ubuntu 20.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.766GiB
 Name: VM06866a79
 ID: ZYJH:FU2Q:BCST:P5RC:J7CW:3HTG:KHGT:VZQF:YKFW:CDVY:EIZQ:EFH5
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: abouelsaad
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

If the issue between the Docker engine and virtiofs isn't just a missing kernel module that we can add, then it should be documented as a known limitation for now that Docker performance on grid3 VMs without a mounted disk is very bad.

OmarElawady commented 2 years ago

I tried Taiga with three flows: a VM flow, a container flow with a disk, and a container flow with rootfs. The time is measured between container start and a fixed log line indicating that Taiga has started.

  • vm: 1:43
  • container with disk: 2:21
  • container with rootfs: 16:57
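
To put these startup times side by side, a quick sketch that converts them to seconds and compares each flow to the vm flow. It assumes the values are in minutes:seconds; the helper name is made up.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings as reported above (assumed to be minutes:seconds).
timings = {
    "vm": "1:43",
    "container with disk": "2:21",
    "container with rootfs": "16:57",
}

seconds = {name: to_seconds(value) for name, value in timings.items()}
baseline = seconds["vm"]
for name, value in seconds.items():
    print(f"{name}: {value} s ({value / baseline:.1f}x the vm flow)")
```

Under that assumption, the container on rootfs takes roughly ten times as long to start as the vm flow and about seven times as long as the container with a disk.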

TODO: proper I/O testing to pinpoint the issue (whether the Docker driver or writes in general are slow).

sameh-farouk commented 2 years ago

I did some tests to help. The results show that writes in general are slow.

  • Sequential reads - container - on rootfs (virtiofs): 157MiB/s (165MB/s)
  • Sequential writes - container - on rootfs (virtiofs): 2313KiB/s (2369kB/s)
  • Random reads - container - on rootfs (virtiofs): 20.5MiB/s (21.4MB/s)
  • Random writes - container - on rootfs (virtiofs): 2271KiB/s (2325kB/s)

  • Sequential reads - container - on mounted disk (btrfs): 105MiB/s (110MB/s)
  • Sequential writes - container - on mounted disk (btrfs): 11.2MiB/s (11.8MB/s)
  • Random reads - container - on mounted disk (btrfs): 18.4MiB/s (19.3MB/s)
  • Random writes - container - on mounted disk (btrfs): 4194KiB/s (4295kB/s)

Reads are somewhat faster on virtiofs than on btrfs, but writes are much slower: sequential writes are about 5x slower and random writes almost 2x slower. So the Docker vfs driver, although it may also hurt performance, is not the main cause of the poor filesystem performance in the container.
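
For reference, the ratios can be recomputed from the reported figures like this. The sketch only normalizes the binary units (KiB/s, MiB/s) that appear in the results above; the helper name is made up.

```python
_UNITS_IN_KIB = {"KiB/s": 1.0, "MiB/s": 1024.0}


def to_kib_per_s(value):
    """Convert a string such as '11.2MiB/s' or '2313KiB/s' to KiB/s."""
    for unit, factor in _UNITS_IN_KIB.items():
        if value.endswith(unit):
            return float(value[: -len(unit)]) * factor
    raise ValueError(f"unsupported unit in {value!r}")


# (virtiofs rootfs, btrfs mounted disk), as reported above.
results = {
    "sequential read": ("157MiB/s", "105MiB/s"),
    "sequential write": ("2313KiB/s", "11.2MiB/s"),
    "random read": ("20.5MiB/s", "18.4MiB/s"),
    "random write": ("2271KiB/s", "4194KiB/s"),
}

# Ratio above 1 means btrfs is faster than virtiofs for that pattern.
ratios = {
    name: to_kib_per_s(btrfs) / to_kib_per_s(virtiofs)
    for name, (virtiofs, btrfs) in results.items()
}
for name, ratio in ratios.items():
    print(f"{name}: btrfs/virtiofs = {ratio:.2f}")
```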

I used fio to run these tests.
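
For anyone who wants to reproduce something similar, here is a sketch that builds an fio command line for each of the four access patterns. The block size, file size and target directories are placeholders and not the parameters used for the numbers above; the helper name is hypothetical.

```python
# fio --rw values for sequential read/write and random read/write.
PATTERNS = ("read", "write", "randread", "randwrite")


def fio_command(pattern, directory, size="256M", block_size="1M"):
    """Build an fio argv for one access pattern (all sizes are placeholder values)."""
    if pattern not in PATTERNS:
        raise ValueError(f"unknown pattern: {pattern!r}")
    return [
        "fio",
        f"--name={pattern}",
        f"--rw={pattern}",
        f"--bs={block_size}",
        f"--size={size}",
        f"--directory={directory}",
        "--output-format=json",
    ]


# Print one command per pattern for the rootfs and for the mounted disk.
for target in ("/root", "/var/lib/docker"):
    for pattern in PATTERNS:
        print(" ".join(fio_command(pattern, target)))
```

Running the same set of commands once against a directory on the rootfs and once against the mounted disk gives directly comparable numbers.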

sameh-farouk commented 2 years ago

I think we should track two separate issues now:

1- Docker falls back to the vfs driver, with virtiofs as the backing filesystem, instead of using the overlay2 driver. Based on the docs, this is not ideal, since the vfs driver uses significantly more space and is slower at creating containers.

2- Slow filesystem write performance on the grid3 VM rootfs compared to the write performance on mounted disks (sequential writes are ~5x slower). This is indeed the more important one.

@muhamadazmy @OmarElawady should I open a new issue for no. 2 and leave this one for the vfs driver, as the title implies, instead of having a mixed issue? Or do you prefer to keep both tracked here as is?