rancher / os

Tiny Linux distro that runs the entire OS as Docker containers
https://rancher.com/docs/os/v1.x/en/
Apache License 2.0

Improve zfs CLI response speed #2331

Open Negashev opened 6 years ago

Negashev commented 6 years ago

RancherOS Version: (ros os version) 1.3.0

Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.) Tried in different virtual environments: Proxmox (with Ceph) and vSphere 5.0 (SATA only)

The VM has 8 cores and 8 GB of memory.

Docker is configured with ZFS following https://rancher.com/docs/os/v1.2/en/storage/using-zfs/

Docker is very slow!

rancher@rancher:~$ time docker run -it --rm alpine hostname
6c002f5347a9

real    0m12.343s
user    0m0.068s
sys     0m0.052s

I tried creating the pool as RAID 0 with 3 disks.


Next, I ran it in a RancherOS virtual machine on my own machine, which has an SSD:

[rancher@node-1 ~]$ time docker run -it --rm alpine hostname
bcd821579152

real    0m6.787s
user    0m0.012s
sys     0m0.004s

niusmallnan commented 6 years ago

@kingsd041 When you have a chance, please help to confirm this issue.

Negashev commented 6 years ago

4-5 seconds on an SSD with 6 cores and 24 GB of memory (Proxmox)

kingsd041 commented 6 years ago

@niusmallnan I reproduced this issue. I tested with different Docker versions (17.09.1-ce, 17.12.1-ce) and different RancherOS versions (1.1.0, 1.2.0, 1.3.0).

kingsd041 commented 6 years ago

This is indeed an issue. I used ZFS as the default storage in RancherOS v1.4.1, and the test results are as follows:

root@ip-172-31-3-37:~# ros -v
version v1.4.1 from os image rancher/os:v1.4.1

root@ip-172-31-3-37:~# docker -v
Docker version 18.03.1-ce, build 9ee9f40

root@ip-172-31-3-37:~# docker info| grep 'Storage Driver'
Storage Driver: zfs

root@ip-172-31-3-37:~# time docker run -it --rm alpine hostname
1fb430f27c25

real    0m4.285s
user    0m0.062s
sys     0m0.008s

With ZFS storage on Ubuntu 16.04, the same command takes only 0.926s:

root@ip-172-31-8-46:/etc/docker# cat /etc/issue
Ubuntu 16.04.5 LTS \n \l

root@ip-172-31-8-46:/etc/docker# docker -v
Docker version 18.03.1-ce, build 9ee9f40

root@ip-172-31-8-46:/etc/docker#  docker info| grep 'Storage Driver'
Storage Driver: zfs

root@ip-172-31-8-46:/etc/docker# time docker run -it --rm alpine hostname
d9e8f3c0b67a

real    0m0.926s
user    0m0.056s
sys     0m0.012s

pdschandler commented 5 years ago

Yeah... I just got bitten by this. I've been working on deploying a RancherOS server on bare metal and am getting basically what's described here. Two raidz2 vdevs of 4 SSDs each.

version v1.4.2 from os image rancher/os:v1.4.2
Docker version 18.03.1-ce, build 9ee9f40
Storage Driver: zfs

time docker run -it --rm alpine hostname
real    5m8.353s
user    0m0.096s
sys     0m0.013s

My pool has ashift=12 and I also set recordsize=32k for the docker dataset. Do y'all know what is responsible for this? Any fix on the horizon or workarounds I can try?

Please let me know if I can help with any testing, I've got a server here I can wipe and try whatever on, and am looking to implement this.

niusmallnan commented 5 years ago

This is not a performance issue with zfs itself.

When ZFS is set up with ros s up zfs, the zfs command is installed as a wrapper script like this:

[root@ip-172-31-3-240 rancher]# which zfs
/sbin/zfs
[root@ip-172-31-3-240 rancher]# cat /sbin/zfs
#!/bin/sh

exec system-docker run --rm --privileged \
                --pid host \
                --net host \
                --ipc host \
        -v /mnt:/mnt:shared \
        -v /media:/media:shared \
        -v /dev:/host/dev \
        -v /run:/run \
        zfs-tools $(basename $0) "$@"

In other words, every zfs invocation starts a new system-docker container and runs the command, with its arguments, inside that container.
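
To check which kind of zfs command a given host has, something like the following could be used. This is only a minimal sketch: is_script_wrapper is a hypothetical helper name, not part of RancherOS, and it only tests whether a file starts with a #! line, as /sbin/zfs does in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of RancherOS): succeeds if the file starts
# with a "#!" shebang, i.e. it is a script wrapper rather than a compiled
# binary. Returns 2 if the file cannot be read.
is_script_wrapper() {
    [ -r "$1" ] || return 2
    [ "$(head -c 2 "$1")" = '#!' ]
}

# Example usage (path taken from the output above):
# if is_script_wrapper /sbin/zfs; then echo "zfs is a wrapper script"; fi
```

On a host where zfs is a native binary, the check fails and nothing is printed.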

So running zfs list on RancherOS takes much longer than on Ubuntu, where zfs is a native binary.

When zfs is the storage driver, Docker calls the zfs command whenever we run a container, so every docker run pays this overhead. That is why Docker looks slow on ZFS.

The performance of the ZFS file system itself should not be affected.
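
To see how much time each zfs call costs on a given host, the command can be timed over several runs. The sketch below is only an illustration: avg_ms is a hypothetical helper name, and it assumes that date +%s%N prints nanoseconds (true for GNU date, possibly not for other implementations).

```shell
#!/bin/sh
# avg_ms RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD RUNS times and prints the average
# wall-clock time per run in whole milliseconds.
# Assumption: `date +%s%N` prints nanoseconds since the epoch (GNU date).
avg_ms() {
    runs=$1
    shift
    [ "$runs" -gt 0 ] || return 2
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output and ignore failures; only the elapsed time matters.
        "$@" >/dev/null 2>&1 || true
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / runs / 1000000 ))
}

# Example usage on a RancherOS host (commands taken from this thread):
# avg_ms 5 zfs list
# avg_ms 5 docker run --rm alpine hostname
```

Comparing the zfs list figure on RancherOS with the same measurement on a host that has a native zfs binary should show how much of the docker run delay comes from the wrapper.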

pdschandler commented 5 years ago

Thanks, that's good news. This does affect the responsiveness of Docker administration tools though. Portainer, for example, takes forever on any page that needs to fetch Docker info, and unfortunately even times out sometimes :( (otherwise it'd be a little less of an issue)

I have Portainer installed in system-docker, and I mount the Unix socket of user docker, which produces these results. Granted, I have not gotten a chance to test the actual responsiveness of containers deployed in that user docker yet.

Even if container performance is unaffected, is there still a chance this could get fixed? Or, given what you've explained, would the ultimate solution be more complex?

niusmallnan commented 5 years ago

I am considering providing a pre-compiled zfs binary, as other operating systems do, but this would make maintenance harder.