kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

RFC: change 9pfs mount option to cache=mmap #769

Closed bergwolf closed 6 years ago

bergwolf commented 6 years ago

Summary

I have done some compatibility tests for kata 9pfs, and I think we should change the 9pfs mount option from the default cache=none to cache=mmap. It does not give worse POSIX compliance (compared to cache=none) but allows us to run dnf install and the mariadb:latest image.

We still need the 9pfs open-delete-fstat workaround. Although it gives worse fstest results, it works around some important common usage patterns; without it, both apt update and mariadb:latest fail.

| setup (docker/overlayfs) | 9pfs cache option | guest kernel | fstest failures | apt update | dnf install | launch mariadb |
|---|---|---|---|---|---|---|
| runc | N/A | N/A | 2/8789 | Y | Y | Y |
| kata | cache=none | upstream 4.14.57 | 8/8789 | N | N | N |
| kata | cache=mmap | upstream 4.14.57 | 8/8789 | N | Y | N |
| kata | cache=none | 9p patched 4.14.67.11 | 30/8789 | Y | N | N |
| kata | cache=mmap | 9p patched 4.14.67.11 | 30/8789 | Y | Y | Y |

0. test setup

0.0 check storage driver

$docker info|grep Storage
Storage Driver: overlay2

0.1 clone pjdfstest: git clone git@github.com:pjd/pjdfstest.git

0.2 run docker: docker run -it -v /path/to/pjdfstest:/fstest ubuntu

0.3 run kata: docker run -it --runtime kata -v /path/to/pjdfstest:/fstest ubuntu

0.4 run apt update test: docker run --runtime kata -it --rm ubuntu apt update

0.5 run dnf install -y strace test: docker run --runtime kata -it --rm fedora dnf install -y strace

0.6 run mariadb test: docker run --runtime kata --rm -it --env MYSQL_ALLOW_EMPTY_PASSWORD=yes mariadb
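Steps 0.4 to 0.6 can be scripted so that the same three workloads run against each runtime. The following is only a sketch: the helper names `build_commands` and `run_all` are hypothetical, `-it` is dropped because no terminal is attached, and the timeout handling for mariadb (which keeps running when it starts successfully) is an assumption rather than part of the original test procedure.

```python
import subprocess


def build_commands(runtime=None):
    """Return the docker command line for each workload in steps 0.4-0.6.

    `runtime` is passed to `docker run --runtime`; None uses docker's default.
    """
    base = ["docker", "run", "--rm"]
    if runtime:
        base += ["--runtime", runtime]
    return {
        "apt update": base + ["ubuntu", "apt", "update"],
        "dnf install": base + ["fedora", "dnf", "install", "-y", "strace"],
        "mariadb": base + ["--env", "MYSQL_ALLOW_EMPTY_PASSWORD=yes", "mariadb"],
    }


def run_all(runtime=None, timeout=300):
    """Run each workload and report its exit code, or 'timeout' if still running."""
    results = {}
    for name, cmd in build_commands(runtime).items():
        try:
            results[name] = subprocess.run(cmd, timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            # Assumption: a server that is still up at the timeout did not abort early.
            results[name] = "timeout"
    return results


if __name__ == "__main__":
    print(run_all("kata"))
```

An exit code of 0 alone does not prove success for apt update, since the failures in this report show up as `E:` lines in the output, so the output still needs to be read.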

1. docker with overlayfs

Test Summary Report
-------------------
/pjdfstest/tests/symlink/03.t        (Wstat: 0 Tests: 6 Failed: 2)
  Failed tests:  1-2
Files=232, Tests=8789, 200 wallclock secs ( 2.18 usr  0.82 sys + 41.07 cusr 15.38 csys = 59.45 CPU)
Result: FAIL
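The "fstest failures" figure for a run can be recomputed from a Test Summary Report by summing its `Failed:` counts. A minimal sketch, assuming the report format shown above; the function name `summarize` is hypothetical and not part of pjdfstest:

```python
import re


def summarize(report: str):
    """Return (failed, total) parsed from a prove-style Test Summary Report."""
    # Each failing test file has a line ending in "... Failed: N)".
    failed = sum(int(m.group(1)) for m in re.finditer(r"Failed: (\d+)\)", report))
    # The final line carries the overall count, e.g. "Files=232, Tests=8789, ...".
    total_match = re.search(r"Files=\d+, Tests=(\d+)", report)
    total = int(total_match.group(1)) if total_match else None
    return failed, total


if __name__ == "__main__":
    sample = """\
/pjdfstest/tests/symlink/03.t        (Wstat: 0 Tests: 6 Failed: 2)
  Failed tests:  1-2
Files=232, Tests=8789, 200 wallclock secs
"""
    print(summarize(sample))
```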

2. kata 9pfs(cache=none)/overlayfs/upstream-guest-kernel-4.14.29

Test Summary Report
-------------------
/fstest/tests/chmod/12.t          (Wstat: 0 Tests: 14 Failed: 6)
  Failed tests:  3-4, 7-8, 11-12
/fstest/tests/symlink/03.t        (Wstat: 0 Tests: 6 Failed: 2)
  Failed tests:  1-2
Files=232, Tests=8789, 730 wallclock secs ( 5.33 usr  2.94 sys + 23.28 cusr 179.46 csys = 211.01 CPU)

However, it fails apt update:

[macbeth@runtime]$docker run --rm -it --runtime kata ubuntu
root@65009609e06f:/# apt update
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [83.2 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Reading package lists... Done
E: Unable to determine file size for fd 9 - fstat (2: No such file or directory)
E: The repository 'http://security.ubuntu.com/ubuntu bionic-security InRelease' provides only weak security information.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Unable to determine file size for fd 9 - fstat (2: No such file or directory)
E: The repository 'http://archive.ubuntu.com/ubuntu bionic InRelease' provides only weak security information.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Unable to determine file size for fd 9 - fstat (2: No such file or directory)
E: The repository 'http://archive.ubuntu.com/ubuntu bionic-updates InRelease' provides only weak security information.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Unable to determine file size for fd 9 - fstat (2: No such file or directory)
E: The repository 'http://archive.ubuntu.com/ubuntu bionic-backports InRelease' provides only weak security information.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
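The fstat errors above match the open-delete-fstat pattern named in the summary: a file is opened, unlinked while the descriptor is still open, and then queried with fstat. The sketch below reproduces that sequence in isolation. It is an illustration of the pattern, not apt's actual code, and the function name is hypothetical. On a local filesystem it returns the file size; on a 9pfs mount without the workaround the fstat call would be expected to raise an OSError with ENOENT, as in the log.

```python
import os
import sys
import tempfile


def fstat_after_unlink(directory: str) -> int:
    """Create a file in `directory`, unlink it while open, then fstat the descriptor.

    Returns the size reported by fstat. Raises OSError if the filesystem
    cannot stat a file that was unlinked while still open.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"x" * 128)
        os.unlink(path)                 # the name is gone, the descriptor stays open
        return os.fstat(fd).st_size     # POSIX expects this to keep working
    finally:
        os.close(fd)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(fstat_after_unlink(target))
```

Running it with a directory on the container rootfs as the argument gives a quick check that is independent of apt.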

And it fails to run dnf install:

[macbeth@runtime]$docker run --runtime kata -it --rm fedora bash
[root@53cff1523281 /]# dnf install -y strace
Error: Failed to synchronize cache for repo 'updates'

And it fails to run mariadb:

[macbeth@runtime]$docker run --runtime kata -it --rm --env MYSQL_ALLOW_EMPTY_PASSWORD=yes mariadb
Initializing database
2018-09-21  8:44:56 0 [ERROR] Can't init tc log
2018-09-21  8:44:56 0 [ERROR] Aborting

3. kata 9pfs(cache=mmap)/overlayfs/upstream-guest-kernel-4.14.29

Test Summary Report
-------------------
/fstest/tests/chmod/12.t          (Wstat: 0 Tests: 14 Failed: 6)
  Failed tests:  3-4, 7-8, 11-12
/fstest/tests/symlink/03.t        (Wstat: 0 Tests: 6 Failed: 2)
  Failed tests:  1-2
Files=232, Tests=8789, 730 wallclock secs ( 5.33 usr  2.94 sys + 23.28 cusr 179.46 csys = 211.01 CPU)

However, it fails apt update:

[macbeth@runtime]$docker run --rm -it --runtime kata ubuntu
root@f4c771352dc9:/# apt update
Get:1 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
Get:2 http://security.ubuntu.com/ubuntu bionic-security InRelease [83.2 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Reading package lists... Done
E: Unable to determine file size for fd 9 - fstat (2: No such file or directory)
E: The repository 'http://security.ubuntu.com/ubuntu bionic-security InRelease' provides only weak security information.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Unable to determine file size for fd 9 - fstat (2: No such file or directory)
E: The repository 'http://archive.ubuntu.com/ubuntu bionic InRelease' provides only weak security information.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Unable to determine file size for fd 9 - fstat (2: No such file or directory)
E: The repository 'http://archive.ubuntu.com/ubuntu bionic-updates InRelease' provides only weak security information.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Unable to determine file size for fd 9 - fstat (2: No such file or directory)
E: The repository 'http://archive.ubuntu.com/ubuntu bionic-backports InRelease' provides only weak security information.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

And it fails to run mariadb:

2018-09-21  8:34:22 0 [Note] mysqld: ready for connections.
Version: '10.3.9-MariaDB-1:10.3.9+maria~bionic'  socket: '/var/run/mysqld/mysqld.sock'  port: 0  mariadb.org binary distribution
Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
ERROR: Can't initialize batch_readline - may be the input source is a directory or a block device.

4. kata 9pfs(cache=none)/overlayfs/patched-9p

Test Summary Report
-------------------
/pjdfstest/tests/chmod/12.t          (Wstat: 0 Tests: 14 Failed: 6)
  Failed tests:  3-4, 7-8, 11-12
/pjdfstest/tests/link/00.t           (Wstat: 0 Tests: 202 Failed: 10)
  Failed tests:  135-136, 142-143, 149-150, 156-157, 163-164
/pjdfstest/tests/mkdir/00.t          (Wstat: 0 Tests: 36 Failed: 1)
  Failed test:  34
/pjdfstest/tests/mkfifo/00.t         (Wstat: 0 Tests: 36 Failed: 1)
  Failed test:  34
/pjdfstest/tests/mknod/00.t          (Wstat: 0 Tests: 36 Failed: 1)
  Failed test:  34
/pjdfstest/tests/mknod/11.t          (Wstat: 0 Tests: 28 Failed: 2)
  Failed tests:  13, 26
/pjdfstest/tests/open/00.t           (Wstat: 0 Tests: 47 Failed: 1)
  Failed test:  34
/pjdfstest/tests/symlink/03.t        (Wstat: 0 Tests: 6 Failed: 2)
  Failed tests:  1-2
Files=232, Tests=8789, 1130 wallclock secs ( 5.83 usr  3.71 sys + 28.47 cusr 197.50 csys = 235.51 CPU)
Result: FAIL

And it fails to run dnf install:

[macbeth@runtime]$docker run --runtime kata -it --rm fedora dnf install -y strace
Error: Failed to synchronize cache for repo 'updates'

And it fails to run mariadb:

[macbeth@runtime]$docker run --runtime kata -it --rm --env MYSQL_ALLOW_EMPTY_PASSWORD=yes mariadb
Initializing database
2018-09-21  8:46:34 0 [ERROR] Can't init tc log
2018-09-21  8:46:34 0 [ERROR] Aborting

5. kata 9pfs(cache=mmap)/overlayfs/patched-9p

/fstest/tests/chmod/12.t          (Wstat: 0 Tests: 14 Failed: 6)
  Failed tests:  3-4, 7-8, 11-12
/fstest/tests/link/00.t           (Wstat: 0 Tests: 202 Failed: 10)
  Failed tests:  135-136, 142-143, 149-150, 156-157, 163-164
/fstest/tests/mkdir/00.t          (Wstat: 0 Tests: 36 Failed: 2)
  Failed tests:  33-34
/fstest/tests/mkfifo/00.t         (Wstat: 0 Tests: 36 Failed: 2)
  Failed tests:  33-34
/fstest/tests/mknod/00.t          (Wstat: 0 Tests: 36 Failed: 2)
  Failed tests:  33-34
/fstest/tests/mknod/11.t          (Wstat: 0 Tests: 28 Failed: 4)
  Failed tests:  12-13, 25-26
/fstest/tests/open/00.t           (Wstat: 0 Tests: 47 Failed: 2)
  Failed tests:  33-34
/fstest/tests/symlink/03.t        (Wstat: 0 Tests: 6 Failed: 2)
  Failed tests:  1-2
Files=232, Tests=8789, 713 wallclock secs ( 5.38 usr  2.98 sys + 23.03 cusr 176.53 csys = 207.92 CPU)

grahamwhaley commented 6 years ago

Nice work @bergwolf, I hadn't realised that the unlink patch to 9p made us worse on the POSIX tests... Having a peek at the kernel 9p docs, they say:

mmap = minimal cache that is only used for read-write mmap. Nothing else is cached, like cache=none

Your results make me feel that maybe it is not quite that simple, but gives me confidence that we are unlikely to have broken anything else. So, the idea looks sound to me.

bergwolf commented 6 years ago

@grahamwhaley yeah, that patch was never a clean fix, just a dirty workaround :) A clean fix would have to be made at the protocol layer and cover both the 9p client and server. The kernel community has discussed it many times but never arrived at a real fix... OTOH, pjdfstest clearly does not test all POSIX fs semantics, and it seems to miss important semantics that everyday apps rely on (like the mmap one).
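The mmap semantic mentioned here can be probed directly with a shared read-write mapping. This is a minimal sketch with a hypothetical function name. It assumes, without confirmation from this thread, that a filesystem lacking read-write mmap support rejects the mapping with an OSError; on a filesystem that supports it, the data written through the mapping is read back from the file.

```python
import mmap
import os
import sys
import tempfile


def shared_mmap_roundtrip(directory: str) -> bool:
    """Write through a MAP_SHARED read-write mapping and read the data back.

    Returns True if the bytes written via the mapping are visible through
    the file descriptor. Raises OSError if the mapping cannot be created.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, mmap.PAGESIZE)  # a mapping needs a non-empty file
        with mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[:5] = b"hello"
            m.flush()                    # push the dirty page back to the file
        return os.pread(fd, 5, 0) == b"hello"
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(shared_mmap_roundtrip(target))
```

Running the probe inside a kata container under each cache option would show whether the mapping is accepted, which pjdfstest does not cover.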