I propose that we Dockerize our test and CI infrastructure. That is, we would use Docker to make the test setup completely bootstrappable (everything done with Dockerfiles) and also instantly installable (as a versioned prebuilt binary from http://hub.docker.com/). Then we would actually execute CI tests using Dockerfiles and expand the set of hardware and configurations that we test.

This is following on from related threads:

- #574: troubleshooting of an NFV test scenario.

Everything that we have done by hand would be automated. It would become taboo to rely on instructions in README files and manually created binaries: our new religion would be automation and reproducibility.

This requires work, especially for the NFV application:

- A `Dockerfile` that builds a complete NFV guest operating system, for example with a specific Linux kernel version, kernel configuration, and Ubuntu version. (This would be generalized to support more kernels, kernel configurations, distros, and operating systems.) The build results (e.g. `bzImage` and `disk.img`) would be stored read-only inside the container and could be accessed externally from other containers via Docker "volumes". The container with the completed build would be uploaded to http://hub.docker.com/ as free storage of the image files with proper naming and versioning.
- Change `test_env` to access image files via a Docker "volume" (that the user can override e.g. for Ubuntu vs FreeBSD guest) and to use the system-installed `qemu` (etc.) instead of manually building its own.
- A `Dockerfile` to run the CI test suite (or a well-defined subset thereof). The "build" phase of this container would install all necessary software, including building a specific version of QEMU. The "run" phase would check out Snabb Switch (version provided as env/arg), execute the tests, and post the results to Github. See draft Dockerfile.

@eugeneia you might be interested in taking the lead on this? (or on shooting the idea down :-))

> Everything that we have done by hand would be automated. It would become taboo to rely on instructions in README files and manually created binaries: our new religion would be automation and reproducibility.

I would love to see this.
@lukego Yup already started working on this, see: https://github.com/eugeneia/snabbswitch-docker
After tinkering on this for a while here is a progress update. My working branch can be inspected here: https://github.com/SnabbCo/snabbswitch/compare/master...eugeneia:simplify-test
To just see the docker related changes see: https://github.com/SnabbCo/snabbswitch/commit/03ba0a710808fc4dbd8bd5eb84901e8f4e57e4d9
First of all, a summary of what's done (docker-wise):

- There is a new top-level target `make docker` that will build a docker image containing a full Snabb Switch test environment.
- In `src/` you can call `scripts/dock.sh <command>` to run `command` on the current Snabb Switch tree within a fresh docker container based on the mentioned image.

Issues I have encountered include:

- The [Dockerfile](https://github.com/SnabbCo/snabbswitch/blob/03ba0a710808fc4dbd8bd5eb84901e8f4e57e4d9/Dockerfile) is small but the [script to build the test assets](https://github.com/SnabbCo/snabbswitch/blob/03ba0a710808fc4dbd8bd5eb84901e8f4e57e4d9/src/scripts/make-assets.sh) is "big". The reason is that the asset VM images are not docker images and docker is not designed to build "child images" (it would be useful if there was an easy way to turn docker images into disk images to be used by qemu). To build the VM images we use `mount -o loop ...`, `chroot` and possibly other (e.g. `mkfs`?) privileged commands. Since `docker build` can not be run in privileged mode, we need to invoke the asset building in a `docker run` followed by a `docker commit`. See https://github.com/docker/docker/issues/1916
- The original idea of using volumes to share test assets does not work out, because volumes are bound to containers (instances of images) and can not be shared on DockerHub, which is for sharing images.
- I see a 30% performance degradation in the packetblaster/dpdk benchmark when run in docker. I have not yet been able to figure out why.
- The resulting docker image is 5GB...
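To make the `docker run` + `docker commit` workaround from the first point concrete, here is a minimal sketch (the image names are invented for illustration; the real build lives in `make docker` and `scripts/make-assets.sh`):

```
# Unprivileged steps can stay in `docker build`.
docker build -t snabb-test-base .
# The privileged asset build (mount/chroot/mkfs) runs in a container,
# because `docker build` cannot run in privileged mode...
docker run --privileged --name asset-build snabb-test-base \
    src/scripts/make-assets.sh
# ...and the result is then frozen into a new image.
docker commit asset-build snabb-test
docker rm asset-build
```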
The `RUN` commands in the Dockerfile could be grouped into a single command using `&&`. This will reduce the image size.
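For example (an illustrative fragment, not the actual Dockerfile):

```
# Two RUN instructions create two layers, each adding to the image size:
RUN apt-get update
RUN apt-get install -y iperf
# One grouped layer instead, with the apt cache cleaned in the same layer:
RUN apt-get update && apt-get install -y iperf && rm -rf /var/lib/apt/lists/*
```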
I'm not sure that the Docker image creation should be part of the core Snabb Switch tree. Does anybody else share my opinion that this could be extracted as a separate project?
@nnikolaev-virtualopensystems I find it quite annoying having to use a different repo for docker builds. It only adds a couple of files, and now that travis-ci supports docker it is useful e.g. for tests too, and you can do automated builds to docker hub on every change.
@eugeneia tooling for creating populated VM images without loopback mounts largely seems to require booting qemu to run an installer. There are some other workarounds, e.g. mkisofs can build a bootable CD image without root.
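For reference, the mkisofs route looks roughly like this (a sketch assuming an isolinux-based boot tree; untested against our assets):

```
# Build a bootable ISO as an unprivileged user from a prepared root tree.
mkisofs -o guest.iso \
    -b isolinux/isolinux.bin -c isolinux/boot.cat \
    -no-emul-boot -boot-load-size 4 -boot-info-table \
    guest-root/
```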
@nnikolaev-virtualopensystems @justincormack Maybe having the docker related files in a git submodule would be a good compromise.
Cool stuff, Max!
Thoughts:
Is there really one "full Snabb Switch test environment"? This all looks pretty specific to the NFV application so far. I would expect to find this Dockerfile somewhere under `program/snabbnfv/`. I would expect other applications to have their own separate Dockerfiles for CI tests, e.g. for ALEx VPN that will install NetSNMP with its perl extension and so on. I think it makes sense to split this up and make it easy for individuals to create and run only the tests that they care about.
I also think that for Snabb NFV we need to scale up in terms of the number of guests we support. Suppose that we want to run the CI with every Ubuntu release since 12.04 and with every DPDK release since 1.7. And that we also want to test every major version of important binary-only network appliances from Juniper and other companies. Then we will need a framework that makes it easy for a motivated person to go and create a new test case that can be executed by the CI e.g. like Marcel is Dockerizing the Juniper vMX router.
This makes me imagine a filesystem layout like:
```
program/
  snabbnfv/
    test/
      vm/
        ubuntu/
          12.04/
          14.04/
          ...
        dpdk/
          1.7
          1.8
          ...
        juniper/
          vMX/
```
and then have a Dockerfile in each place that automates the process of creating the VM? (This could be bootstrapped from the ground up or downloaded from a well-known location, e.g. an official VM image distributed by Ubuntu or Juniper, with some configuration applied.) The actual tests to be executed would have to depend on the VM under test too, e.g. whether it is a Linux box, a router between two Linux boxes, etc.
@eugeneia How impractical am I being here?
One more passing thought... if it is logistically difficult to bootstrap VMs then could we use off-the-shelf ones and install software in them? (even use Docker to get iperf/dpdk/etc installed inside the VMs? Just a thought..)
Also re: size, I see the command `dd if=/dev/zero of=$OUT/qemu.img bs=1MiB count=2048`, which creates a 2GB file. That may be excessive.
I suppose the case could also be made for a "jumbo" guest VM that can be booted with many kernel versions and includes the software needed for many test cases (e.g. iperf, ping, and every relevant version of DPDK). That would look very much like the asset-creation script. I am still interested in making it easy for people to run tests with other VMs e.g. binary releases from vendors but I suppose that this could be a separate problem. (Indeed perhaps you would even want to test these by putting Linux VMs on either side and driving traffic through with iperf/ping/dpdk/etc -- then the test cases would look very much like the ones we have now?)
You can't very usefully use Dockerfiles that are not at the root - they can only see files for ADD commands below them in the tree. It is kind of annoying, but if you have a main Dockerfile that builds everything you can still run tests via parameters to the `docker run` command.
I tried a different approach today which I think would be a better fit: have a `program/snabbnfv/test/vm/ubuntu/14.04/Dockerfile`, and use `docker export` to "burn" the resulting tar archive onto a sparse RAW image suitable for qemu.
```
$ mkdir mnt
$ dd if=/dev/zero of=qemu.img bs=1 count=0 seek=2G
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000332418 s,
$ mkfs.ext2 -F qemu.img
mke2fs 1.42.9 (4-Feb-2014)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
131072 inodes, 524288 blocks
26214 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=536870912
16 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

$ sudo mount -o loop qemu.img mnt
$ cd mnt
$ docker export guest | sudo tar x
```
Sadly I couldn't get the docker-native user space to boot with our kernel yet:
```
Mount failed for selinuxfs on /sys/fs/selinux: No such file or directory
[   14.183740] random: init urandom read with 11 bits of entropy available
[   15.109652] init: plymouth-upstart-bridge main process (73) terminated with status 1
[   15.111734] init: plymouth-upstart-bridge main process ended, respawning
[   15.752925] init: plymouth-upstart-bridge main process (83) terminated with status 1
[   15.753559] init: plymouth-upstart-bridge main process ended, respawning
[   16.179781] init: plymouth-upstart-bridge main process (89) terminated with status 1
[   16.188983] init: plymouth-upstart-bridge main process ended, respawning
```
The Dockerfile I used:
```
FROM ubuntu:14.04
MAINTAINER Max Rottenkolber (@eugeneia)

RUN mkdir /hugetlbfs
RUN cp /etc/init/tty1.conf /etc/init/ttyS0.conf
RUN sed -i '$s/.*/exec \/sbin\/getty -8 115200 ttyS0 linux -a root/' /etc/init/ttyS0.conf
RUN printf "auto eth0\niface eth0 inet manual\ndns-nameserver 8.8.8.8\n" > /etc/network/interfaces
RUN apt-get update && apt-get install -y ethtool tcpdump netcat iperf
```
Any hints as to what the error is here would be appreciated. If I could get this to work it would be really something!
Turns out I needed to undo a dockerism introduced by `FROM ubuntu:14.04`, namely re-enable the init system.
```
# Reactivate init, see
# https://github.com/tianon/docker-brew-ubuntu-core/blob/dist/trusty/Dockerfile
RUN rm /usr/sbin/policy-rc.d \
    && rm /sbin/initctl \
    && dpkg-divert --local --rename --remove /sbin/initctl
```
While it is slightly ugly to have this magic in the Dockerfile, it works! I can `docker build/create/export` a VM image to use in test_env.
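Spelled out, the `docker build/create/export` pipeline might look like this (image and container names are invented for illustration):

```
# Build the guest image from its Dockerfile.
docker build -t nfv-guest program/snabbnfv/test/vm/ubuntu/14.04
# `docker export` operates on containers, so create one (it is never started).
docker create --name guest nfv-guest
# Unpack the root filesystem onto the mounted raw image.
(cd mnt && docker export guest | sudo tar x)
docker rm guest
```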
Today I added a Makefile target `make test_env` that will build the guest image, kernel and qemu. The exact versions of each can be controlled by make variables. E.g. `make test_env` will build the default NFV test environment (`program/snabbnfv/test/{vm/ubuntu/14.04,kernel/ubuntu-trusty,qemu/SnabbCo}`).
If I wanted to test with a different VM image I would have to create e.g. `vm/ubuntu/12.04/Dockerfile` and then run `make NFV_GUEST_OS=ubuntu NFV_GUEST_VERSION=12.04 test_env`.
Likewise there are variables for building other kernels and qemu versions (`NFV_GUEST_KERNEL`, `NFV_QEMU`). While new guest images can be defined as Dockerfiles, kernels and qemu versions can be added in the form of Makefiles that produce a `bzImage` or a `qemu` build respectively.
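So a hypothetical invocation combining these variables could look like this (the kernel and qemu values below are invented and would need matching Makefiles to exist):

```
make NFV_GUEST_OS=ubuntu NFV_GUEST_VERSION=12.04 \
     NFV_GUEST_KERNEL=ubuntu-precise NFV_QEMU=upstream test_env
```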
Going further, I would now add a Makefile target to build Docker images containing different test_env builds, which could then be distributed via DockerHub and used by others to run tests.
Right now it's still in a proof-of-concept state: no DPDK related stuff is built, I ignored kernel modules for now, and the Makefile needs some more work as some of the additions I made don't play well with existing rules (don't try to run my snabb-docker branch yet)... details. :)
@lukego What do you think? I obviously like it and I think it serves all our use cases:
Great hacking!
This looks very neat and tidy :-).
Let me see if I understand...
`src/Makefile` builds an NFV test environment: compile QEMU, compile guest kernel, create guest image. It is easy to define new variations (software versions and build parameters). You can run this directly on your development machine for testing.

I really like this direction! I can easily imagine using this framework to test more things that I am interested in, e.g. all recent QEMU versions and all recent guest kernel versions.
Couple of questions:

- Will it be easy to take a ready-made test environment from Dockerhub and run it with a new version of Snabb Switch? (I mean e.g. if we have 20 test environments defined and the CI wants to run each of them to test a PR.)
- Can we use this for ad-hoc testing? For example, would I be able to quickly drop into an interactive tmux session in a container with various interesting assets available and ready to run (e.g. a specific QEMU and a Juniper vMX guest image that I would start manually)?
- Does building QEMU and virtual machines really belong in `src/Makefile`? Or is it specific to the NFV application and better contained in `src/program/nfv` where it will not confuse/distract/annoy people who don't care about NFV?
- How should the "user interface" to individual test cases really look? For example selftest of a NIC. Is it better to provide arguments as environment variables, command-line arguments, or Lua scripts? I mean that even if we like `SNABB_PCI0` and so on for running the entire CI test suite we might prefer a different mechanism for interactively running individual test cases. (This comment comes specifically from seeing `snabbmark` move one of its parameters from the command line into an environment variable and wondering whether this is the right trend.)

Lastly: if the size of `qemu.img` becomes an issue then you might want to try compressing it with e.g. `gzip`. I believe that Linux file systems use clever tricks for keeping track of which parts of files do not really contain valid data and not bothering to store them (so you can have large image files that are mostly empty space) but that this breaks down when you use the files in other ways, e.g. upload to Dockerhub.

Also, thinking long-term (maybe best not to worry about this right now): we seem to have a few things mixed together that in my mind are logically separate.
Linux supports 'sparse files', but most Linux filesystems don't support transparent compression or magically make files that could be sparse sparse, as far as I know. It's a mechanism to use with caution, since various tools can end up filling in the extra empty bytes if they aren't sparse file aware.
> Will it be easy to take a ready-made test environment from Dockerhub and run it with a new version of Snabb Switch? (I mean e.g. if we have 20 test environments defined and the CI wants to run each of them to test a PR.)
I would like to handle this by "mounting the Snabb tree" in containers as volumes. E.g. images never contain Snabb Switch, but instead we have an entrypoint/script that launches a container with the current PWD "mounted" under `/snabbswitch`. This script could have an option to run a shell instead of snabb for interactive testing. A volume here is not a container-owned docker volume but a "host volume" (e.g. `docker -v /host/path:/container/path`).
> Can we use this for ad-hoc testing? For example, would I be able to quickly drop into an interactive tmux session in a container with various interesting assets available and ready to run (e.g. a specific QEMU and a Juniper vMX guest image that I would start manually)?
See above. In this case we could just run a shell in the container and do anything we would do on the host.
> Does building QEMU and virtual machines really belong in `src/Makefile`? Or is it specific to the NFV application and better contained in `src/program/nfv` where it will not confuse/distract/annoy people who don't care about NFV?
No. Maybe this should even go in a git subtree. I ran into problems with having a kernel tree checked out below `src/` and `src/Makefile` slowing to a crawl due to millions of files. So if we want this below `src/` we need to deal with that too.
> How should the "user interface" to individual test cases really look? For example selftest of a NIC. Is it better to provide arguments as environment variables, command-line arguments, or Lua scripts? I mean that even if we like `SNABB_PCI0` and so on for running the entire CI test suite we might prefer a different mechanism for interactively running individual test cases. (This comment comes specifically from seeing `snabbmark` move one of its parameters from the command line into an environment variable and wondering whether this is the right trend.)
In our tests/benchmarks, environment variables act like keyword parameters and provide a super-set of the functionality provided by regular parameters. We could support both, but that would add lots of boilerplate for little or no additional functionality.
Regarding file size: currently I use the sparse file approach mentioned by @kbara and so far it works. I would prefer not compressing anything and leaving that up to the docker hub transport layer, because otherwise we spend time waiting for things to decompress on each test run. I can imagine even specifying exact final sizes to `dd`, since the image sizes are predictable (ubuntu:14.04 is <250MB).
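For instance (the 512M figure is an invented example; the real value would be the known export size plus some headroom):

```
# Sparse file sized closer to the actual guest footprint than 2G.
dd if=/dev/zero of=qemu.img bs=1 count=0 seek=512M
mkfs.ext2 -F qemu.img
```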
I am having trouble getting our QEMU build to run in Docker:
```
/root/.test_env/qemu/obj/x86_64-softmmu/qemu-system-x86_64: error while loading shared libraries: libgnutls-deb0.so.28: cannot open shared object file: No such file or directory
```
But the missing library does not exist in trusty: http://packages.ubuntu.com/search?suite=trusty&arch=any&mode=filename&searchon=contents&keywords=libgnutls-deb0.so.28
Can anyone enlighten me on how we solved this in the lab?
Maybe `./configure --disable-gnutls` can do the trick?
This surprises me. The initial proof-of-concept Dockerfile was successfully running `test_env` inside the container, and that should have been compiling this feature of QEMU from source.
Is there a hint to be found there? https://gist.github.com/lukego/850a037c7d9ecdf594af
@lukego The difference is that in your PoC QEMU was compiled from source inside the container, and I am currently cross-compiling it "ahead of time". QEMU was linked to the outdated `libgnutls-deb0.so.28` because it was installed on grindelwald (some sort of auto-detection in the configure script, I guess). Once I removed it I was able to build a QEMU that successfully starts inside the container (I believe the package was in Canonical's Snabb Switch ppa). It still won't run in the container because it can't find a bios file; apparently you can't even move a QEMU build without breaking it.

So basically I can't simply compile QEMU and copy it into the Docker image unless I am building it on an equivalent host. I guess the solution is to compile QEMU inside a suitable container...
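Something along these lines might do it (a sketch; it assumes the build dependencies are already installed in the image):

```
# Build QEMU inside the same base image it will later run in, so it
# links against the container's libraries and finds its data files.
docker run --rm -v "$PWD/qemu":/qemu -w /qemu ubuntu:14.04 \
    sh -c "./configure --target-list=x86_64-softmmu && make"
```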
An update: I have pushed my first image to DockerHub: https://hub.docker.com/r/eugeneia/snabb-nfv-test/ (@lukego I was unable to create a SnabbCo repository, DockerHub permission issue?) The image is built using https://github.com/eugeneia/snabbswitch-docker (`make image`).
You can now (using my modified Snabb Switch branch: https://github.com/eugeneia/snabbswitch/tree/snabb-docker) do the following on machines where docker is installed:

```
docker pull eugeneia/snabb-nfv-test
SNABB_PCI0=0000:88:00.0 SNABB_PCI1=0000:88:00.1 scripts/dock.sh make test
```

And it will run the test suite in a fresh docker container. You can substitute `make test` with whatever command you like, e.g. `./snabb snsh -i`, ...
That sounds fantastic.
Agree! This sounds wonderful.
I also agree that we want to do as much as possible inside containers. Assume that the host will be some horrendously crippled incomprehensible distribution like NixOS with no software installed :). This will make it practical to administrate the lab and keep everything repeatable.
(The fine tweaks you are doing on Grindelwald are probably breaking the fine tweaks that @nnikolaev-virtualopensystems did previously to get OpenStack running with customized Ubuntu packages. That stuff all has to be moved into containers too so that we can run it on all servers. One step at a time though...)
The kernel/DPDK image I am building as of now yields bad performance, caused by the DPDK VM dropping lots of packets (unrelated to docker):
```
Port statistics ====================================
Statistics for port 0 ------------------------------
Packets sent: 89430904
Packets received: 141823839
Packets dropped: 52392935
Aggregate statistics ===============================
Total packets sent: 89430904
Total packets received: 141823839
Total packets dropped: 52392935
====================================================
```
Any hints on what could be the cause of this are highly appreciated.
Can we check if these packet drops are related to #592 somehow?
Generally it takes quite some head-scratching to work out the reason for packet drops in test environments where the input load is above the capacity of the system under test. Often it depends more on the order in which each queue is processed than on raw performance. (Guest dropping packets doesn't necessarily mean the guest has slowed down: can also mean we are processing its receive queue faster than its transmit queue and so it can't always send the packets it receives.)
I just compared with the DPDK VM we use on `master` and found out that it does not suffer from the packet drops. (Same machine, same snabb/packetblaster build.) So I have to assume that the problem lies in the VM I build.
The difference seems to be the DPDK version, currently we use https://github.com/virtualopensystems/dpdk/commit/36c248ebc629889fff4e7d9d17e109412ddf9ecf plus this ad-hoc diff:
```diff
diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index 4069d7c..99a1088 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -102,8 +102,8 @@
 /*
  * Configurable number of RX/TX ring descriptors
  */
-#define RTE_TEST_RX_DESC_DEFAULT 128
-#define RTE_TEST_TX_DESC_DEFAULT 512
+#define RTE_TEST_RX_DESC_DEFAULT 0
+#define RTE_TEST_TX_DESC_DEFAULT 0
 static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT;
 static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;
diff --git a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
index 4c27d5d..a590797 100644
--- a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
+++ b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
@@ -3842,7 +3842,7 @@ static inline struct sk_buff *__kc__vlan_hwaccel_put_tag(struct sk_buff *skb,
 #define HAVE_ENCAP_TSO_OFFLOAD
 #endif /* >= 3.10.0 */
 
-#if ( LINUX_VERSION_CODE < KERNEL_VERSION(3,14,0) )
+#if ( LINUX_VERSION_CODE < KERNEL_VERSION(3,13,0) )
 #ifdef NETIF_F_RXHASH
 #define PKT_HASH_TYPE_L3 0
 static inline void
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
index d0b419d..6bb776d 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -799,6 +799,7 @@ eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
 	} else {
 		hw->max_rx_queues = 1;
 		hw->max_tx_queues = 1;
+		virtio_dev_cq_queue_setup(eth_dev, 2, SOCKET_ID_ANY);
 	}
 	eth_dev->data->nb_rx_queues = hw->max_rx_queues;
```
The DPDK VM I build, on the other hand, uses https://github.com/virtualopensystems/dpdk/commit/7807fbbcd2ae9c1da8c9e0d20a3d5a4f783e9d6f. I bet the culprit is somewhere between these commits.
I would like to switch to an upstream DPDK. Ideal would actually be the latest release.
The customized one should only have been needed for earlier experiments when we were planning to send patches to QEMU and DPDK to increase the sizes of the vrings, but we dropped that change and should not require any patches nowadays.
Right @nnikolaev-virtualopensystems?
It could be that the problem is in this commit: https://github.com/virtualopensystems/dpdk/commit/7807fbbcd2ae9c1da8c9e0d20a3d5a4f783e9d6f. It is from the times we used a patched QEMU with 8192-sized vrings, so we wanted to propagate this to the Intel virtio PMD (the same way the Linux kernel driver does), like this: https://github.com/SnabbCo/qemu/commit/7a94322b279c5dd8fc5f2cb429814e0411ca0b0e.

Maybe the patch is obsolete nowadays and will give better performance with the default values. It might also be a good idea to get a more recent DPDK (2.1 is stable, and 2.2 is on the way).
@lukego ... I guess great minds think alike - it's just that you've been faster :)
Anyway, I think that at least this one might be needed: https://github.com/virtualopensystems/dpdk/commit/dae0a7f57e5656ad6c8422a5ea6a368cf306ae24
And for compiling on older versions of the kernel (14.04 LTS): https://github.com/virtualopensystems/dpdk/commit/d34d593d9de054e910e4081512a8a1ab48f654bf
Hope this helps.
I built a new image with DPDK (https://github.com/eugeneia/dpdk/commits/v2.1.0-snabb) as @nnikolaev-virtualopensystems suggested but got similar results:
```
Aggregate statistics ===============================
Total packets sent: 84511007
Total packets received: 123580066
Total packets dropped: 39069059
====================================================
```
Testing with the “legacy” DPDK (I reproduced virtualopensystems/dpdk@36c248e plus diff) yields the same packet loss. That makes me think I am doing something different with the VM configuration. What could affect DPDK performance? Hugepages?
I take that back: actually the "legacy build" has ~20% less packet loss and yields ~30% better performance. I think that's actually the performance hit I am hunting down here.
Closing because #626 landed which covers “Dockerization of CI”.