tern-tools / tern

Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats including human-readable, JSON, HTML, SPDX and more.
BSD 2-Clause "Simplified" License

PyCon 2018 Sprint: Implement rudimentary isolated layered filesystem #37

Closed: nishakm closed this issue 6 years ago

nishakm commented 6 years ago

Currently, Tern depends heavily on Docker to run shell scripts against a given root filesystem. It:

  1. Pulls the base image from Docker Hub
  2. Spins up a container using Docker
  3. Runs docker exec against the container

This isn't very effective, because we ultimately want to reference the packages that came with every diff filesystem (that is, each individual layer). It's clunky to use Docker or any other existing tool to disentangle a container image, so let's try to use some rudimentary Linux kernel facilities to do it instead.

Here is a shell proof of concept that we can use to accomplish this using overlay, unshare and chroot:

# make all the working directories in temp first
$ cd temp
$ mkdir mergedir workdir
# untar all of the layer.tar files - each layer.tar file can be found in the manifest.json
$ cat manifest.json
[{"Config":"dfed51d8bf5d12569254cec5109fd0e5a3ccc3ced1c70647306b93fc2835a37f.json","RepoTags":["photon:3layers"],"Layers":["800692525c0779e33fda674f7a61a4925b453138cba7e0d7748dff691e38e491/layer.tar","092b3b6b9538cfeec7252aa2a95c8ffb9a6677023cac1751bc8948ee0ccedd3a/layer.tar","bf622c56bf154c72c90ecd4742f154abb8cd65757cab88488a173413b3ca63e5/layer.tar"]}]
# note - the above result may be different for you
# layers are applied in order from first to last in the manifest.json
# untar the first layer
$ mkdir 800692525c0779e33fda674f7a61a4925b453138cba7e0d7748dff691e38e491/contents
$ tar xvf 800692525c0779e33fda674f7a61a4925b453138cba7e0d7748dff691e38e491/layer.tar -C 800692525c0779e33fda674f7a61a4925b453138cba7e0d7748dff691e38e491/contents
# bind mount to mergedir 
$ sudo mount -o bind 800692525c0779e33fda674f7a61a4925b453138cba7e0d7748dff691e38e491/contents mergedir 
# mount proc, sys and dev because processes may be looking at these
$ sudo mount -t proc /proc mergedir/proc
$ sudo mount -o bind /sys mergedir/sys
$ sudo mount -o bind /dev mergedir/dev
# execute required shell command
$ sudo unshare -pf --mount-proc=$PWD/mergedir/proc chroot mergedir <shell> -c <command in single quotes''>
# undo proc sys and dev mounts
$ sudo umount mergedir/proc
$ sudo umount mergedir/sys
$ sudo umount mergedir/dev
# overlay next layer
$ mkdir 092b3b6b9538cfeec7252aa2a95c8ffb9a6677023cac1751bc8948ee0ccedd3a/contents
$ tar xvf 092b3b6b9538cfeec7252aa2a95c8ffb9a6677023cac1751bc8948ee0ccedd3a/layer.tar -C 092b3b6b9538cfeec7252aa2a95c8ffb9a6677023cac1751bc8948ee0ccedd3a/contents
$ sudo mount -t overlay overlay -o lowerdir=mergedir,upperdir=092b3b6b9538cfeec7252aa2a95c8ffb9a6677023cac1751bc8948ee0ccedd3a/contents,workdir=workdir mergedir
# remount proc, sys and dev
# run unshare and chroot command again
# unmount again
# repeat for successive layers
# clean up
$ sudo umount -rl mergedir
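The layer order used in the walkthrough above comes from the "Layers" array in manifest.json. As a minimal sketch (the function name `layer_tars` and the shortened sample paths are made up for illustration and are not part of Tern), the ordered list of layer tarballs could be read like this:

```python
import json


def layer_tars(manifest_text):
    """Return the layer.tar paths of the first image entry, in apply order."""
    manifest = json.loads(manifest_text)
    # manifest.json is a list with one entry per image, as in the sample above;
    # "Layers" is already ordered from the base layer to the topmost layer.
    return list(manifest[0]["Layers"])


# Shortened, hypothetical sample in the same shape as the manifest above.
sample = (
    '[{"Config": "abc.json", "RepoTags": ["photon:3layers"], '
    '"Layers": ["aaa/layer.tar", "bbb/layer.tar", "ccc/layer.tar"]}]'
)

layers = layer_tars(sample)
print(layers[0])   # the base layer, to be bind mounted first
print(layers[1:])  # the diff layers, to be overlaid in this order
```

The first element would be untarred and bind mounted; each later element would be untarred and overlaid in turn.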

To resolve this issue:

  1. Create some utilities to execute above commands (in utils directory)
  2. Come up with a reliable subroutine to set up an overlay filesystem to work on (common.py)
  3. Implement invoke_in_container module (in command_lib/command_lib.py) using the above utilities
  4. Show layer-by-layer debugging using Python in interactive mode
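One possible shape for the utilities in step 1 is a set of small functions that only build the argument lists for the shell commands in the proof of concept, leaving execution to the caller. This is a sketch under that assumption; the names `overlay_mount_cmd` and `chroot_cmd` are hypothetical and are not the functions that ended up in Tern.

```python
import os


def overlay_mount_cmd(lower, upper, work, target):
    """Build the overlay mount command from the proof of concept."""
    opts = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["sudo", "mount", "-t", "overlay", "overlay", "-o", opts, target]


def chroot_cmd(mergedir, shell, command):
    """Build the unshare + chroot command that runs `command` inside mergedir."""
    # --mount-proc takes an absolute path, as $PWD/mergedir/proc does above.
    proc = os.path.join(os.path.abspath(mergedir), "proc")
    return ["sudo", "unshare", "-pf", f"--mount-proc={proc}",
            "chroot", mergedir, shell, "-c", command]


print(overlay_mount_cmd("mergedir", "layer2/contents", "workdir", "mergedir"))
print(chroot_cmd("/tmp/mergedir", "/bin/sh", "ls /"))
```

Returning lists rather than running anything keeps the helpers easy to test without root; a caller could pass the result to `subprocess.check_output` when it actually wants to mount or chroot.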

This issue is reserved for PyCon 2018 sprint participants. Some issues may be spun off from this main one during the sprint. After the sprint ends, anyone can work either with me on this issue or independently on a sub-issue.

The current development branch for this issue is: https://github.com/vmware/tern/tree/layer-debug

nishakm commented 6 years ago

Our working branch is https://github.com/vmware/tern/tree/layer-debug

It's at a point where I can do the following using the sample Dockerfile at https://github.com/vmware/tern/tree/layer-debug/samples/photon_3_layers

First build the Docker image:

$ cd samples/photon_3_layers
$ docker build -t photon:3layers .

In the python interpreter:

>>> full_cmd = 'tdnf check-update > /dev/null && tdnf list installed | cut -f1 -d"."'  # the command to run in chroot
>>> from classes.docker_image import DockerImage
>>> d = DockerImage('photon:3layers')
>>> d.load_image()  # loads all the metadata
[sudo] password for nisha: (I run Docker without elevated permissions)
>>> from utils import rootfs
>>> rootfs.mount_base_layer(d.layers[0].tar_file)
>>> result1 = rootfs.run_chroot_command(full_cmd, 'usr/bin/bash')
>>> result1  # all the packages that the base layer contains
b'bash\nbzip2-libs\nca-certificates\nca-certificates-pki\ncurl\ncurl-libs\ne2fsprogs-libs\nelfutils-libelf\nexpat-libs\nfilesystem\nglibc\nhawkey\nkrb5\nlibcap\nlibdb\nlibgcc\nlibsolv\nlibssh2\nncurses-libs\nnspr\nnss-libs\nopenssl\nphoton-release\nphoton-repos\npopt\nreadline\nrpm-libs\nsqlite-libs\ntdnf\ntoybox\nxz-libs\nzlib\n'
>>> rootfs.mount_diff_layer(d.layers[1].tar_file)
>>> result2 = rootfs.run_chroot_command(full_cmd, 'usr/bin/bash')
>>> result2  # all the packages that the base + diff layer contains
b'bash\nbzip2-libs\nca-certificates\nca-certificates-pki\ncurl\ncurl-libs\ne2fsprogs-libs\nelfutils-libelf\nexpat\nexpat-libs\nfilesystem\ngdbm\ngit\nglibc\nhawkey\nkrb5\nlibcap\nlibdb\nlibffi\nlibgcc\nlibsolv\nlibssh2\nncurses\nncurses-libs\nnspr\nnss-libs\nopenssl\nperl\nperl-CGI\nperl-DBI\nperl-YAML\nphoton-release\nphoton-repos\npopt\npython3\npython3-libs\nreadline\nrpm-libs\nsqlite-libs\ntdnf\ntoybox\nxz\nxz-libs\nzlib\n'                                          
>>> rootfs.mount_diff_layer(d.layers[2].tar_file)
>>> result3 = rootfs.run_chroot_command(full_cmd, 'usr/bin/bash')
>>> result3  # all the packages that layer1 + layer2 + layer3 contain
b'bash\nbzip2-libs\nca-certificates\nca-certificates-pki\ncurl\ncurl-libs\ne2fsprogs-libs\nelfutils-libelf\nexpat\nexpat-libs\nfilesystem\ngdbm\ngit\nglibc\nhawkey\nkrb5\nlibcap\nlibdb\nlibffi\nlibgcc\nlibsolv\nlibssh2\nncurses\nncurses-libs\nnspr\nnss-libs\nopenssl\nperl\nperl-CGI\nperl-DBI\nperl-YAML\nphoton-release\nphoton-repos\npopt\npython3\npython3-libs\nreadline\nrpm-libs\nsqlite-libs\ntdnf\ntoybox\nvim\nxz\nxz-libs\nzlib\n'                                     
>>> rootfs.unwind_mount()  # <--- this is not working
>>> result_list1 = result1.decode('utf-8').split('\n')
>>> result_list1
['bash', 'bzip2-libs', 'ca-certificates', 'ca-certificates-pki', 'curl', 'curl-libs', 'e2fsprogs-libs', 'elfutils-libelf', 'expat-libs', 'filesystem', 'glibc', 'hawkey', 'krb5', 'libcap', 'libdb', 'libgcc', 'libsolv', 'libssh2', 'ncurses-libs', 'nspr', 'nss-libs', 'openssl', 'photon-release', 'photon-repos', 'popt', 'readline', 'rpm-libs', 'sqlite-libs', 'tdnf', 'toybox', 'xz-libs', 'zlib', '']
>>> result_list2 = result2.decode('utf-8').split('\n')
>>> result_list3 = result3.decode('utf-8').split('\n')
>>> list(set(result_list2) - set(result_list1))
['perl-CGI', 'gdbm', 'perl-DBI', 'expat', 'perl-YAML', 'python3-libs', 'perl', 'python3', 'ncurses', 'xz', 'libffi', 'git'] <--- all the packages that were installed with 'tdnf install git'
>>> list(set(result_list3) - set(result_list2))
['vim'] <--- all the packages that were installed with 'tdnf install vim'
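The decode, split, and set-difference steps in the session above can be folded into one helper. The following is a minimal sketch, not code from the layer-debug branch; `package_set` and `packages_added` are hypothetical names, and the byte strings are shortened stand-ins for the real command output.

```python
def package_set(raw):
    """Decode command output (bytes) into a set of package names."""
    # Drop the empty string left behind by the trailing newline.
    return {line for line in raw.decode("utf-8").split("\n") if line}


def packages_added(before, after):
    """Return, sorted, the packages present in `after` but not in `before`."""
    return sorted(package_set(after) - package_set(before))


base = b"bash\ncurl\nzlib\n"
with_git = b"bash\ncurl\nexpat\ngit\nzlib\n"
print(packages_added(base, with_git))  # ['expat', 'git']
```

Sorting the result gives a stable order, which a bare `list(set(...) - set(...))` does not guarantee.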

snmpboy commented 6 years ago

Helpful awk for looking at mounts:

$ grep temp/mergedir /proc/mounts | awk '{ print $2 }'
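The same filter could be done from Python, which might be handy when checking mounts from the interpreter. This is a sketch only; `mount_points` is a hypothetical helper and the sample text is invented to mimic the layout of /proc/mounts.

```python
def mount_points(mounts_text, needle):
    """Return the mount point (second field) of every line containing needle."""
    points = []
    for line in mounts_text.splitlines():
        if needle in line:          # same as: grep needle
            fields = line.split()   # same as awk's whitespace splitting
            if len(fields) >= 2:
                points.append(fields[1])
    return points


# Invented sample in the format of /proc/mounts.
sample = (
    "proc /proc proc rw,nosuid 0 0\n"
    "overlay /home/user/temp/mergedir overlay rw,relatime 0 0\n"
    "proc /home/user/temp/mergedir/proc proc rw,relatime 0 0\n"
)
print(mount_points(sample, "temp/mergedir"))
```

On a Linux host, the text would come from reading /proc/mounts, for example `mount_points(open('/proc/mounts').read(), 'temp/mergedir')`.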

nishakm commented 6 years ago

Resolved in #56. Thanks everyone for participating in the PyCon 2018 developer sprint! It means a lot :)