Closed: eumel8 closed this issue 1 year ago
@hdonnay what do you think of this? It looks like Google is packaging containers that use distributions we do support. At first glance, it seems that some simple modifications to the dpkg scanner would make this work.
Running cctool unpack
/tmp/cctool-unpack-248681396
❯ fd os-release | xargs cat
PRETTY_NAME="Distroless"
NAME="Debian GNU/Linux"
ID="debian"
VERSION_ID="10"
VERSION="Debian GNU/Linux 10 (buster)"
HOME_URL="https://github.com/GoogleContainerTools/distroless"
SUPPORT_URL="https://github.com/GoogleContainerTools/distroless/blob/master/README.md"
BUG_REPORT_URL="https://github.com/GoogleContainerTools/distroless/issues/new"
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Both layers have a valid Debian os-release file; however, one of them has been amended with Google's information.
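To make the comparison concrete, here is a minimal Go sketch of reading the `ID` and `VERSION_ID` fields from both files. The `parseOSRelease` helper is hypothetical and is not how Clair detects distributions; it only shows that both layers identify themselves as Debian 10 once quotes are stripped.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseOSRelease is a hypothetical helper: it reads KEY=VALUE lines
// and strips optional surrounding quotes from the value.
func parseOSRelease(content string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a KEY=VALUE line
		}
		fields[key] = strings.Trim(value, `"'`)
	}
	return fields
}

func main() {
	// Abbreviated copies of the two files shown above.
	distroless := "PRETTY_NAME=\"Distroless\"\nID=\"debian\"\nVERSION_ID=\"10\"\n"
	debian := "PRETTY_NAME=\"Debian GNU/Linux 10 (buster)\"\nID=debian\nVERSION_ID=\"10\"\n"
	for _, content := range []string{distroless, debian} {
		f := parseOSRelease(content)
		fmt.Println(f["ID"], f["VERSION_ID"], f["PRETTY_NAME"])
	}
}
```

Only `PRETTY_NAME` and the URL fields differ between the two layers, so a detector keyed on `ID` and `VERSION_ID` should treat them the same way.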
In theory this would work, @eumel8. My hesitation is that any non-Debian-based image that Google decides to package may not work if it doesn't align with a base distribution.
So far, their "production"-ready distroless images use Debian.
@ldelossa thanks for a first look. Indeed, that would catch only the Debian-based images from Google Distroless, but it would support the corona-warn-app project, which decided to use that kind of image, hosted in Quay and scanned by Clair.
Two thoughts:
Distroless built their own dpkg parser to be used in their builds, so it seems unlikely that they would switch in the near future. https://github.com/GoogleContainerTools/distroless/blob/master/package_manager/dpkg.bzl
The distroless approach is less about whether files came from packages than about including only the files required to run an app. As a result, a number of files do not come from packages, notably Jetty, Node, Dotnet, and Busybox (debug-tagged containers). Additionally, these containers don't contain packaging tools, so third-party uses and third-party applications will be built without them. It's perhaps worth flagging, then, that these are not meant to be images maintained by package managers; they're meant to be images with runtime code only.
I wanted to figure out where the dpkg/status.d files were created since it wasn't distroless. Turns out it's part of Bazel's rules_docker:
https://github.com/bazelbuild/rules_docker/blob/0cceda163989527dea4740a98b261080eb65446f/README.md#container_layer - packages are specified as "debs" here.
Package metadata is written here: https://github.com/bazelbuild/rules_docker/blob/131eb110957ef46ebbcbb0e81dce99132f15334c/container/build_tar.py#L297-L300
So as a bonus, any containers built with Bazel using the same Distroless technique from .deb files should store metadata that can be checked, not just Distroless-official containers. Those built using Bazel but following the language_tool_layer approach -- which uses, for example, Ubuntu as a base image, shouldn't be affected. https://github.com/bazelbuild/rules_docker/issues/1231 But then those aren't "Distroless" by definition. (I'm not sure anyone's tried the third approach -- start from a base Ubuntu image, track modified changes in layers like usual, then replace the base Ubuntu image layer? It's probably redundant even though it would run dpkg postinstall scripts at that point.)
But the same kinds of issues could apply if someone exported, say, a static binary that happened to include versions of functions that were vulnerable. I suppose the only answer there is to scan for known vulnerable function strings and instructions and hope the compiler hasn't mangled things too much? :)
Regarding scanning binaries in Distroless containers: although I can find a number of proof-of-concept implementations, it's unclear to me whether anyone in open source has tried to build a dataset from text and binary function hash differences before and after a CVE fix, to then determine whether the same vulnerability is present in a static build output. Presumably changes to compiler settings would add noise or complicate such a dataset. I suppose this is an argument for layering your scanning, so you scan your source code, your build logs (perhaps), and your builds for outdated modules.
I'm interested in having fingerprint-based detection, but it needs lots of thought.
And so on.
I think this thread is starting to splinter in topic.
Right now it looks as if we can support the Distroless images by changing the dpkg scanner to also evaluate the /status.d directory. The info about the Bazel build system is useful. Seeing that this is more of a build artifact than a Distroless concept makes supporting it attractive.
ClairCore evaluates package addition and removal on a "per package database" basis. If we treat each file in /status.d as its own package db, everything should "just work".
I understand the caveat that we are not actually scanning the binary, but as @hdonnay pointed out, that is a bigger topic that I believe can become its own issue. Feel free to correct me if I'm wrong.
I'm not opposed to starting a branch where we simply evaluate the /status.d directory in our dpkg scanner as auxiliary dpkg package databases on the filesystem. @hdonnay let me know if any red flags or issues come to mind with that.
@ldelossa if you need some field testing, let me know, we have some use cases available :)
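A minimal sketch of how such a scanner could decide which paths count as dpkg package databases is shown below. The pattern and the `isDpkgDatabase` name are assumptions for illustration, not the scanner's real code, and the sketch assumes every file directly under status.d is a package record, which may not hold for all images.

```go
package main

import (
	"fmt"
	"regexp"
)

// dpkgDB matches the regular status file plus any file directly inside
// status.d. The pattern is illustrative, not the one Clair uses.
var dpkgDB = regexp.MustCompile(`^var/lib/dpkg/status(\.d/[^/]+)?$`)

// isDpkgDatabase reports whether a layer-relative path should be
// treated as a dpkg package database.
func isDpkgDatabase(path string) bool {
	return dpkgDB.MatchString(path)
}

func main() {
	for _, p := range []string{
		"var/lib/dpkg/status",
		"var/lib/dpkg/status.d/libc6",
		"var/lib/dpkg/status-old",
		"var/lib/dpkg/status.d/nested/file",
	} {
		fmt.Println(p, isDpkgDatabase(p))
	}
}
```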
Tentatively planning this PoC for v4.2.
Hi team, do we have any update on this support?
The linked PR will add the ability to detect distroless packages. This means the SBOM created from indexing will show the packages, but there won't be any associated vulnerability information until a fetcher is created with a distroless vulnerability source.
@crozzy: many thanks for picking up this topic!
Description of Problem / Feature Request
When I push Google Distroless images to my Quay registry, the security scan result is "Unsupported", for example for the Debian-based Java image.
Expected Outcome
Security findings based on CVE for Debian.
Actual Outcome
Unsupported
Environment
uname -a: 4.15.0-76-generic
kubectl version: 1.16.8
helm version: 3.1.2
Additional info: in https://github.com/quay/clair/blob/8cdd815ccdab27a2ded0e68740b27444efca8d1e/ext/featurefmt/dpkg/dpkg.go#L41 there is a file regex for "var/lib/dpkg/status". It seems Google holds the package information in one file per package in "var/lib/dpkg/status.d".
Maybe it's easy for a Go programmer to add a loop to catch the package information from there :-) A workaround would be an extra build step that provides the required status file with information from the status directory.
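The workaround could look roughly like the Go sketch below, which concatenates every file in a status.d directory into a single status document. The helper names `mergeStatus` and `readStatusDir` are hypothetical, the demo runs against a temporary directory rather than a real image, and the package data is invented for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// mergeStatus joins per-package stanzas into one status document,
// separating stanzas with a blank line. Empty inputs are skipped.
func mergeStatus(stanzas []string) string {
	var b strings.Builder
	for _, s := range stanzas {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		b.WriteString(s)
		b.WriteString("\n\n")
	}
	return b.String()
}

// readStatusDir loads every regular file directly under dir.
// os.ReadDir returns entries sorted by filename.
func readStatusDir(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var stanzas []string
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		stanzas = append(stanzas, string(data))
	}
	return stanzas, nil
}

func main() {
	// Build a throwaway status.d directory for the demonstration.
	dir, err := os.MkdirTemp("", "status.d")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	files := map[string]string{
		"libc6":   "Package: libc6\nVersion: 2.28-10\n",
		"openssl": "Package: openssl\nVersion: 1.1.1d-0+deb10u3\n",
	}
	for name, content := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(content), 0o644); err != nil {
			panic(err)
		}
	}
	stanzas, err := readStatusDir(dir)
	if err != nil {
		panic(err)
	}
	fmt.Print(mergeStatus(stanzas))
}
```

In an extra build step, the merged output would be written to var/lib/dpkg/status in the image so that a scanner looking only for that file can find the packages.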