quay / claircore

foundation modules for scanning container packages and reporting vulnerabilities
https://quay.github.io/claircore/
Apache License 2.0

Unsupported scan results on Google Distroless images #181

Closed eumel8 closed 1 year ago

eumel8 commented 4 years ago

Description of Problem / Feature Request

When I push Google Distroless images to my Quay registry, the security scan result is "Unsupported", for example for the Debian-based Java image.

Expected Outcome

Security findings based on CVE for Debian.

Actual Outcome

Unsupported

Environment

Additional info: https://github.com/quay/clair/blob/8cdd815ccdab27a2ded0e68740b27444efca8d1e/ext/featurefmt/dpkg/dpkg.go#L41 contains a file regex for "var/lib/dpkg/status". It seems Google keeps the package information in one file per package under "var/lib/dpkg/status.d".

Maybe it's easy for a Go programmer to add a loop to collect the package information from there :-) A workaround would be an extra build step that generates the required status file from the files in the status directory.

ldelossa commented 4 years ago

@hdonnay what do you think of this? It looks like Google is packaging containers that use distributions we do support. At first glance it does seem like some simple modifications to the dpkg scanner would make this work.

Running cctool unpack

/tmp/cctool-unpack-248681396
❯ fd os-release | xargs cat
PRETTY_NAME="Distroless"
NAME="Debian GNU/Linux"
ID="debian"
VERSION_ID="10"
VERSION="Debian GNU/Linux 10 (buster)"
HOME_URL="https://github.com/GoogleContainerTools/distroless"
SUPPORT_URL="https://github.com/GoogleContainerTools/distroless/blob/master/README.md"
BUG_REPORT_URL="https://github.com/GoogleContainerTools/distroless/issues/new"
PRETTY_NAME="Debian GNU/Linux 10 (buster)"

NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Both layers have a valid Debian os-release file; however, one of them carries Google-specific information.

In theory this would work, @eumel8. My hesitation is that any non-Debian-based image that Google decides to package may not work if it doesn't align with a base distribution.

So far their "production"-ready distroless images use Debian.

eumel8 commented 4 years ago

@ldelossa thanks for the first look. Indeed, that would cover only the Debian-based images from Google Distroless, but it would support the corona-warn-app project, which decided to use this kind of image, hosted in Quay and scanned by Clair.

LouisStAmour commented 4 years ago

Two thoughts:

  1. Distroless built their own dpkg parser for use in their builds; it seems unlikely that they would switch in the near future. https://github.com/GoogleContainerTools/distroless/blob/master/package_manager/dpkg.bzl

  2. The distroless approach is less about whether files came from packages than about including only the files required to run an app. As a result, a number of files do not come from packages, notably Jetty, Node, Dotnet, Busybox (debug-tagged containers)... Additionally, these containers don't contain packaging tools, so third-party uses and third-party applications will be built without them. It's worth it, then, to perhaps flag that these are not meant to be images maintained by package managers; they're meant to be images with runtime code only.

I wanted to figure out where the dpkg/status.d files were created since it wasn't distroless. Turns out it's part of Bazel's rules_docker:

https://github.com/bazelbuild/rules_docker/blob/0cceda163989527dea4740a98b261080eb65446f/README.md#container_layer - packages are specified as "debs" here.

Package metadata is written here: https://github.com/bazelbuild/rules_docker/blob/131eb110957ef46ebbcbb0e81dce99132f15334c/container/build_tar.py#L297-L300

So as a bonus, any containers built with Bazel using the same Distroless technique from .deb files should store metadata that can be checked, not just Distroless-official containers. Those built using Bazel but following the language_tool_layer approach (which uses, for example, Ubuntu as a base image) shouldn't be affected. https://github.com/bazelbuild/rules_docker/issues/1231 But then those aren't "Distroless" by definition. (I'm not sure anyone's tried the third approach: start from a base Ubuntu image, track changes in layers like usual, then replace the base Ubuntu image layer? It's probably redundant even though it would run dpkg postinstall scripts at that point.)

But the same kinds of issues could apply if someone exported, say, a static binary that happened to include versions of functions that were vulnerable. I suppose the only answer there is to scan for known vulnerable function strings and instructions and hope the compiler hasn't mangled things too much? :)

Regarding scanning binaries in Distroless containers -- Though I can find a number of proof of concept implementations, it's unclear to me if anyone in open source has tried to build a dataset from text and binary function hash differences before and after a CVE fix to then determine if the same vulnerability is present in a static build output. Presumably changes to compiler settings would add noise or complicate such a dataset. I suppose this is an argument to layer your scanning so you scan your source code, your build logs (perhaps) and your builds for outdated modules.

hdonnay commented 4 years ago

I'm interested in having fingerprint-based detection, but it needs lots of thought.

And so on.

ldelossa commented 4 years ago

I think this thread is starting to splinter in topic.

Right now it looks as if we can support the Distroless images by changing the dpkg scanner to also evaluate the /status.d directory. The info about the Bazel build system is useful. Seeing that this is more of a build artifact than a Distroless concept makes supporting it attractive.

ClairCore evaluates package addition and removal on a "per package database" basis. If we treat each file in /status.d as its own package db, everything should "just work".

I understand the caveat that we are not actually scanning the binary, but as @hdonnay pointed out, that is a bigger topic that I believe can become its own issue. Feel free to correct me if I'm wrong.

I'm not opposed to starting a branch where we simply evaluate the /status.d directory in our dpkg scanner as auxiliary dpkg package databases on the filesystem. @hdonnay let me know if any red flags or issues come to mind with that.

romdalf commented 3 years ago

@ldelossa if you need some field testing, let me know, we have some use cases available :)

ldelossa commented 3 years ago

Tentatively planning this PoC for v4.2.

mustaFAB53 commented 1 year ago

Hi team, do we have any update on this support?

crozzy commented 1 year ago

The linked PR will add the ability to detect distroless packages. This means the SBOM created from indexing will show the packages, but there won't be any associated vulnerability information until a fetcher is created with a distroless vulnerability source.

eumel8 commented 1 year ago

@crozzy: many thanks for picking up this topic!