Closed: russjones closed this issue 4 years ago.
I looked into this the other day and ran some test builds/scans against images on both gcr.io and quay.io.
We do actually run `apt-get -y update && apt-get -y upgrade` and the changes do persist, so our images do contain up-to-date system-level packages. The reason this issue has been reported is that the vulnerability scanners which run on the images are always using up-to-date data, but our images get fixed at the point in time when a given version of Teleport is released. Therefore, if you scan an image on the day it's released, it likely has zero vulnerabilities. Scan it a month later and it might have a few which have been discovered in the meantime.
The only way for us to "fix" this would be to proactively rebuild and republish Docker images for every single version of Teleport on a regular schedule, say weekly. This doesn't scale very well as it's a big build job to run and will just start to take longer and longer as the number of versions increases. If we want to do this, it's possible - but it would rely on people proactively re-pulling the Docker image they're using. If they're doing that, they might as well update to the latest version anyway, as we ensure compatibility between patch versions of Teleport on the same minor release.
I actually think the best approach here is a three-pronged strategy:
1) We should start proactively rebuilding the Docker image for the most recent Teleport release in each supported branch on a regular basis, say once a day. This will incentivise people to run the latest version as its container will be getting regular security updates while older versions will not. When a new version of Teleport is released, we leave the old image at the latest update as of the new image's release date, then start updating the new image.
2) Encourage people to update their infrastructure more regularly. It's no surprise to anyone that running out-of-date software is more likely to have vulnerabilities. As much as I don't like the idea in principle, maybe we should build a version check into Teleport which emits log messages regularly to notify people that they are running an older version and should update.
3) We should switch to using a two-stage build. The first stage will update all packages and establish the base layer for the image, then the second stage will just add the Teleport binaries from the release package. This will reduce image build times and also reduce the overall image size.
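As a rough sketch of the two-stage idea in (3) — note the base image, stage names, and binary paths below are illustrative assumptions, not our actual build files:

```shell
# Hypothetical sketch of the two-stage build: stage 1 is a fully patched
# base layer, stage 2 only adds the Teleport binaries on top of it.
cat > Dockerfile.sketch <<'EOF'
# Stage 1: update all system packages to establish the base layer
FROM ubuntu:18.04 AS base
RUN apt-get -y update && apt-get -y upgrade && rm -rf /var/lib/apt/lists/*

# Stage 2: add only the Teleport binaries from the release package
FROM base
COPY teleport tctl tsh /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/teleport"]
EOF
grep -c '^FROM' Dockerfile.sketch
```

Because the binaries land in their own layer, a routine rebuild only has to refresh the base stage; the Teleport layer is reused from cache.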
We should definitely switch to a two-stage build and base the final image on `scratch`, `distroless`, or `busybox`.
Current teleport image size is 243MB, which is too much for a static Go binary.
Question: is there a use-case for including debug tools in the image? If yes, which ones?
Some more info: `tctl`.

The above weeds out `scratch` and `distroless`, unfortunately.
We can still go with `busybox`! It has basic system tools and `vi`, but no package manager. It's unlikely to have major CVEs (and no image scanner would pick them up anyway).
I'm thinking `busybox:1.31-musl`. @webvictim @russjones, do you have any preference here?
Also, does anyone object to dropping the package manager from the image?
@webvictim mentioned that using Alpine in the past was problematic because of `musl` causing build problems. `busybox` has `uclibc`, `glibc`, and `musl` variants, so we can see what works best.
Also, if there are any debugging stories that required extra tools to be installed within the image, please mention them here.
The things I use most regularly:

- `vim` to edit Teleport configs
- `netstat` to check listening ports/connections etc.
- `jq` for pretty-printing audit logs

Thanks @webvictim. Could these tools theoretically be used from the host? I'd like to avoid bundling debug stuff with the image itself if possible: less attack surface and smaller download size.
Potentially, as long as the data is actually available on the host via a volume.

For my use cases I'm mostly running this stuff on demo clusters which use `teleport-ent` as a base image and then add some build steps anyway, so I can probably just install the tools into those images for demo clusters rather than bloating the base image. It just becomes considerably harder to install things if there's no package management in the container.
Another thought about rebuilding images: we could publish patch releases (e.g. `4.2.1`), but also mutable tags for minor versions (e.g. `4.2`). The minor version tags would be aliases for the latest patch version. That way a customer can just point to the minor version, not worrying about compatibility while still getting fresh bug fixes.
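A minimal sketch of how a release pipeline could derive the mutable minor tag from the immutable patch version (the version string is an example; the pipeline would then `docker tag` and `docker push` both references):

```shell
# Example: derive the mutable minor tag from a patch release version.
VERSION=4.2.1
MINOR="${VERSION%.*}"   # strip the patch component -> 4.2

# Both tags point at the same image; only the minor tag moves over time.
echo "quay.io/gravitational/teleport:${VERSION}"
echo "quay.io/gravitational/teleport:${MINOR}"
```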
> The only way for us to "fix" this would be to proactively rebuild and republish Docker images for every single version of Teleport on a regular schedule, say weekly. This doesn't scale very well as it's a big build job to run and will just start to take longer and longer as the number of versions increases.
We should probably wait for the new CI system with cron jobs. And we could choose to only support the latest 2 major releases or so (e.g. 3.x and 4.x).
I've pretty much implemented this build pipeline via Drone now. Some notes:

- `busybox:glibc`
- `get.gravitational.com`

Image naming proposal:
OSS: `quay.io/gravitational/teleport:4.2.x`
Enterprise: `quay.io/gravitational/teleport-ent:4.2.x`
FIPS: `quay.io/gravitational/teleport-ent:4.2.x-fips`

OSS: `quay.io/gravitational/teleport:4.1.x`
Enterprise: `quay.io/gravitational/teleport-ent:4.1.x`
FIPS: `quay.io/gravitational/teleport-ent:4.1.x-fips`

OSS: `quay.io/gravitational/teleport:4.0.x`
Enterprise: `quay.io/gravitational/teleport-ent:4.0.x`
FIPS: `quay.io/gravitational/teleport-ent:4.0.x-fips`
This will allow anyone to set their base image to the `.x` tag and always be on the latest version of Teleport, or at least get updates however often they choose to pull the image.
There is also the possibility with this pipeline to simultaneously tag and push the built images to a second backup repo in case quay.io is down for a period of time. GCR may be an option here.
Can we tag images similar to how busybox (and some others) does it?

- `quay.io/gravitational/teleport:4.2` as a tag pointing to the latest patch release. This is mutable; users get the latest patch releases at a tiny risk of things suddenly breaking.
- `quay.io/gravitational/teleport:4.2.5` as a tag pointing to a specific patch release. This is immutable, and users know that the image they run will be bit-by-bit identical every time. Paranoid users can verify the checksum of the image before deploying every time.

Yes, we could certainly do that. My thinking behind the current scheme was that it might avoid people accidentally getting confused by `4.0` vs `4.0.0`, or especially something like `4.0-fips` vs `4.0.0-fips`.
There are a couple of other questions then also:

- Should we offer a `:4` tag as a pointer to the latest `4.x.x` release?
- Should we offer a `latest` tag which always points to the very latest release? (`latest` tags are a bit of an antipattern now.)

One issue that I'm seeing with the use of `busybox:glibc` is that Teleport won't run, as the image doesn't seem to provide a full complement of `glibc` libraries.
These are the libraries Teleport actually needs; we've never been able to build a fully static binary because `sqlite` uses cgo, which requires `glibc`, and we also have PAM/BPF code that needs `glibc`:
```
$ ldd /usr/local/bin/teleport
    linux-vdso.so.1 (0x00007ffce07a9000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007ff082d89000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff082d67000)
    libc.so.6 => /lib64/libc.so.6 (0x00007ff082b9e000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff082dca000)
```
These are the libraries in the `busybox:glibc` image:
```
/ # find / -iname "*.so*"
/lib/libresolv.so.2
/lib/libnss_files.so.2
/lib/ld-linux-x86-64.so.2
/lib/libnss_compat.so.2
/lib/libc.so.6
/lib/libpthread.so.0
/lib/libnss_nisplus.so.2
/lib/libnss_nis.so.2
/lib/libnsl.so.1
/lib/libm.so.6
/lib/libnss_hesiod.so.2
/lib/libnss_dns.so.2
```
The lack of `libdl.so.2` means Teleport won't start:
```
/ # /usr/local/bin/teleport version
/usr/local/bin/teleport: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory
```
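This kind of mismatch could be checked mechanically before committing to a base image. A minimal sketch, with the two lists hard-coded from the `ldd` and `find` output above rather than read from a live container:

```shell
# Compare the libraries the binary needs against what the image ships.
needs="libdl.so.2 libpthread.so.0 libc.so.6"
image_has="libresolv.so.2 libnss_files.so.2 libc.so.6 libpthread.so.0 libm.so.6"

for lib in $needs; do
  case " $image_has " in
    *" $lib "*) : ;;                 # provided by the image
    *) echo "missing: $lib" ;;       # would fail at startup, as seen above
  esac
done
```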
I'm changing the base image to `frolvlad/alpine-glibc` for now, which is only 12MB bigger than busybox and works fine.
> Yes, we could certainly do that. My thinking behind the current scheme was that it might avoid people accidentally getting confused by `4.0` vs `4.0.0`, or especially something like `4.0-fips` vs `4.0.0-fips`.
That's fair. I guess it's a trade-off between potential confusion and not covering some niche use cases.
> should we offer a `:4` tag as a pointer to the latest `4.x.x` release?
No, because we don't guarantee compatibility between (>1) minor versions. Practically, our minor versions are somewhat like major versions.
> should we offer a `latest` tag which always points to the very latest release?

Probably not; that would prevent customers from shooting themselves in the foot.
> I'm changing the base image to `frolvlad/alpine-glibc` for now which is only 12MB bigger than busybox and works fine.
It seems scary to depend on some individual's dockerhub image for our production builds. Would it be difficult to make teleport work with musl libc instead of glibc?
> covering some niche use cases.

Presumably you're talking about the `4.2` tag. Which niche use cases are these? I'm curious. I think I'm going to leave the final naming decision up to @benarent anyway, but it'd be nice to have as much detail as we can get.
> It seems scary to depend on some individual's dockerhub image for our production builds.

I agree it's a little scary, but the image has nearly a million pulls and seems to be reasonably well-used. We could always fork the GitHub repo/Dockerfile and build the container ourselves if that's preferable?
> Would it be difficult to make teleport work with musl libc instead of glibc?

I just gave this a try. I can make Teleport build with musl libc inside a modified version of our standard buildbox container, and the resulting binaries do then work in the standard `busybox` image. It took a bit of extra work to get the binaries to compile with PAM and BPF support, but I think I've got there.
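If we do go the musl route, CI could also assert that the produced binary really is static. A sketch of that check, with the `file` output hard-coded here as a simulation (a real pipeline would run `file` against the freshly built binary):

```shell
# Simulated static-linkage check. The file_output string below is a
# hard-coded example; in CI it would be: file_output=$(file "$BINARY")
file_output="teleport: ELF 64-bit LSB executable, x86-64, statically linked"

case "$file_output" in
  *"statically linked"*) echo "static build OK" ;;
  *) echo "still dynamically linked"; exit 1 ;;
esac
```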
This probably needs to be a bigger conversation though, as changing the binaries from being dynamically linked with `glibc` to statically linked with `musl` might present some issues for certain customers.
> Presumably you're talking about the `4.2` tag - which niche use cases are these?

I mean when customers want to lock in a specific version of `teleport` and prevent it from changing under their noses. They could also verify the image hash before starting, to detect modifications. Although users with such strict requirements would probably vendor images into their own registries anyway.
> I agree it's a little scary, but the image has nearly a million pulls and seems to be reasonably well-used. We could always fork the GitHub repo/Dockerfile and build the container ourselves if that's preferable?

As `left-pad` has taught us, even popular dependencies are not reliable. Forking the Dockerfile seems preferable, if license and tooling allow it.
> This probably needs to be a bigger conversation though as changing the binaries from being dynamically linked with glibc to statically linked with musl might present some issues for certain customers.

Fair enough. It's just an idea so we don't discard `alpine` outright. I'd much rather depend on `alpine` than a questionable forked image or something unnecessarily large like `ubuntu`.
Hi Gents, there has been some good work on this issue. I think the one question for me is the final naming for the minor version that'll be constantly patched and updated. I think Andrew's point of making `quay.io/gravitational/teleport:4.2` is good; it looks like this is something other projects do.
Other projects also offer both Ubuntu and Alpine images, with `-alpine` and `-ubuntu` suffixes respectively, and provide customers with the options, as some section of the market has more network issues / segfaults with Alpine images.
Also, https://snyk.io/blog/10-docker-image-security-best-practices/ has a few good extra tips (regardless of the small Snyk upsell), such as signing images and adding metadata labels.
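For the metadata-labels tip, a sketch of what that could look like in the Dockerfile. The label keys are the standard `org.opencontainers.image.*` annotation keys; the values here are examples, not our actual metadata:

```shell
# Sketch: standard OCI annotation labels appended to an image build.
cat > Dockerfile.labels <<'EOF'
FROM busybox:glibc
LABEL org.opencontainers.image.title="Teleport"
LABEL org.opencontainers.image.version="4.2.10"
LABEL org.opencontainers.image.source="https://github.com/gravitational/teleport"
EOF
grep -c '^LABEL' Dockerfile.labels
```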
@benarent, need one other decision from you: should we also produce immutable patch release tags (e.g. `quay.io/gravitational/teleport:4.2.10`)?
Also, by offering multiple base image variants we sign up for more maintenance work. We can handle it, just something to keep in mind.
I would say yes, keep patch releases immutable.
re: variants, I would say let's start with whatever works. We might not need to offer variants at all, since the image is already small.
On an unrelated note, I noticed other projects adding in different OS/ARCH via a digest. Quay 3 now supports Multiple Architectures: https://www.redhat.com/en/blog/introducing-red-hat-quay-3-registry-your-linux-and-windows-containers, it's low on the request list, but we've had a few people asking for ARM builds, so something to consider.
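If we do pick up multi-arch later, the usual shape is one manifest list covering several platforms. A sketch of how a build script might iterate the targets (the platform list is an assumption, and the real invocation would be `docker buildx build --platform ...` or `docker manifest` rather than the echo below):

```shell
# Sketch: loop over target platforms for a hypothetical multi-arch build.
PLATFORMS="linux/amd64 linux/arm64 linux/arm/v7"

for p in $PLATFORMS; do
  # real pipeline would build and push an image for each platform here
  echo "would build teleport for $p"
done
```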
This is almost done - it will be set in stone once #3793 is merged.
4.2 branch (currently Teleport 4.2.10):
OSS: `quay.io/gravitational/teleport:4.2`
Enterprise: `quay.io/gravitational/teleport-ent:4.2`
FIPS: `quay.io/gravitational/teleport-ent:4.2-fips`

4.1 branch (currently Teleport 4.1.10):
OSS: `quay.io/gravitational/teleport:4.1`
Enterprise: `quay.io/gravitational/teleport-ent:4.1`
FIPS: `quay.io/gravitational/teleport-ent:4.1-fips`

4.0 branch (currently Teleport 4.0.16):
OSS: `quay.io/gravitational/teleport:4.0`
Enterprise: `quay.io/gravitational/teleport-ent:4.0`
FIPS: `quay.io/gravitational/teleport-ent:4.0-fips`
Teleport images often have an old base image, and due to how caching works in Dockerfiles, `apt-get update` is never re-run. Even though these vulnerabilities don't affect Teleport (they're often in unused system libraries), it's better if the image does not contain them.

Look into options like `--no-cache` or `apt-get update && time` to make sure the latest patches are included in our images.