Keep Teleport base images updated.

russjones commented 4 years ago

Teleport images often have an old image and due to how caching works in Dockerfiles apt-get update is never run. Even though these vulnerabilities don't affect Teleport (they're often in unused system libraries), it's better if the image does not contain them.

Look into options to like --no-cache or apt-get update && time to make sure the latest patches are included in our images.

webvictim commented 4 years ago

I looked into this the other day and ran some test builds/scans against images on both gcr.io and quay.io.

We do actually run apt-get -y update && apt-get -y upgrade and the changes do persist, so our images do contain up-to-date system level packages. The reason this issue has been reported is because the vulnerability scanners which run on the images are always using up-to-date data, but our images get fixed at the point in time when a given version of Teleport is released. Therefore, if you scan an image on the day it's released, it likely has zero vulnerabilities. Scan it a month later and it might have a few which have been discovered in the meantime.

The only way for us to "fix" this would be to proactively rebuild and republish Docker images for every single version of Teleport on a regular schedule, say weekly. This doesn't scale very well as it's a big build job to run and will just start to take longer and longer as the number of versions increases. If we want to do this, it's possible - but it would rely on people proactively re-pulling the Docker image they're using. If they're doing that, they might as well update to the latest version anyway, as we ensure compatibility between patch versions of Teleport on the same minor release.

I actually think the best approach here is a three-pronged strategy:

1) We should start proactively rebuilding the Docker image for the most recent Teleport release in each supported branch on a regular basis, say once a day. This will incentivise people to run the latest version as its container will be getting regular security updates while older versions will not. When a new version of Teleport is released, we leave the old image at the latest update as of the new image's release date, then start updating the new image.

2) Encourage people to update their infrastructure more regularly. It's no surprise to anyone that running out-of-date software is more likely to have vulnerabilities. As much as I don't like the idea in principle, maybe we should build a version check into Teleport which emits log messages regularly to notify people that they are running an older version and should update.

3) We should switch to using a two-stage build. The first stage will update all packages and establish the base layer for the image, then the second stage will just add the Teleport binaries from the release package. This will reduce image build times and also reduce the overall image size.

awly commented 4 years ago

We should definitely switch to a two-stage build and base the final image on scratch or distroless or busybox. Current teleport image size is 243MB, which is too much for a static Go binary.

Question: is there a use-case for including debug tools in the image? If yes, which ones?

awly commented 4 years ago

Some more info:

customers are expected to use the image for auth server and proxy, but not nodes
- this likely means no forking, no mounting of anything other than config and key/cert dir
a shell is required, to be able to exec into the container and run tctl
@stevenGravy sometimes execs into the container for debugging (looking at config files, logs, tweaking with vim); we need at least a basic set of tools in the image

The above weeds out scratch and distroless unfortunately. We can still go with busybox! It has basic system tools and vi, but no package manager. It's unlikely to have major CVEs (no image scanner would pick them up anyway).

I'm thinking the busybox:1.31-musl, @webvictim @russjones do you have any preference here? Also, does anyone object to dropping the package manager from the image?

awly commented 4 years ago

@webvictim mentioned that using Alpine in the past was problematic because of musl causing build problems. busybox has ulibc, glibc and musl variants, so we can see what works best.

Also, if there are any debugging stories that required extra tools to be installed within the image, please mention them here.

webvictim commented 4 years ago

The things I use most regularly:

vim to edit Teleport configs
netstat to check listening ports/connections etc
jq for pretty-printing audit logs

awly commented 4 years ago

Thanks @webvictim. Could these tools theoretically be used from the host? I'd like to avoid bundling debug stuff with the image itself if possible - less attack surface and smaller download size.

webvictim commented 4 years ago

Potentially, as long as the data is actually available on the host via volume.

For my use cases I'm mostly running this stuff on demo clusters which use teleport-ent as a base image and then add some build steps anyway, so I can probably just install the tools into those images for demo clusters rather than bloating up the base image. It just becomes considerably harder to install things if there's no package management in the container.

awly commented 4 years ago

Another thought about rebuilding images: we could publish patch releases (e.g. 4.2.1), but also mutable tags for minor versions (e.g. 4.2). The minor version tags would be aliases to the latest patch version.

That way a customer can just point to the minor version, not worrying about compatibility and also getting fresh bug-fixes.

awly commented 4 years ago

The only way for us to "fix" this would be to proactively rebuild and republish Docker images for every single version of Teleport on a regular schedule, say weekly. This doesn't scale very well as it's a big build job to run and will just start to take longer and longer as the number of versions increases.

We should probably wait for the new CI system with cron jobs. And we could choose to only support the latest 2 major releases or so (e.g. 3.x and 4.x).

webvictim commented 4 years ago

I've pretty much implemented this build pipeline via Drone now. Some notes:

We will build and publish images for the current release branch and the two previous releases.
- Currently, this will be Teleport 4.2, 4.1 and 4.0.
- When Teleport 4.3 comes out, we will stop building the 4.0 images. Each image will automatically use the latest non-prerelease version in the branch.
- The intention here is partly to incentivise users to update to newer versions, rather than staying on older Teleport versions.
- If users wish to manually hardcode an older image with a specific version they are very welcome to, but each version-specific image will only be built at the time the binaries are released and not on an ongoing basis. This will generally lead to vulnerability scanners detecting more issues with those images over time.
The pipeline will run to rebuild the images every night.
The base image for the container will be busybox:glibc.
This is using a multi-stage Dockerfile to keep the final image size low.
To cut down on build times and also enforce consistency, we will be installing Teleport inside the image using the tarball archive downloaded from get.gravitational.com.
There are three primary goals here:
- reduce the number of surplus binaries and potential vulnerabilities inside the image
- reduce the size of the Teleport image that customers need to download
- make it easier for customers using Docker or Kubernetes to stay on the latest version without having to implement their own update procedures.

Image naming proposal:

4.2 branch (currently Teleport 4.2.10)

OSS: quay.io/gravitational/teleport:4.2.x Enterprise: quay.io/gravitational/teleport-ent:4.2.x FIPS: quay.io/gravitational/teleport-ent:4.2.x-fips

4.1 branch (currently Teleport 4.1.10)

OSS: quay.io/gravitational/teleport:4.1.x Enterprise: quay.io/gravitational/teleport-ent:4.1.x FIPS: quay.io/gravitational/teleport-ent:4.1.x-fips

4.0 branch (currently Teleport 4.0.16)

OSS: quay.io/gravitational/teleport:4.0.x Enterprise: quay.io/gravitational/teleport-ent:4.0.x FIPS: quay.io/gravitational/teleport-ent:4.0.x-fips

This will allow anyone to set their base image to the .x tag and always be on the latest version of Teleport - or at least get updates however often they choose to pull the image.

There is also the possibility with this pipeline to simultaneously tag and push the built images to a second backup repo in case quay.io is down for a period of time. GCR may be an option here.

awly commented 4 years ago

Can we tag images similar to how busybox (and some others) does it?

quay.io/gravitational/teleport:4.2 as a tag pointing to the latest patch release. This is mutable, users get the latest patch releases at a tiny risk of things suddenly breaking.
quay.io/gravitational/teleport:4.2.5 as a tag pointing to a specific patch release. This is immutable and users know that the image they run will be bit-by-bit identical every time. Paranoid users, can verify the checksum of the image before deploying every time.

webvictim commented 4 years ago

Yes, we could certainly do that. My thinking behind the current scheme was that it might avoid people accidentally getting confused by 4.0 vs 4.0.0, or especially something like 4.0-fips vs 4.0.0-fips.

There’s a couple of other questions then also:

should we offer a:4 tag as a pointer to the latest 4.x.x release?
should we offer a latest tag which always points to the very latest release?
- my understanding is that latest tags are a bit of an antipattern now

webvictim commented 4 years ago

One issue that I'm seeing with the use of busybox:glibc is that Teleport won't run, as the image doesn't seem to provide a full quotient of glibc libraries.

These are the libraries Teleport actually needs - we've never been able to build a fully static binary because of the fact that sqlite uses cgo, which requires glibc, and we also have PAM/BPF code that needs glibc:

$ ldd /usr/local/bin/teleport
    linux-vdso.so.1 (0x00007ffce07a9000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007ff082d89000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff082d67000)
    libc.so.6 => /lib64/libc.so.6 (0x00007ff082b9e000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff082dca000)

These are the libraries in the busybox:glibc image:

/ # find / -iname "*.so*"
/lib/libresolv.so.2
/lib/libnss_files.so.2
/lib/ld-linux-x86-64.so.2
/lib/libnss_compat.so.2
/lib/libc.so.6
/lib/libpthread.so.0
/lib/libnss_nisplus.so.2
/lib/libnss_nis.so.2
/lib/libnsl.so.1
/lib/libm.so.6
/lib/libnss_hesiod.so.2
/lib/libnss_dns.so.2

The lack of libdl.so.2 means Teleport won't start:

/ # /usr/local/bin/teleport version
/usr/local/bin/teleport: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory

I'm changing the base image to frolvlad/alpine-glibc for now which is only 12MB bigger than busybox and works fine.

awly commented 4 years ago

Yes, we could certainly do that. My thinking behind the current scheme was that it might avoid people accidentally getting confused by 4.0 vs 4.0.0, or especially something like 4.0-fips vs 4.0.0-fips.

That's fair. I guess it's a trade-off between potential confusion and not covering some niche use cases.

should we offer a:4 tag as a pointer to the latest 4.x.x release?

No, because we don't guarantee compatibility between (>1) minor versions. Practically, our minor versions are somewhat like major versions.

should we offer a latest tag which always points to the very latest release?

Probably not, prevent customers from shooting themselves in the foot.

I'm changing the base image to frolvlad/alpine-glibc for now which is only 12MB bigger than busybox and works fine.

It seems scary to depend on some individual's dockerhub image for our production builds. Would it be difficult to make teleport work with musl libc instead of glibc?

webvictim commented 4 years ago

covering some niche use cases.

Presumably you're talking about the 4.2 tag - which niche use cases are these? I'm curious. I think I'm going to leave the final naming decision up to @benarent anyway, but it'd be nice to have as much detail as we can get.

It seems scary to depend on some individual's dockerhub image for our production builds.

I agree it's a little scary, but the image has nearly a million pulls and seems to be reasonably well-used. We could always fork the Github repo/Dockerfile and build the container ourselves if that's preferable?

Would it be difficult to make teleport work with musl libc instead of glibc?

I just gave this a try. I can make Teleport build with musl libc inside a modified version of our standard buildbox container and the outputted binaries do then work in the standard busybox image. It took a bit of extra work to get the binaries to compile with PAM and BPF support but I think I've got there.

This probably needs to be a bigger conversation though as changing the binaries from being dynamically linked with glibc to statically linked with musl might present some issues for certain customers.

awly commented 4 years ago

Presumably you're talking about the 4.2 tag - which niche use cases are these?

I mean when customers want to lock in a specific version of teleport and prevent it from changing under their noses. They could also verify the image hash before starting to detect modifications. Although users with such strict requirements would probably vendor images into their own registries anyway.

I agree it's a little scary, but the image has nearly a million pulls and seems to be reasonably well-used. We could always fork the Github repo/Dockerfile and build the container ourselves if that's preferable?

As left-pad has taught us, even popular dependencies are not reliable. Forking the dockerfile seems preferable, if license and tooling allow it.

This probably needs to be a bigger conversation though as changing the binaries from being dynamically linked with glibc to statically linked with musl might present some issues for certain customers.

Fair enough. It's just an idea so we don't discard alpine outright. I'd much rather depend on alpine than a questionable forked image or something unnecessarily large like ubuntu.

benarent commented 4 years ago

Hi Gents, there has been some good work on this issue.

I think the one question for me is the final naming for minor version that'll be constantly patched and updated. I thin Andrews point of making quay.io/gravitational/teleport:4.2 is good, it looks like this is something other projects do.

Other projects also offer both ubuntu and alpine images, with -alpine and -ubuntu respectfully? and provide customers with the options as some section of the market has more network / segfaults with alpine images.

also https://snyk.io/blog/10-docker-image-security-best-practices/ has a few good extra tips (regardless of the small synk upsell ) such as singing images, and adding metadata labels.

awly commented 4 years ago

@benarent need one other decision from you: should we also produce immutable patch release tags (e.g. quay.io/gravitational/teleport:4.2.10)?

Also, by offering multiple base image variants we sign up for more maintenance work. We can handle it, just something to keep in mind.

benarent commented 4 years ago

I would say yes, keep patch releases immutable.

re: variants, I would say let's start with whatever works. We might not need to define it, since it's a smaller image.

On an unrelated note, I noticed other projects adding in different OS/ARCH via a digest. Quay 3 now supports Multiple Architectures: https://www.redhat.com/en/blog/introducing-red-hat-quay-3-registry-your-linux-and-windows-containers, it's low on the request list, but we've had a few people asking for ARM builds, so something to consider.

webvictim commented 4 years ago

This is almost done - it will be set in stone once #3793 is merged.

4.2 branch (currently Teleport 4.2.10): OSS: quay.io/gravitational/teleport:4.2 Enterprise: quay.io/gravitational/teleport-ent:4.2 FIPS: quay.io/gravitational/teleport-ent:4.2-fips

4.1 branch (currently Teleport 4.1.10): OSS: quay.io/gravitational/teleport:4.1 Enterprise: quay.io/gravitational/teleport-ent:4.1 FIPS: quay.io/gravitational/teleport-ent:4.1-fips

4.0 branch (currently Teleport 4.0.16): OSS: quay.io/gravitational/teleport:4.0 Enterprise: quay.io/gravitational/teleport-ent:4.0 FIPS: quay.io/gravitational/teleport-ent:4.0-fips

gravitational / teleport