A tool for exploring each layer in a docker image
Unable to analyze the image using dive #498

venkatn087 commented 5 months ago

What happened: I am using RHEL 9 on the AWS free tier and installed dive 0.11.0 by following the steps on the GitHub page https://github.com/wagoodman/dive. I downloaded the Docker image "nginx:latest" with the "docker pull nginx" command, then ran "dive nginx:latest" and got the error below:

[ec2-user@ip-172-31-36-53 ~]$ sudo dive nginx
Image Source: docker://nginx
Fetching image... (this can take a while for large images)
cannot fetch image
could not find image config
[ec2-user@ip-172-31-36-53 ~]$

Could you please let me know how to scan an image using dive?

Anything else we need to know?: I would also like to know how to include dive in a Jenkins CI/CD pipeline. We use Freestyle jobs, so the configuration needs to go in the "Execute shell" step.

Environment: RHEL9

zerok commented 5 months ago

Could it be that this is related to a Docker update that was released last weekend? I also see the same issue on Docker v25.0.0 but haven't yet seen it on v24.0.7.

mark2185 commented 5 months ago

@rajiv-k is this something you could look into?

tbroyer commented 5 months ago

Apparently, the files no longer follow the same naming logic, so this code no longer detects the config file, and this one later fails.

Here's the same image (same Dockerfile) built with Docker 24.0.7 (pulled from our registry last week):

$ tar tf redacted.tar 

and saved from a failed CI build today, with Docker 25.0.0:

$ tar tf xxx.tar 

In the first case, the config file is that 8d179503ce3761ab615a6c26c87cb707d9e622b5862ee09e3ee020895b06d545.json at the root of the archive, in the second, it's the blobs/sha256/a08f2f0d2d5e0ed1f1b74bb62a504ff68cf3ea7f803060a0ca1340109bbc99d0 file. Note how we can no longer tell just from their name which files are JSON and which are TAR files.
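To illustrate why name-based detection breaks, here is a minimal Go sketch. The function kindFromName and its suffix rules are hypothetical and written only for this example; they are not dive's actual code. The two entry names are the ones quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// kindFromName guesses what a tar entry holds from its name alone.
// Hypothetical helper for illustration; not dive's actual implementation.
func kindFromName(name string) string {
	switch {
	case name == "manifest.json" || name == "index.json":
		return "metadata"
	case strings.HasSuffix(name, ".json"):
		return "config"
	case strings.HasSuffix(name, ".tar"), strings.HasSuffix(name, ".tar.gz"):
		return "layer"
	default:
		// OCI-style blobs carry no extension, so the name says nothing.
		return "unknown"
	}
}

func main() {
	names := []string{
		"8d179503ce3761ab615a6c26c87cb707d9e622b5862ee09e3ee020895b06d545.json",
		"blobs/sha256/a08f2f0d2d5e0ed1f1b74bb62a504ff68cf3ea7f803060a0ca1340109bbc99d0",
	}
	for _, n := range names {
		fmt.Printf("%s -> %s\n", n, kindFromName(n))
	}
}
```

With rules like these, the Docker 24 config name is classified as a config, while the Docker 25 blob path falls through to "unknown", even though it holds the same kind of content.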

This is I believe because:

The docker image save tarball output is now OCI compliant. moby/moby#44598


Borph commented 5 months ago

Same here using dive 0.11.0 as docker image, Docker version 25.0.0, build e758fe5, Ubuntu

mark2185 commented 5 months ago

@tbroyer since you've already dived into the relevant code, are you interested in sending a PR?

tbroyer commented 5 months ago

@mark2185 I started looking at it yesterday, but am really not sure how to approach it:

  - start reading each file in the tar looking for a "magic number"? (knowing that tar and json don't really have such things)
  - and/or trying to load them as tar or json and ignoring errors? (this could imply buffering a big chunk of the file)
  - should this new/fallback approach be limited to entries in blobs/?

Also, I haven't written Go in years.

I'll happily review a PR and test it though, and could help brainstorm the best approach.

mark2185 commented 5 months ago

Fair enough, wanted to be sure before taking a crack at it.

I'd wager a good way would be creating a reader with tar.NewReader on the buffer and invoking Next(); if there's an error (that's not an io.EOF) we're definitely not reading a tar and hopefully we're reading a JSON.

Then we try unmarshalling it to see how well that works out, and if that errors out as well we can safely say what the hell.
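A minimal Go sketch of that idea, assuming the whole entry is already in memory (which may not hold for large layers). The names classify and makeTar are hypothetical helpers for this example. It treats only a successful Next() as "tar", then falls back to a JSON validity check.

```go
package main

import (
	"archive/tar"
	"bytes"
	"encoding/json"
	"fmt"
)

// classify reports whether data looks like a tar archive, a JSON document,
// or neither. Hypothetical helper for illustration only.
func classify(data []byte) string {
	// Next() succeeds only if the first 512-byte block is a valid tar header;
	// short or non-tar input yields an error (io.EOF for empty input).
	if _, err := tar.NewReader(bytes.NewReader(data)).Next(); err == nil {
		return "tar"
	}
	if json.Valid(data) {
		return "json"
	}
	return "unknown"
}

// makeTar builds a one-file tar archive in memory, used here as sample input.
func makeTar(name, content string) []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := &tar.Header{Name: name, Mode: 0600, Size: int64(len(content))}
	if err := tw.WriteHeader(hdr); err != nil {
		panic(err)
	}
	if _, err := tw.Write([]byte(content)); err != nil {
		panic(err)
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	fmt.Println(classify(makeTar("etc/hostname", "example\n")))
	fmt.Println(classify([]byte(`{"architecture":"amd64"}`)))
	fmt.Println(classify([]byte("neither tar nor json")))
}
```

This sketch does not handle gzip-compressed layers or streamed, non-seekable input.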

tbroyer commented 5 months ago

You have to take gzip into account too, and remember that in many (most?) cases we're reading a tar that's directly streamed from the Docker daemon, and is not seekable. This means adding some buffering; 512 bytes should be enough, as that's the size of a tar entry header (according to https://en.wikipedia.org/wiki/Tar_(computing)#File_format). But what if it's gzipped? Maybe in that case it can be assumed not to be JSON, so the fallback to JSON, and the need to seek back to the start of the entry, goes away?

That would mean doing something like:

for each tar entry:
  if the file name is recognized:
    // use the current algorithm based on file names
  else: // handle OCI-compatible Docker images
    buffer = read 512 bytes
    if buffer matches gzip magic number:
      create gzip stream (from a MultiReader on the buffer and the remainder of the tar entry)
      create tar reader
      try processing layer, continue to next tar entry on error (ignoring it)
    else:
      create a tar reader (from a MultiReader on the buffer and the remainder of the tar entry)
      try processing layer
      on error, try processing JSON (from a MultiReader on the buffer and the remainder of the tar entry; assuming/hoping the previous tar reader didn't consume past the buffer)
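
The buffering step of this pseudocode could look roughly like the Go sketch below. It is only an illustration under assumptions: sniff and sniffBytes are hypothetical names, and the tar check relies on the "ustar" magic at offset 257 of the header block, which modern tar writers emit but very old tar formats do not.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

const tarBlockSize = 512 // size of one tar header block

// sniff reads up to one tar block from r, guesses the content type, and
// returns a reader that replays the consumed bytes followed by the rest of r.
// Hypothetical helper for illustration only.
func sniff(r io.Reader) (string, io.Reader, error) {
	head := make([]byte, tarBlockSize)
	n, err := io.ReadFull(r, head)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return "", nil, err
	}
	head = head[:n]
	// No seeking needed: the buffered bytes are simply put back in front.
	replay := io.MultiReader(bytes.NewReader(head), r)
	switch {
	case n >= 2 && head[0] == 0x1f && head[1] == 0x8b: // gzip magic number
		return "gzip", replay, nil
	case n == tarBlockSize && string(head[257:262]) == "ustar":
		return "tar", replay, nil
	default:
		return "other", replay, nil // candidate for JSON parsing
	}
}

// sniffBytes is a convenience wrapper for in-memory data.
func sniffBytes(data []byte) string {
	kind, _, err := sniff(bytes.NewReader(data))
	if err != nil {
		return "error"
	}
	return kind
}

func main() {
	kind, replay, err := sniff(bytes.NewReader([]byte(`{"architecture":"amd64"}`)))
	if err != nil {
		panic(err)
	}
	all, err := io.ReadAll(replay)
	if err != nil {
		panic(err)
	}
	fmt.Println(kind, string(all))
}
```

Because the returned reader replays the sniffed bytes, a gzip, tar, or JSON decoder can be layered on top of it without the underlying stream being seekable.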

mark2185 commented 5 months ago

@tbroyer if I'm reading the image layout specification correctly, the index.json should have the mediaType indicating which type it is.

Can't check because I don't have such an image, which leads me to my next question - where did you get docker v25? I have v24.0.7 on archlinux and a couple of friends I asked (windows, macOS) don't have v25 either.

tbroyer commented 5 months ago

> @tbroyer if I'm reading the image layout specification correctly, the index.json should have the mediaType indicating which type it is.

Yes, but nothing guarantees that you'll see the index.json tar entry before the others. So either you possibly process the tar twice (once to get the index, or the manifest.json instead, and then again, looking up each tar entry in the index/manifest to know how to process it; but that means storing the tar to disk temporarily when resolving it from the Docker daemon), or you "content sniff" and parse/process each tar entry (like the pseudo algorithm above). Maybe extracting the index or manifest first would be the best way forward, but that's a major change.
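
As a rough illustration of the manifest-first variant, the Go sketch below parses a manifest.json payload and maps each referenced tar entry to its role. It assumes the docker save manifest layout with Config, RepoTags, and Layers fields; classifyEntries is a hypothetical name and the digests in the sample are placeholders, not real blobs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifestEntry mirrors one element of the manifest.json array found in
// archives written by docker save (field names assumed from that format).
type manifestEntry struct {
	Config   string
	RepoTags []string
	Layers   []string
}

// classifyEntries maps each tar entry path named in manifest.json to
// "config" or "layer". Hypothetical helper for illustration only.
func classifyEntries(manifestJSON []byte) (map[string]string, error) {
	var entries []manifestEntry
	if err := json.Unmarshal(manifestJSON, &entries); err != nil {
		return nil, err
	}
	kinds := make(map[string]string)
	for _, e := range entries {
		kinds[e.Config] = "config"
		for _, layer := range e.Layers {
			kinds[layer] = "layer"
		}
	}
	return kinds, nil
}

func main() {
	// Placeholder digests, for demonstration only.
	sample := []byte(`[{
		"Config": "blobs/sha256/aaaa",
		"RepoTags": ["example:latest"],
		"Layers": ["blobs/sha256/bbbb", "blobs/sha256/cccc"]
	}]`)
	kinds, err := classifyEntries(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(kinds["blobs/sha256/aaaa"], kinds["blobs/sha256/bbbb"])
}
```

The lookup itself is simple; the cost discussed above comes from having to locate manifest.json before the other entries can be processed.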

> Can't check because I don't have such an image, which leads me to my next question - where did you get docker v25? I have v24.0.7 on archlinux and a couple of friends I asked (windows, macOS) don't have v25 either.

I too am on Arch, so still on 24.0.7. Our CI is on Ubuntu though, and we get the Docker Engine packages straight from Docker, Inc.: https://docs.docker.com/engine/install/ubuntu/

tbroyer commented 5 months ago

I managed to generate the .data/test-docker-image.tar on our CI server using make generate-test-data (I created an empty .scripts/, so it's not exactly equivalent to the TAR that's checked into the repository). Note that I had to ZIP the TAR for GitHub to accept the upload :man_shrugging: test-docker-image.zip

And I reproduce the issue with it:

$ dive --source docker-archive test-docker-image.tar 
Image Source: docker-archive://test-docker-image.tar
Fetching image... (this can take a while for large images)
cannot fetch image
could not find image config

utamas commented 5 months ago

I'm on Ubuntu 22.04 using Docker 25.0.0 and ran into this problem. Can I help in any way?

saderi commented 5 months ago

I have the same problem:

$ dive ubuntu:22.04
Image Source: docker://ubuntu:22.04
Fetching image... (this can take a while for large images)
Handler not available locally. Trying to pull 'ubuntu:22.04'...
22.04: Pulling from library/ubuntu
29202e855b20: Pull complete 
Digest: sha256:e6173d4dc55e76b87c4af8db8821b1feae4146dd47341e4d431118c7dd060a74
Status: Downloaded newer image for ubuntu:22.04
cannot fetch image
could not find image config

System info:

Debian GNU/Linux 12
Docker version 25.0.1, build 29cf629
dive 0.11.0

tbroyer commented 5 months ago

I spent a couple hours and managed to get something working (see PR #500)

Tested on the included test-oci-docker-image.tar (same as test-docker-image.tar above) and on a real image built on our CI.

$ tar tf .data/test-oci-docker-image.tar 
$ ./dive_linux_amd64 --ci docker-archive://.data/test-oci-docker-image.tar 
  Using default CI config
Image Source: docker-archive://.data/test-oci-docker-image.tar
Fetching image... (this can take a while for large images)
Analyzing image...
  efficiency: 99.3041 %
  wastedBytes: 50845 bytes (51 kB)
  userWastedPercent: 50.0000 %
Inefficient Files:
Count  Wasted Space  File Path
    2         20 kB  /root/saved.txt
    2         20 kB  /root/example/somefile1.txt
    2         10 kB  /root/example/somefile3.txt
   11           0 B  /etc
  FAIL: highestUserWastedPercent: too many bytes wasted, relative to the user bytes added (%-user-wasted-bytes=0.5 > threshold=0.1)
  SKIP: highestWastedBytes: rule disabled
  PASS: lowestEfficiency
Result:FAIL [Total:3] [Passed:1] [Failed:1] [Warn:0] [Skipped:1]

james-johnston-thumbtack commented 5 months ago

The PR from @tbroyer fixes the problem for me. Until it gets merged and a new dive release made, here's the TL;DR of how I was able to get it compiled and running:

  1. Prerequisite: I already have Go version 1.19.5 installed on my OS X host, so I went with that.
  2. Commands for cloning and building:
    git clone https://github.com/tbroyer/dive
    cd dive
    git checkout docker25compat
    go build -o dive-tool

Run the tool:


There are probably proper Makefile targets I should be using, but I didn't bother digging in to figure it out.  This worked.