wagoodman / dive

A tool for exploring each layer in a docker image
MIT License

Question: How do you associate layers and filesystem changes? #273

Open dAnjou opened 4 years ago

dAnjou commented 4 years ago

Hi,

I recently found your tool because I'd like to do some custom image analysis myself and it has already been very useful! There's just one more thing I'd like to confirm with your help.

The Docker archive contains a manifest.json that points to another JSON file, the image config. That config contains a rootfs field and a history field. This is what you're using to associate a layer with its filesystem changes, right? Here's an example:

{
  "created": "2020-01-23T23:05:05.035158536Z",
  "architecture": "amd64",
  "os": "linux",
  "config": {
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "URL=https://danjou.gitlab.io/goup/builds/goup_0.2.2_amd64.tar.gz"
    ],
    "Cmd": [
      "/bin/sh"
    ]
  },
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:5216338b40a7b96416b8b9858974bbe4acc3096ee60acbc4dfb1ee02aecceb10",
      "sha256:84d37ff1ac0ca62587a1639d3e9fb0e78bff2e94389b684b9f622e09f50e3a8a",
      "sha256:bdaa31f83b57d4f34e3f8d8008c0713ada4beb12a594f99509b025e73d96011d"
    ]
  },
  "history": [
    {
      "created": "2020-01-18T01:19:37.02673981Z",
      "created_by": "/bin/sh -c #(nop) ADD file:e69d441d729412d24675dcd33e04580885df99981cec43de8c9b24015313ff8e in / "
    },
    {
      "created": "2020-01-18T01:19:37.187497623Z",
      "created_by": "/bin/sh -c #(nop)  CMD [\"/bin/sh\"]",
      "empty_layer": true
    },
    {
      "created": "2020-01-23T23:04:53.473947968Z",
      "created_by": "/bin/sh -c apk add --quiet --no-progress --no-cache bash curl libarchive-tools"
    },
    {
      "created": "2020-01-23T23:04:57.969478831Z",
      "created_by": "/bin/sh -c #(nop) ENV URL=\"https://danjou.gitlab.io/goup/builds/goup_0.2.2_amd64.tar.gz\"",
      "empty_layer": true
    },
    {
      "created": "2020-01-23T23:05:05.035158536Z",
      "created_by": "/bin/sh -c curl -sL ${URL} | bsdtar -x -f - \u0026\u0026 mv /goup_0.2.2_amd64/goup /usr/local/bin"
    }
  ]
}

As you can see there are 3 layer diff IDs but 5 history elements. Are you simply going through them in order while ignoring the empty layers and it all works out nicely?
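For anyone following along, here is a minimal Python sketch of how a config like the one above can be read out of a `docker save` archive. It assumes the archive has a top-level manifest.json whose entries carry `Config` and `Layers` keys; the function name `load_image_config` is made up for illustration.

```python
import json
import tarfile


def load_image_config(fileobj, image_index=0):
    """Return (config, layer_paths) for one image in a `docker save` archive.

    `fileobj` is a binary file object positioned at the start of the tar.
    Assumption: manifest.json is a JSON list with one entry per image, and
    each entry has a "Config" path and a "Layers" list of tar paths.
    """
    with tarfile.open(fileobj=fileobj) as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
        entry = manifest[image_index]
        # "Config" names the JSON file holding rootfs.diff_ids and history.
        config = json.load(tar.extractfile(entry["Config"]))
        return config, entry["Layers"]
```

With a hypothetical archive named image.tar, this would be called as `with open("image.tar", "rb") as f: config, layers = load_image_config(f)`, after which `len(config["rootfs"]["diff_ids"])` and `len(config["history"])` give the two counts compared above.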

Thanks in advance :)

wagoodman commented 4 years ago

Indeed! As you've found out, there is not a 1:1 relationship between layer contents and the history, and in short, yes, what you described is exactly what I'm doing. Since the Dockerfile format is essentially annotated bash, some annotations (like the ENV entry above) don't create filesystem changes, so no diff ID is needed. Docker also does some investigation before and after running commands to determine whether the underlying filesystem changed, and again does not assign a diff ID if there are no filesystem changes. In case you're curious, here is where dive is doing this work, which relies entirely on the empty_layer field to determine which history entries to "ignore".
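The in-order walk described above can be sketched roughly as follows. This is an illustrative Python sketch, not dive's actual code (dive is written in Go), and the function name `pair_history_with_layers` is made up. It assumes `config` is the parsed image config JSON.

```python
def pair_history_with_layers(config):
    """Pair each history entry with a diff ID, or None for empty layers.

    Walks history in order and hands out diff IDs in order, skipping
    entries flagged with "empty_layer": true.
    """
    diff_ids = iter(config["rootfs"]["diff_ids"])
    pairs = []
    for entry in config.get("history", []):
        command = entry.get("created_by", "")
        if entry.get("empty_layer", False):
            # Metadata-only step (e.g. CMD, ENV): no filesystem diff.
            pairs.append((command, None))
        else:
            # Next unused diff ID; None if the config has run out of them.
            pairs.append((command, next(diff_ids, None)))
    return pairs
```

Applied to the config in the question, the ADD, `apk add`, and `curl` entries receive the three diff IDs in order, while the CMD and ENV entries are paired with None.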

Something to note: I've found that history is not a required field in the Docker image format. That is, I've run into a few images, built by other tools, that Docker will run but that did not have history entries.
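If you are writing your own analysis, one way to cope with that is to fall back to the diff IDs alone whenever the history is missing or does not line up. The sketch below shows one possible policy in Python; it is an assumption for illustration, not what dive does, and `describe_layers` is a made-up name.

```python
def describe_layers(config):
    """Return a list of (diff_id, created_by or None), one per layer.

    If history is absent, or the number of non-empty history entries does
    not match the number of diff IDs, no commands are attached rather than
    guessing at an alignment.
    """
    diff_ids = config.get("rootfs", {}).get("diff_ids", [])
    history = [
        entry
        for entry in (config.get("history") or [])
        if not entry.get("empty_layer", False)
    ]
    if len(history) != len(diff_ids):
        # Missing or inconsistent history: report layers without commands.
        return [(diff_id, None) for diff_id in diff_ids]
    return [
        (diff_id, entry.get("created_by"))
        for diff_id, entry in zip(diff_ids, history)
    ]
```

The strict length check is a deliberately conservative choice: a partially aligned history would attach commands to the wrong layers without any visible error.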

Docker has a write-up of what to expect in an image archive documented here; also be sure to check out this blog post on Docker image IDs and how they have changed over time.

Shout out if you build something fun!