google / go-containerregistry

Go library and CLIs for working with container registries
Apache License 2.0
3.08k stars 531 forks source link

`mutate.Extract` can fail for images which create a hard-link and later remove it #977

Open vito opened 3 years ago

vito commented 3 years ago

Hi there! My project has code to unpack an OCI image into a flat local rootfs directory, and I just noticed mutate.Extract exists for doing the same thing. I hoped to switch to it, but one of my tests started failing.

The current implementation of mutate.Extract unpacks the layers in reverse-order to avoid creating files which are later removed. I took this same approach initially and ran into the bug I'm raising here: if a target for a hard-link is created in one layer and then removed in a later layer, it fails to create any hard-links to it present in the earlier layers. (Something like that; sorry, it's been a while.)

My fix (https://github.com/concourse/registry-image-resource/commit/ecf520ba9f1bfca4853bd4fbc5a756d0505cc4d6) was to un-optimize it and just unpack in the original order, removing files whenever it sees a whiteout. It was kind of a bummer.

This is low priority for me, just raising the issue in case someone else runs into the same error. This is a bit of an edge case so I'd understand if the performance trade-offs make it not worth fixing, but in my case I had to be compatible with everything; I'm happy to just keep maintaining my code for this. Feel free to just close this if you don't want to fix it.

Here's a Dockerfile that can demonstrate the issue:

FROM alpine
RUN apk --no-cache add git
RUN rm /usr/bin/git
A patch and test to run (in my project) to reproduce the error Apply to [`registry-image` resource](https://github.com/concourse/registry-image-resource/tree/7e65e92c87d7dcc8a83508a0cf771526e38a2299): ```patch diff --git a/commands/in.go b/commands/in.go index b085c42..7fcc553 100644 --- a/commands/in.go +++ b/commands/in.go @@ -8,10 +8,12 @@ import ( "os" "path/filepath" + "github.com/concourse/go-archive/tarfs" resource "github.com/concourse/registry-image-resource" "github.com/fatih/color" "github.com/google/go-containerregistry/pkg/name" v1 "github.com/google/go-containerregistry/pkg/v1" + "github.com/google/go-containerregistry/pkg/v1/mutate" "github.com/google/go-containerregistry/pkg/v1/remote" "github.com/google/go-containerregistry/pkg/v1/tarball" "github.com/sirupsen/logrus" @@ -194,7 +196,7 @@ func ociFormat(dest string, tag name.Tag, image v1.Image) error { } func rootfsFormat(dest string, image v1.Image, debug bool, stderr io.Writer) error { - err := unpackImage(filepath.Join(dest, "rootfs"), image, debug, stderr) + err := tarfs.Extract(mutate.Extract(image), filepath.Join(dest, "rootfs")) if err != nil { return fmt.Errorf("extract image: %w", err) } ``` Test failure: ``` fetching concourse/test-image-removed-hardlinks@sha256:ee6f9ea0fe639ba0f30b3d6085f6c7813660a4de3e84bbec87c32e521d200dc9 ERRO[0000] download failed: save image: write rootfs: extract image: link /tmp/docker-image-in-dir081144662/rootfs/usr/bin/git /tmp/docker-image-in-dir081144662/rootfs/usr/bin/git-receive-pack: no such file or directory • Failure in Spec Setup (JustBeforeEach) [0.827 seconds] In /home/vito/src/registry-image-resource/in_test.go:25 a hardlink that is later removed [JustBeforeEach] /home/vito/src/registry-image-resource/in_test.go:213 works /home/vito/src/registry-image-resource/in_test.go:222 Unexpected error: <*exec.ExitError | 0xc000031060>: { ProcessState: { pid: 22284, status: 256, rusage: { Utime: {Sec: 0, Usec: 104605}, Stime: {Sec: 0, Usec: 58114}, Maxrss: 28472, Ixrss: 0, Idrss: 0, Isrss: 0, Minflt: 1264, Majflt: 0, Nswap: 0, Inblock: 0, Oublock: 640, Msgsnd: 0, Msgrcv: 0, Nsignals: 0, Nvcsw: 1344, Nivcsw: 40, }, }, Stderr: nil, } exit status 1 occurred /home/vito/src/registry-image-resource/in_test.go:74 ```
jonjohnsonjr commented 3 years ago

Interesting coincidence that I was wrestling with hardlinks already this afternoon :)

I'm ashamed to admit that I don't think I understand the problem 100%, but let me try to reason about it openly, and maybe you can correct my misunderstanding:

With symlinks, if you delete the target of the symlink, it's expected for the symlink to point to a non-existent file, so our current behavior works as expected.

With hardlinks, if you delete the target of the hardlink, you would still expect the contents to exist in the archive, because the hardlink still points to them.

Our mutate.Extract code effectively flattens multiple tarballs into a single, equivalent tarball based on the OCI changeset rules. Because we omit any files that have been deleted via whiteout files, we can break hardlinks that point to files that no longer exist.

Fixing this seems challenging? What should we emit for a hardlink entry that points to a deleted file? Change the type to a regular file? What is the expect behavior for a hardlink if the file gets overwritten in a higher layer? What if that file gets deleted?

At first glance, it seems like we could do something like this:

If we see a hardlink, check to see if its Linkname has been deleted. If so, add it to a map[string][]tar.Header of Linkname to hardlink headers, and keep processing. The next time we encounter Linkname, we can emit one of those hardlinks as a regular file and re-write any other hardlinks we encounter(ed) to point to that instead of the file that got deleted.

I think this should allow us to take a very similar approach, with the only cost being some buffered tar headers and bookkeeping. Does that sound right?

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Keep fresh with the 'lifecycle/frozen' label.

hckuo commented 1 year ago

Is this planned to be fixed? I have seen similar issues.