rmohr / bazeldnf

Build multi-arch base containers based on RPM with bazel.
Apache License 2.0

How to deal with duplicate files? #47

Closed: malt3 closed this issue 1 year ago

malt3 commented 1 year ago

Duplicate files in rpmtree tar files can be quite problematic when exporting them as container image layers! This is what happens when I try to load a container image created from an rpmtree layer that contains duplicate files:

docker load --input /path/to/docker/image.tar
31ff0fbf1732: Loading layer [=================================>                 ]  607.7MB/916.2MB
Error processing tar file(duplicates of file paths not supported):

When looking at the image layer and searching for duplicate entries, I find these:

tar -tf layer.tar | sort | uniq -d
./usr/share/licenses/systemd/LICENSE.LGPL2.1
./usr/share/man/man3/nfs4_uid_to_name.3.gz

In my case they come from

Can I somehow filter out duplicates or remove one duplicate manually? I believe this may be an issue with the RPMs that I should report upstream (I'm not a Red Hat engineer, but two packages that may be installed together shipping the same files sounds like a bug). Either way, this is something that can happen whenever you create large trees.

EDIT: upstream bzs:

Feedback from the bz:

Sorry, but that will just need to be fixed in those tools. RPMs are allowed to contain the same paths, as long as the contents are the same. [...] There are countless rpms out there that do this.

rmohr commented 1 year ago

Hm, it may make sense to add a way to pick one of those, and by default just take the first one and not add the second one to the tar archive (but write a warning).

rmohr commented 1 year ago

Adding a way to pick would be the more advanced option; just picking one to unblock you is fine by me if you can create a PR.

malt3 commented 1 year ago

Thanks for taking PRs for this. I'll try to understand the current implementation and create a fix.

rmohr commented 1 year ago

In pkg/rpm/cpio2tar.go, the Tar function should be the place to track whether a file was already added and to skip it if necessary.