openSUSE / obs-service-go_modules

OBS Source Service to download, verify, and vendor Go module dependency sources
GNU General Public License v2.0

vendor.tar.gz archives should be idempotent between service runs #40

Open marcosbc opened 1 year ago

marcosbc commented 1 year ago

We are observing that tarballs generated by obs-service-go_modules are not idempotent, i.e. the archives are not bit-identical after execution even if there are no changes in the file contents, and therefore their checksums differ. For us, having idempotent archives is useful as a re-execution of the source service with identical file contents would avoid the file being stored again in our repositories.

For example, executing this service twice gives different results even if the file contents are the same:

$ rm -f vendor.tar.gz; ./go_modules --outdir $(pwd); md5sum vendor.tar.gz; ./tarsum <vendor.tar.gz >vendor-orig
(...)
e94b934dafe9d375199543e5391f9571  vendor.tar.gz
$ rm -f vendor.tar.gz; ./go_modules --outdir $(pwd); md5sum vendor.tar.gz; ./tarsum <vendor.tar.gz >vendor-latest
(...)
978de70d571768e27d6b5fb41dfcb971  vendor.tar.gz
# check if the file contents have changed
$ md5sum vendor-orig vendor-latest
ac565e66ded7d78dc409548f42d39a61  vendor-orig
ac565e66ded7d78dc409548f42d39a61  vendor-latest

In this example we used the tarsum script (you can find it here) to calculate the checksum of each individual file inside the archive. As you can see, it is identical in both cases, so the actual contents of the archive are the same.

Note that in other plugins such as obs-service-node_modules, this does not seem to happen since a re-execution generates bit-identical archives.

jfkw commented 1 year ago

Thanks for surfacing this issue. I had noticed that vendor.tar.gz changes on every run and would like to eliminate that if we have the necessary controls to do so. Considerations to investigate:

diconico07 commented 1 month ago

One way to handle that is to set the file times in the archive to the mtime of go.mod (or would it be better to use go.sum here?). More is needed for gz and obscpio, as these formats have other variable data embedded:
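To illustrate the idea, here is a rough sketch using only Python's standard tarfile and gzip modules. It is not the code of this service or of #55; the function name build_reproducible_tar_gz and its parameters are made up for the example. It pins every member's mtime to one value, clears ownership, sorts entries, and also fixes the timestamp that gzip stores in its own header.

```python
import gzip
import io
import os
import tarfile


def build_reproducible_tar_gz(src_dir: str, mtime: int, arc_root: str = "vendor") -> bytes:
    """Pack the regular files and symlinks under src_dir into a deterministic .tar.gz.

    Empty directories are not stored in this sketch.
    """

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        # Replace per-run metadata with fixed values.
        info.mtime = mtime
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    raw = io.BytesIO()
    # mtime= pins the gzip header timestamp; filename="" keeps any name out of it.
    with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=mtime) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
            for root, dirs, files in os.walk(src_dir):
                dirs.sort()  # in-place sort makes os.walk descend in a stable order
                for name in sorted(files):
                    path = os.path.join(root, name)
                    rel = os.path.relpath(path, src_dir)
                    tar.add(path, arcname=os.path.join(arc_root, rel),
                            recursive=False, filter=normalize)
    return raw.getvalue()
```

With this sketch, the reference time could be taken from go.mod via int(os.stat("go.mod").st_mtime), as suggested above. The output should be stable across runs on the same Python and zlib versions; a different compressor version may still produce different compressed bytes.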

diconico07 commented 3 weeks ago

I created #55 to do this, so it works for at least .tar.gz, .tar.xz and .tar.zst formats. Cpio is out of scope of this PR as it requires things that are not doable with libarchive as of today.