perfsonar / unibuild

A kit for building repositories of packaged software

Subsequent builds of the same release tag produce different orig tarball #34

laeti-tia closed this issue 1 year ago

laeti-tia commented 1 year ago

When doing subsequent builds of the same release tag, the orig tarballs (.orig.tar.gz files) produced have different hashes (md5, sha1, sha256). The Debian repository we maintain then rejects them, because the same file (same version of the same source code) cannot have a different hash.

Example of first run:

$ sha1sum unibuild-repo/pscheduler-tool-sleep_5.0.0~b2.4*
b39ea5a68995fcc2f025310b58ddc8465338ca29  unibuild-repo/pscheduler-tool-sleep_5.0.0~b2.4-1.debian.tar.xz
5876f010ef8f7ab36e1eb5af2486a2031fdcf6c0  unibuild-repo/pscheduler-tool-sleep_5.0.0~b2.4-1_all.deb
91bb8526285d84e993e3fa8ee53f4335814f49b7  unibuild-repo/pscheduler-tool-sleep_5.0.0~b2.4.orig.tar.gz

And a subsequent run a few minutes later:

$ sha1sum unibuild-repo-take1/pscheduler-tool-sleep_5.0.0~b2.4*
b39ea5a68995fcc2f025310b58ddc8465338ca29  unibuild-repo-take1/pscheduler-tool-sleep_5.0.0~b2.4-1.debian.tar.xz
5876f010ef8f7ab36e1eb5af2486a2031fdcf6c0  unibuild-repo-take1/pscheduler-tool-sleep_5.0.0~b2.4-1_all.deb
04900d4be430f756933a6aa7f1184f87dcef19aa  unibuild-repo-take1/pscheduler-tool-sleep_5.0.0~b2.4.orig.tar.gz

The debian.tar.xz and .deb files have the same checksum, but the orig.tar.gz doesn't.

mfeit-internet2 commented 1 year ago

The problem line is this one: https://github.com/perfsonar/unibuild/blob/153382161048aee49277716b8a9a90dbc54cea00/unibuild-package/unibuild-package/unibuild-deb.make#L129

The expedient way to solve this problem on Debian would be to use strip-nondeterminism if it supports tarballs.

We should probably look for a way to do this cleanly both there and for RPM, since it would be good to have tarballs that digest the same way in subsequent builds of SRPMs, too.

The Reproducible Builds Project offers this suggestion for tar:

# requires GNU Tar 1.28+
$ tar --sort=name \
      --mtime="@${SOURCE_DATE_EPOCH}" \
      --owner=0 --group=0 --numeric-owner \
      --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
      -cf product.tar build

Any compression should be done by piping the tar output through the compressor; for gzip, make sure the -n switch is in effect so it stores neither the original filename nor a timestamp in its header.
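
As a rough illustration of that advice, here is a minimal sketch that builds the same tree twice, a couple of seconds apart, and compares the digests of the compressed results. It assumes GNU tar 1.28+ and gzip; the scratch `build` tree and the `make_tarball` helper are hypothetical stand-ins, not Unibuild code.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch tree standing in for the unpacked source.
work=$(mktemp -d)
mkdir -p "$work/build/sub"
echo "hello" > "$work/build/a.txt"
echo "world" > "$work/build/sub/b.txt"

# Pin every timestamp; 0 (the epoch) is the worst-case fallback.
SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-0}"

# make_tarball is an illustrative helper, not part of Unibuild.
# It writes the tar stream to stdout and pipes it through gzip -n,
# so gzip stores neither a file name nor a timestamp.
make_tarball() {
    tar --sort=name \
        --mtime="@${SOURCE_DATE_EPOCH}" \
        --owner=0 --group=0 --numeric-owner \
        --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
        -C "$work" -cf - build \
      | gzip -n > "$1"
}

make_tarball "$work/first.tar.gz"
sleep 2
make_tarball "$work/second.tar.gz"

first_sum=$(sha256sum < "$work/first.tar.gz")
second_sum=$(sha256sum < "$work/second.tar.gz")
echo "first:  $first_sum"
echo "second: $second_sum"
```

The two printed digests should match; if they do not, removing one option at a time is a quick way to see which input still varies.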

I do not want anything VCS-specific brought into Unibuild, so trying to pull data out of Git to determine the dates should be right out. We should look into what timestamps Git restores when cloning or pulling and see if there's something sane we can do with one of the files. Worst case, we pin it to the epoch and be done with it.
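
One VCS-neutral possibility, sketched under assumptions: take the newest file mtime in the source tree as the timestamp and fall back to the epoch when nothing usable is found. The `newest_mtime` helper and the scratch tree are hypothetical, and the sketch relies on GNU find (`-printf`) and GNU touch (`-d '@…'`). Git does not record file modification times, so whether these mtimes are stable across fresh clones is exactly the open question; this is more likely to help when the sources arrive as an unpacked tarball.

```shell
#!/bin/sh
set -eu

# Hypothetical source tree with two files of known, different ages.
src=$(mktemp -d)
echo one > "$src/old.txt"
echo two > "$src/new.txt"
touch -d '@1000000000' "$src/old.txt"
touch -d '@1600000000' "$src/new.txt"

# newest_mtime is an illustrative helper: print the newest file mtime
# under directory $1 in whole seconds, or 0 if there are no files.
newest_mtime() {
    newest=$(find "$1" -type f -printf '%T@\n' | sort -n | tail -n 1)
    newest=${newest%%.*}      # drop the fractional seconds
    echo "${newest:-0}"       # worst case: pin to the epoch
}

epoch=$(newest_mtime "$src")
echo "SOURCE_DATE_EPOCH=$epoch"
```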

laeti-tia commented 1 year ago

From what I see, the tar itself is correct; it seems to be the compression afterward that makes the result different:

$ sha1sum *.orig.tar
878ac1ab649a417ee68323545546a681ba2a3fcf  pscheduler-tool-sleep_5.0.0~b2.4-last1.orig.tar
878ac1ab649a417ee68323545546a681ba2a3fcf  pscheduler-tool-sleep_5.0.0~b2.4-last2.orig.tar
878ac1ab649a417ee68323545546a681ba2a3fcf  pscheduler-tool-sleep_5.0.0~b2.4-take1.orig.tar
878ac1ab649a417ee68323545546a681ba2a3fcf  pscheduler-tool-sleep_5.0.0~b2.4-take2.orig.tar
$ sha1sum *.orig.tar.gz
40e1e0831b8c263a403a9491875070820bdddfcc  pscheduler-tool-sleep_5.0.0~b2.4-last2.orig.tar.gz
04900d4be430f756933a6aa7f1184f87dcef19aa  pscheduler-tool-sleep_5.0.0~b2.4-take1.orig.tar.gz
9d0fbc05e6157f2ebf98e0ce772c6c5ff811935f  pscheduler-tool-sleep_5.0.0~b2.4-take2.orig.tar.gz

So, is the problem at L129, or is it at https://github.com/perfsonar/unibuild/blob/153382161048aee49277716b8a9a90dbc54cea00/unibuild-package/unibuild-package/unibuild-deb.make#L131 instead?
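
One way to check whether the gzip header is what varies is to read its MTIME field, which the gzip file format (RFC 1952) places in bytes 4 to 7 as a little-endian integer. The sketch below is an assumption about how such a check could look, not something taken from Unibuild; `gzip_mtime` is a hypothetical helper. It compresses the same payload through a pipe with and without -n.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
printf 'same payload\n' > "$work/payload"

# gzip_mtime is an illustrative helper: print the MTIME field of a
# gzip file (header bytes 4-7, little-endian) as a decimal number.
gzip_mtime() {
    set -- $(od -An -tu1 -j4 -N4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Without -n, GNU gzip reading from a pipe records the current time.
cat "$work/payload" | gzip    > "$work/piped.gz"
# With -n, the timestamp field is left at zero.
cat "$work/payload" | gzip -n > "$work/piped-n.gz"

with_time=$(gzip_mtime "$work/piped.gz")
without_time=$(gzip_mtime "$work/piped-n.gz")
echo "gzip:    mtime=$with_time"
echo "gzip -n: mtime=$without_time"
```

Running `gzip_mtime` on two of the differing .orig.tar.gz files would show whether their header timestamps differ even though the tar inside is identical.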