laeti-tia closed 6 months ago
Replicated by running three builds a few seconds apart:
-rw-rw-r--. 1 mfeit mfeit 120524 Apr 15 17:45 python-pscheduler_5.0.8.orig-1.tar.gz
-rw-rw-r--. 1 mfeit mfeit 120524 Apr 15 17:45 python-pscheduler_5.0.8.orig-2.tar.gz
-rw-rw-r--. 1 mfeit mfeit 120523 Apr 15 17:46 python-pscheduler_5.0.8.orig-3.tar.gz
Analyzing the innards of the tarballs, I found this:
(python-pscheduler_5.0.8.orig-1.tar.gz)
drwxrwxr-x mfeit/mfeit 0 2024-04-15 17:44 python-pscheduler-5.0.8/
drwxrwxr-x mfeit/mfeit 0 2024-04-15 17:43 python-pscheduler-5.0.8/tests/
-rw-rw-r-- mfeit/mfeit 1168 2024-04-15 17:43 python-pscheduler-5.0.8/tests/threadsafe_test.py
(python-pscheduler_5.0.8.orig-2.tar.gz)
drwxrwxr-x mfeit/mfeit 0 2024-04-15 17:45 python-pscheduler-5.0.8/
drwxrwxr-x mfeit/mfeit 0 2024-04-15 17:43 python-pscheduler-5.0.8/tests/
-rw-rw-r-- mfeit/mfeit 1168 2024-04-15 17:43 python-pscheduler-5.0.8/tests/threadsafe_test.py
(python-pscheduler_5.0.8.orig-3.tar.gz)
drwxrwxr-x mfeit/mfeit 0 2024-04-15 17:46 python-pscheduler-5.0.8/
drwxrwxr-x mfeit/mfeit 0 2024-04-15 17:43 python-pscheduler-5.0.8/tests/
-rw-rw-r-- mfeit/mfeit 1168 2024-04-15 17:43 python-pscheduler-5.0.8/tests/threadsafe_test.py
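For anyone repeating this comparison, here is a rough sketch of how two of the tarballs can be compared entry by entry. diff_tarballs is a made-up helper name, not something in the repo. It only diffs the verbose listings, so it shows metadata differences (permissions, owner, size, mtime, name) but not content differences, and GNU tar's default listing shows times only to the minute.

```shell
# Hypothetical helper: diff the verbose listings of two gzipped tarballs.
# Returns diff's exit status (0 = listings identical, 1 = they differ).
diff_tarballs() {
    a_list=$(mktemp)
    b_list=$(mktemp)

    # -t lists entries, -z reads through gzip, -v adds mode/owner/size/mtime
    tar -tzvf "$1" > "$a_list"
    tar -tzvf "$2" > "$b_list"

    status=0
    diff "$a_list" "$b_list" || status=$?

    rm -f "$a_list" "$b_list"
    return "$status"
}
```

Something like `diff_tarballs python-pscheduler_5.0.8.orig-1.tar.gz python-pscheduler_5.0.8.orig-2.tar.gz` should then print only the entries whose metadata changed between builds.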
The differences are entirely in the tarball's directory; this should be a solvable problem:
The UID/GID of the files is dependent on who is running it. GNU tar can force this to a fixed value with the --owner and --group switches.
The timestamp of the root directory in the tarball changes on every build because a new directory is created each time the tarball is built. This can be fixed by copying the original directory and preserving its timestamp.
Git does not restore the timestamps on the files during cloning and has no facility to do so. This will make the tarball's directory reflect the time the repo was cloned rather than when the content changed. GNU tar can force the mtime to a fixed value with the --mtime switch. I don't think there's any hazard to setting it to a fixed value.
Guidance on producing idempotent builds: https://reproducible-builds.org/docs/archives/
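Putting the switches above together, a minimal sketch might look like the following. This is not the actual commit: make_repro_tarball is a hypothetical name, and the epoch mtime and uid/gid 0 are just arbitrary fixed values. --sort=name needs GNU tar 1.28 or later, and gzip -n is added on the assumption that the gzip header could otherwise carry a timestamp.

```shell
# Hypothetical helper, GNU tar only:
#   make_repro_tarball PARENT_DIR TREE_NAME OUTPUT.tar.gz
make_repro_tarball() {
    # --sort=name      store entries in a stable, name-sorted order
    # --owner/--group  record uid/gid 0 instead of the builder's
    # --numeric-owner  record numeric ids rather than user/group names
    # --mtime=@0       record the Unix epoch as every entry's mtime
    # gzip -n          keep the name and timestamp out of the gzip header
    tar --sort=name \
        --owner=0 --group=0 --numeric-owner \
        --mtime='@0' \
        -C "$1" -cf - "$2" \
      | gzip -n > "$3"
}
```

For example, `make_repro_tarball /tmp/build python-pscheduler-5.0.8 orig.tar.gz`, where /tmp/build is only a placeholder path. File modes are not normalized here, so the result can still vary with the builder's umask.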
This commit seems to fix it:
$ ls -al ~/hole/orig-*
-rw-rw-r--. 1 mfeit mfeit 124014 Apr 15 20:06 /home/mfeit/hole/orig-1.tar.gz
-rw-rw-r--. 1 mfeit mfeit 124014 Apr 15 20:06 /home/mfeit/hole/orig-2.tar.gz
-rw-rw-r--. 1 mfeit mfeit 124014 Apr 15 20:06 /home/mfeit/hole/orig-3.tar.gz
$ sha1sum ~/hole/orig-*
0de07d4bed175a1da688a6d0ab1e811b986f40e1 /home/mfeit/hole/orig-1.tar.gz
0de07d4bed175a1da688a6d0ab1e811b986f40e1 /home/mfeit/hole/orig-2.tar.gz
0de07d4bed175a1da688a6d0ab1e811b986f40e1 /home/mfeit/hole/orig-3.tar.gz
$ tar tzvf ~/hole/orig-1.tar.gz
drwxrwxr-x 0/0 0 1970-01-01 00:00 python-pscheduler-5.1.0~b1.1/
-rw-rw-r-- 0/0 10142 1970-01-01 00:00 python-pscheduler-5.1.0~b1.1/LICENSE
-rw-rw-r-- 0/0 460 1970-01-01 00:00 python-pscheduler-5.1.0~b1.1/Makefile
The file list is different now because tar was forced to sort the files by name, but it will be consistent from now on.
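The checksum comparison above can also be scripted so a repeat build is easy to check. This is only a sketch: all_identical is a hypothetical name, and it uses a byte-for-byte cmp instead of sha1sum, which answers the same question without needing to eyeball hashes.

```shell
# Hypothetical helper: succeed only if every file given is
# byte-for-byte identical to the first one.
all_identical() {
    first=$1
    shift
    for f in "$@"; do
        # cmp -s is silent and exits non-zero on the first difference
        cmp -s "$first" "$f" || { echo "differs from $first: $f" >&2; return 1; }
    done
}
```

Usage would be along the lines of `all_identical ~/hole/orig-*.tar.gz && echo reproducible`.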
It seems #40 is not fully satisfactory; I still see some changes. Here is the output of three successive builds using
unibuild --release build --start python-pscheduler --stop python-pscheduler
on the same machine, minutes apart.