google / gitiles

A simple browser for Git repositories.
https://gerrit.googlesource.com/gitiles/

use timestamp from commit in tarballs to make archive downloads reproducible #217

Open eighthave opened 3 years ago

eighthave commented 3 years ago

Right now, when downloading from an +archive link twice, the two resulting .tar.gz files are identical except for one trivial difference: the timestamps in the tar file listing. It seems that the time of the download is used instead of anything from the repository. To see this, run:

$ wget https://android.googlesource.com/platform/frameworks/native/+archive/android-10.0.0_r36.tar.gz
$ mv android-10.0.0_r36.tar.gz first-android-10.0.0_r36.tar.gz
$ wget https://android.googlesource.com/platform/frameworks/native/+archive/android-10.0.0_r36.tar.gz
$ tar tvzf first-android-10.0.0_r36.tar.gz | head -1
-rw-r--r-- 0/0             349 2020-11-22 16:29 .clang-format
$ tar tvzf android-10.0.0_r36.tar.gz | head -1
-rw-r--r-- 0/0             349 2020-11-22 16:35 .clang-format
$ diffoscope  first-android-10.0.0_r36.tar.gz android-10.0.0_r36.tar.gz

Instead, I propose setting the timestamp to the timestamp from the commit that is being archived. So for the android-10.0.0_r36 tag, that would be Sun Jan 12 00:16:24 2020 +0000. With this trivial change, the downloaded tarballs would be the same every time.
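
To illustrate the idea, here is a minimal sketch in Python. It is not the Gitiles or JGit code (those are Java), and `build_archive` and `COMMIT_TIME` are hypothetical names made up for this example. It assumes that the only sources of variation are the per-entry mtime, the entry order, and the timestamp in the gzip header, and it pins all three:

```python
import gzip
import hashlib
import io
import tarfile
from datetime import datetime, timezone

# Commit time of the android-10.0.0_r36 tag quoted above:
# Sun Jan 12 00:16:24 2020 +0000, as seconds since the epoch.
COMMIT_TIME = int(datetime(2020, 1, 12, 0, 16, 24, tzinfo=timezone.utc).timestamp())


def build_archive(files, commit_time):
    """Return .tar.gz bytes that depend only on `files` and `commit_time`.

    `files` maps archive member names to their contents (bytes).
    """
    tar_buffer = io.BytesIO()
    with tarfile.open(fileobj=tar_buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sort the names so the entry order does not depend on dict order.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = commit_time  # commit time, not the current time
            info.mode = 0o644
            info.uid = info.gid = 0
            tar.addfile(info, io.BytesIO(data))

    gz_buffer = io.BytesIO()
    # The gzip header carries its own timestamp; mtime=0 keeps it constant.
    with gzip.GzipFile(fileobj=gz_buffer, mode="wb", mtime=0) as gz:
        gz.write(tar_buffer.getvalue())
    return gz_buffer.getvalue()


if __name__ == "__main__":
    example = {".clang-format": b"BasedOnStyle: Google\n"}
    first = build_archive(example, COMMIT_TIME)
    second = build_archive(example, COMMIT_TIME)
    print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

Building the archive twice from the same inputs gives byte-identical output, so the checksums match no matter when the download happens. The exact bytes can still differ between zlib versions or compression settings, which is a separate concern from the timestamps.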

eighthave commented 2 years ago

It seems that this issue was fixed in JGit, but is still present in Gitiles:

Yes, future changes to the compression algorithm could still change the output. But this issue is about something that can be fixed once and for all, and fixing it increases the chances that the SHA-256 will remain the same. It also makes comparing different downloads of the same release tarball much easier, since any remaining diff would be confined to the compressed stream, not the archive contents.

vapier commented 2 years ago

this is a dupe of #84 basically

eighthave commented 2 years ago

IMHO #84 is perhaps broader than this issue, though this issue does fit under #84.