mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Implement reproducibility for the release builds #18258

Closed: timvandermeij closed this pull request 2 weeks ago.

timvandermeij commented 2 weeks ago

The release builds are currently not reproducible because the ZIP files record the modification dates of the files generated during the build process. As a result, two builds of identical source code made at different times produce different output.
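To make the mechanism concrete: per the ZIP format specification (PKWARE's APPNOTE), each entry's local file header and central directory record store a "last mod file time" and "last mod file date" as 16-bit MS-DOS values. The sketch below encodes a Date into those two fields, so a different timestamp means different bytes in the archive even when the file contents are identical. The helper name is hypothetical, and it reads UTC components for determinism, whereas ZIP tools conventionally store local time.

```javascript
// Hypothetical helper: encode a Date into the MS-DOS time/date fields that
// ZIP headers use. UTC components are used so the result does not depend on
// the machine's time zone (real ZIP tools usually store local time).
function toDosDateTime(date) {
  const year = date.getUTCFullYear();
  if (year < 1980 || year > 2107) {
    throw new RangeError("MS-DOS dates only cover the years 1980-2107");
  }
  // Bits 15-9: years since 1980, bits 8-5: month (1-12), bits 4-0: day.
  const dosDate =
    ((year - 1980) << 9) | ((date.getUTCMonth() + 1) << 5) | date.getUTCDate();
  // Bits 15-11: hours, bits 10-5: minutes, bits 4-0: seconds divided by 2.
  const dosTime =
    (date.getUTCHours() << 11) |
    (date.getUTCMinutes() << 5) |
    Math.floor(date.getUTCSeconds() / 2);
  return { dosDate, dosTime };
}

// The same file written one minute later gets different header bytes.
const first = toDosDateTime(new Date(Date.UTC(2024, 5, 12, 10, 30, 44)));
const second = toDosDateTime(new Date(Date.UTC(2024, 5, 12, 10, 31, 44)));
console.log(first.dosTime !== second.dosTime); // true
```

The two-second resolution of the time field also means that only builds landing in exactly the same two-second window would accidentally match, which is why pinning the timestamp is the reliable fix.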

This is undesirable because it makes differences in the output harder to detect, as happened recently during the Gulp 5 efforts: the modification date differences are irrelevant, but they can obscure the differences that actually matter, for example when checking the effect of code changes. Moreover, reproducibility of build artifacts has become increasingly important; see the Reproducible Builds initiative at https://reproducible-builds.org (specifically the "Why does it matter?" section) and https://reproducible-builds.org/docs/timestamps, which further explains the problem of timestamps in build artifacts.

This commit fixes the issue by configuring the ZIP file creation to use the (fixed) date of the last Git commit for which the release is being made. This makes the build fully reproducible: builds of identical source code produce bit-for-bit identical output artifacts.
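A minimal sketch of obtaining such a fixed date, assuming the build runs inside a Git checkout: "git log -1 --format=%ct" prints the committer date of the last commit as a Unix timestamp in seconds, which can be converted to a JavaScript Date and passed to whatever creates the ZIP file. The function name lastCommitDate and its injectable run parameter are hypothetical (the parameter only exists so the helper can be tested without Git); this is not the code from the pdf.js gulpfile.

```javascript
const { execFileSync } = require("node:child_process");

// Hypothetical helper: return the committer date of HEAD as a Date.
// `run` defaults to execFileSync and can be swapped out in tests.
function lastCommitDate(run = execFileSync) {
  // %ct = committer date as a Unix timestamp (seconds since the epoch).
  const output = run("git", ["log", "-1", "--format=%ct"], {
    encoding: "utf8",
  });
  const seconds = Number.parseInt(String(output).trim(), 10);
  if (!Number.isFinite(seconds)) {
    throw new Error(`Unexpected git output: ${JSON.stringify(output)}`);
  }
  return new Date(seconds * 1000);
}
```

Because the value depends only on the checked-out commit, calling lastCommitDate() twice for the same source tree yields the same Date, so every build stamps its ZIP entries identically. Using the committer date rather than the author date is a judgment call in this sketch; either is stable for a given commit.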

To improve readability, the compression function now takes a parameter object, and template strings are used where useful.
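The parameter-object refactoring can be illustrated with a small sketch. All names here (createZipFile, sourceDirectory, version, suffix, modifiedTime, and the "build/dist/" path) are hypothetical, since the actual gulpfile code is not shown in this thread; instead of invoking Gulp, the function only returns a description of the job so the example runs on its own.

```javascript
// Before (hypothetical): positional arguments are easy to mix up, e.g.
//   createZipFile("build/dist/", "4.4.57", "dist", date);

// After: one parameter object, so each argument is named at the call site.
function createZipFile({ sourceDirectory, version, suffix, modifiedTime }) {
  // Template strings instead of concatenation such as dir + "**".
  const sourceGlob = `${sourceDirectory}**`;
  const zipFileName = `pdfjs-${version}-${suffix}.zip`;
  // A real build step would pipe the matched files into a ZIP plugin with
  // every entry stamped with modifiedTime; this sketch only describes the job.
  return { sourceGlob, zipFileName, modifiedTime: modifiedTime.toISOString() };
}

const job = createZipFile({
  sourceDirectory: "build/dist/",
  version: "4.4.57",
  suffix: "legacy-dist",
  modifiedTime: new Date(Date.UTC(2024, 5, 12)),
});
console.log(job.zipFileName); // pdfjs-4.4.57-legacy-dist.zip
```

With a parameter object, adding the new timestamp argument does not disturb the order of the existing ones, which is one reason this shape suits a change like the one described above.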

timvandermeij commented 2 weeks ago

Before this patch, on the current master branch, we have the following situation:

$ npx gulp publish
<snip>
$ mv build/ build1/
$ npx gulp publish
<snip>
$ mv build/ build2/
$ diff -r build1/ build2/
Binary files build1/pdfjs-4.4.56-dist.zip and build2/pdfjs-4.4.56-dist.zip differ
Binary files build1/pdfjs-4.4.56-legacy-dist.zip and build2/pdfjs-4.4.56-legacy-dist.zip differ
$ echo $?
1
$ sha256sum build1/pdfjs-4.4.56-dist.zip build2/pdfjs-4.4.56-dist.zip 
f95c27b43c4c4c804b946f025e727ee4c5ac6b627b940817b052947f046d556b  build1/pdfjs-4.4.56-dist.zip
2cceb023db8a0cc61c74e9c7ef115afcaf858330e7c1a58ecca6c1367914678b  build2/pdfjs-4.4.56-dist.zip
$ sha256sum build1/pdfjs-4.4.56-legacy-dist.zip build2/pdfjs-4.4.56-legacy-dist.zip 
31831a0e2dd2d9dec477a927976fc0f3c6b5eaa6a628bffc4cb6e88f0fb10f2c  build1/pdfjs-4.4.56-legacy-dist.zip
90b62ed45e29c3f2f73e2e8dd89b676a8d742e239306401feb531cc6de7e49bd  build2/pdfjs-4.4.56-legacy-dist.zip

I triggered two builds from the same source code, moved each output into a separate folder, compared the folders with diff and computed the SHA256 hashes of the ZIP files. The SHA256 hashes differ, showing that the ZIP files are not reproducible.

I repeated this process with this patch applied. The SHA256 hashes are now equal and the diff is empty:

$ npx gulp publish
<snip>
$ mv build/ build1/
$ npx gulp publish
<snip>
$ mv build/ build2/
$ diff -r build1/ build2/
$ echo $?
0
$ sha256sum build1/pdfjs-4.4.57-dist.zip build2/pdfjs-4.4.57-dist.zip 
5eedfd3b522b6e7b0e10d1a0e7b04bf7e2faf93b48dbf50e0b8ab24b20fe66a1  build1/pdfjs-4.4.57-dist.zip
5eedfd3b522b6e7b0e10d1a0e7b04bf7e2faf93b48dbf50e0b8ab24b20fe66a1  build2/pdfjs-4.4.57-dist.zip
$ sha256sum build1/pdfjs-4.4.57-legacy-dist.zip build2/pdfjs-4.4.57-legacy-dist.zip 
ee3496e7d63bfca5dc045341e74e4220da92b30b8e03fdf97256d305db20cd14  build1/pdfjs-4.4.57-legacy-dist.zip
ee3496e7d63bfca5dc045341e74e4220da92b30b8e03fdf97256d305db20cd14  build2/pdfjs-4.4.57-legacy-dist.zip
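The manual check in the transcripts above can also be scripted. The sketch below hashes two files with SHA-256 using Node's built-in crypto module and reports whether they are bit-for-bit identical. The function names are hypothetical, and unlike diff -r it compares single files rather than walking whole directories.

```javascript
const { createHash } = require("node:crypto");
const { readFileSync } = require("node:fs");

// Hypothetical helper: hex-encoded SHA-256 digest of a file's bytes,
// the same value sha256sum prints in its first column.
function sha256OfFile(filePath) {
  return createHash("sha256").update(readFileSync(filePath)).digest("hex");
}

// True when two build artifacts are bit-for-bit identical.
function sameArtifact(filePathA, filePathB) {
  return sha256OfFile(filePathA) === sha256OfFile(filePathB);
}

// Usage, after moving two builds into build1/ and build2/:
// sameArtifact("build1/pdfjs-4.4.57-dist.zip", "build2/pdfjs-4.4.57-dist.zip");
```

For files on the same machine a direct byte comparison would work just as well; hashing is shown because the digests can also be published and compared across machines, which is how independent rebuilds are usually verified.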