mstorsjo / llvm-mingw

An LLVM/Clang/LLD based mingw-w64 toolchain

Improve tar packaging #394

Closed nolange closed 4 months ago

nolange commented 5 months ago

This introduces the normal practice of not storing personal uid/gid values in the archives. tar handles them differently depending on whether you unpack as root or as another user.

Further, the timestamp variations are useless noise; the second commit normalizes the timestamps used by the compiler/toolchain as well as those stored in the tar archive.
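As an illustration of the idea only (not the code in this PR, which changes the packaging scripts), here is a minimal Python sketch that writes a tar archive with owner fields zeroed, a fixed mtime, and a sorted member order. The helper names `normalize` and `build_archive` are made up for this example. GNU tar has options such as `--owner`, `--group`, `--numeric-owner`, `--mtime` and `--sort=name` that serve the same purpose.

```python
import io
import tarfile


def normalize(info, epoch):
    """Strip per-user and per-run metadata from one archive member."""
    info.uid = 0
    info.gid = 0
    info.uname = ""
    info.gname = ""
    info.mtime = epoch  # e.g. the value of SOURCE_DATE_EPOCH
    return info


def build_archive(files, epoch):
    """Return tar bytes for a {name: content} mapping (hypothetical helper).

    Members are added in sorted order so that the input ordering does not
    leak into the archive.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            tar.addfile(normalize(info, epoch), io.BytesIO(files[name]))
    return buf.getvalue()
```

With files on disk, the same `normalize` step could be passed via the `filter=` argument of `TarFile.add`.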

mstorsjo commented 4 months ago

Thanks; I think this looks good now. I've amended your branch (changing an instance of checkout@v3 into checkout@v4, and adding a commit to release-macos.sh that checks whether gtar is available), and I'll merge this.

FWIW, when linking a DLL/EXE, those files get the current time stored in a timestamp field as well. I've implemented using SOURCE_DATE_EPOCH in lld now as well, in https://github.com/llvm/llvm-project/commit/0df8aed6c30f08ded526038a6bbb4daf113a31c1, which also is backported to the next 18.x release branch, so it will be included in the next (pre)release.
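For anyone who wants to check which value ended up in that field, a small sketch of reading it follows. It relies only on the documented PE layout: the offset of the `PE\0\0` signature is stored at 0x3C, and `TimeDateStamp` sits 8 bytes after the signature (following `Machine` and `NumberOfSections`). The function name `pe_timestamp` is made up for this example.

```python
import struct


def pe_timestamp(data):
    """Return the COFF header TimeDateStamp of a PE image given as bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: file offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Signature (4 bytes), Machine (2), NumberOfSections (2), then TimeDateStamp.
    (stamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return stamp
```

Calling it on the bytes of a DLL from two builds (for example `pe_timestamp(open(path, "rb").read())`) shows whether the field was pinned or still varies.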

For those purposes, it would kinda be nice to have a more stable source of data for SOURCE_DATE_EPOCH than the commit date - e.g. if just tweaking the CI pipeline, by making a new commit, we'd now be affecting the contents of the output files. But anything more specific also becomes quite brittle and fiddly - e.g. if we'd have it be the latest commit date for any file touching build-llvm.sh or the relevant build-libcxx.sh or similar, plus all of wrappers, etc. (IIRC it can be important that the field actually does change, when the file contents change.)
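A rough sketch of what that more specific variant might look like, to show why it gets fiddly: the timestamp is taken from the latest commit touching a hand-maintained list of paths, using `git log -1 --format=%ct -- <paths>`. The function names and the `run` parameter are assumptions made for this example, and the path list would have to be kept in sync with the build scripts by hand.

```python
import os
import subprocess


def git_log_command(paths):
    """Command printing the committer timestamp of the newest commit touching paths."""
    return ["git", "log", "-1", "--format=%ct", "--", *paths]


def source_date_epoch(paths, env=None, run=subprocess.check_output):
    """Pick a SOURCE_DATE_EPOCH value (hypothetical helper).

    An explicit SOURCE_DATE_EPOCH in the environment wins; otherwise the
    value is derived from git history restricted to the given paths.
    """
    env = os.environ if env is None else env
    if env.get("SOURCE_DATE_EPOCH"):
        return int(env["SOURCE_DATE_EPOCH"])
    out = run(git_log_command(paths), text=True)
    # Raises ValueError if no commit touches any of the paths.
    return int(out.strip())
```

For instance, `source_date_epoch(["build-llvm.sh", "build-libcxx.sh", "wrappers"])` would ignore commits that only tweak the CI pipeline, but it would also silently miss any relevant file left out of the list.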

LLD does have another flag, -Brepro on the lld-link level, which replaces the timestamp with a hash (and sets a flag indicating that the timestamp isn't a real timestamp), but that flag isn't exposed on the mingw linker level, and there's no env variable one can set to enable it.

nolange commented 4 months ago

> For those purposes, it would kinda be nice to have a more stable source of data for SOURCE_DATE_EPOCH than the commit date

I try to clean up these sources of random noise, but fully reproducible builds are still really hard in general (if you use debuginfo you need to build in the same directories, and even then it sometimes fails). Taking out the time the build job started is a decent start.

For a change that isn't "productive", the best case is ending up with identical build-ids.

mstorsjo commented 4 months ago

> > For those purposes, it would kinda be nice to have a more stable source of data for SOURCE_DATE_EPOCH than the commit date
>
> I try to clean up these sources of random noise, but fully reproducible builds are still really hard in general (if you use debuginfo you need to build in the same directories, and even then it sometimes fails). Taking out the time the build job started is a decent start.

Yeah - in this case I haven't tried how it behaves if built separately elsewhere (there are a million other things that affect the output there), but having two consecutive runs of the github actions pipeline produce identical output would at least be nice. Before these patches, there were differences only in the gendef executable (one case of __DATE__, which does get handled by SOURCE_DATE_EPOCH here, but I also took it out altogether upstream, https://github.com/mingw-w64/mingw-w64/commit/202e375b74a20fb13ce02af65a5f168bdba760f1), and in the <root>/<arch>-w64-mingw32/bin/*.dll files due to timestamps. With the support for SOURCE_DATE_EPOCH in upstream lld, and these changes, that shouldn't vary any longer as long as you rerun the same pipeline from the same commit. But any commit will change COMMIT_DATE_UNIX and affect that.
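One way to check this claim for two unpacked pipeline outputs is to hash every file in both trees and list the paths whose contents differ. The sketch below is a generic comparison, not something taken from this repository; `tree_digests` and `differing_files` are names invented for the example.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(a, b):
    """Return sorted relative paths that are missing on one side or differ in content."""
    da, db = tree_digests(a), tree_digests(b)
    return sorted(name for name in da.keys() | db.keys() if da.get(name) != db.get(name))
```

An empty result means the two runs match file for file; it compares contents only, so differences in archive metadata would need a separate check on the tarballs themselves.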