matricks / bam

Bam is a fast and flexible build system. Bam uses Lua to describe the build process. It takes its inspiration for the script files from scons. While scons focuses on being 100% correct when building, bam makes a few sacrifices to achieve fast full and incremental build times.
http://matricks.github.com/bam

unreproducible builds from Collect("*.cpp") #111

Closed bmwiedemann closed 6 years ago

bmwiedemann commented 7 years ago

I was looking into why http://rb.zq1.de/compare.factory-20170519/teeworlds-compare.out shows random ordering of symbols between builds of the teeworlds package in openSUSE and found that it uses bam-0.5.0 with a bam.lua that has lines like:

        server = Compile(server_settings, Collect("src/engine/server/*.cpp"))
        server_exe = Link(server_settings, "teeworlds_srv", engine, server,
                game_shared, game_server, zlib, server_link_other)

which leads to g++ being called with .o files in a random order.

IMHO, a sort needs to be added somewhere, but my Lua is not good enough to do it.
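
One possible workaround on the build-script side is to sort the collected list before it reaches Compile. The sketch below is plain Lua and assumes that Collect returns a flat array of path strings, which this thread does not confirm. `sorted_copy` is a hypothetical helper, not part of bam, and the file names are made up for illustration.

```lua
-- Hypothetical helper (not part of bam): return a sorted copy of an
-- array of strings, leaving the original table untouched.
function sorted_copy(list)
    local out = {}
    for i, v in ipairs(list) do
        out[i] = v
    end
    table.sort(out)
    return out
end

-- Stand-in for what Collect("src/engine/server/*.cpp") might return in
-- filesystem order (assumption: a flat array of path strings; the
-- names are invented for this example).
local files = {
    "src/engine/server/server.cpp",
    "src/engine/server/register.cpp",
    "src/engine/server/main.cpp",
}

-- Print the paths in their sorted, machine-independent order.
for _, path in ipairs(sorted_copy(files)) do
    print(path)
end
```

In a bam.lua like the one quoted above, this might be used as `Compile(server_settings, sorted_copy(Collect("src/engine/server/*.cpp")))`, again assuming Collect returns such a table.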

See also https://reproducible-builds.org/ for why this matters.

matricks commented 7 years ago

Interesting. The most likely reason is that listing a directory is "unstable" between machines. Can you try this simple code and see whether it produces the same list on each machine?

https://gist.github.com/matricks/af0ec7cdca0dc60654b3fbcce1a45495

matricks commented 7 years ago

I googled a bit as well and found several snippets like this:

"The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion." -readdir(3), Linux manpages

So these need to be sorted somehow before they get into the Lua machine, or the table needs to be sorted after iteration is done. I'll look into it.
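
The second option, sorting the table once iteration is done, could look roughly like the sketch below. It is not bam's actual implementation: `collect_sorted` and `listing` are hypothetical names, and `listing` only simulates a directory walk that yields entries in arbitrary order.

```lua
-- Hypothetical sketch: gather names from any iterator, then sort once
-- after iteration has finished.
function collect_sorted(iter)
    local names = {}
    for name in iter do
        names[#names + 1] = name
    end
    table.sort(names)
    return names
end

-- Stand-in for a directory listing: yields the given names in the
-- order given, the way readdir() yields entries in filesystem order.
function listing(entries)
    local i = 0
    return function()
        i = i + 1
        return entries[i]
    end
end

-- Two "machines" that list the same files in different orders.
local a = collect_sorted(listing({"b.cpp", "a.cpp", "c.cpp"}))
local b = collect_sorted(listing({"c.cpp", "b.cpp", "a.cpp"}))

print(table.concat(a, " "))
print(table.concat(a, " ") == table.concat(b, " "))
```

Lua compares strings according to the current locale, so a build that has to be identical everywhere would presumably also need a fixed locale.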

bmwiedemann commented 7 years ago

See also https://github.com/weidai11/cryptopp/pull/426, https://github.com/weidai11/cryptopp/commit/82accdc13bfed8fa52e0b7681f5d10e5100740c4 and https://github.com/py4n6/pytsk/pull/29 for similar fixes in other pieces of software. So it might be a valid option to require a change in individual packages, but if it can be solved here once, it would be less work overall.

Btw: older filesystems like FAT and ext2 tended to return listings in on-disk order, which was usually the order in which files were added, but they became very slow when you had thousands of files in a single directory. That is why squid, git and others created many sub-directories to store their objects. More modern filesystems like ext4 and NTFS avoid that problem with more sophisticated on-disk structures, which cause a more random ordering.