Open ceedubs opened 8 months ago
Yeah this is a bummer. I was surprised to be reminded that git reflog doesn't include timestamps.
I did a basic sqlite test (create a table and add two rows), and that did produce identical results in two trials.
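For anyone who wants to repeat that kind of check, here is a minimal sketch in Python. It is not the exact test I ran. The table name, column names, and row values are made up for illustration, and it only shows that the same sequence of writes gives the same file bytes with the same SQLite build.

```python
import hashlib
import os
import sqlite3
import tempfile


def build_db(path):
    """Create a table and add two rows (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO t (id, name) VALUES (1, 'a')")
    conn.execute("INSERT INTO t (id, name) VALUES (2, 'b')")
    conn.commit()
    conn.close()


def file_sha256(path):
    """Hash the raw bytes of the database file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def trial():
    """Build a fresh database in a temp directory and return its file hash."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "test.sqlite")
        build_db(path)
        return file_sha256(path)


first, second = trial(), trial()
print("identical:", first == second)
```

This says nothing about a real codebase, where timestamps and write order come into play.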
It would be nice to know if anyone is using reflog timestamps. They seem nice, but I'm not sure I've used them. They also are a culprit in some nondeterministic transcript outputs, which cause CI to fail.
@aryairani is it really just reflog timestamps? I assumed that if I did a `pull` or `clone` it would fetch a bunch of stuff in parallel which would result in different orders of rows in my SQLite tables.
@ceedubs I'm not sure about the parallel fetches, I would guess that you're right.
I think that fetching stuff in parallel may not be that useful though, and we might consider turning the number of concurrent fetches down to 1 or something, which should then help.
Side note, I just talked to @rlmark who definitely uses the reflog timestamps.
Tools like Bazel and Nix ensure reproducible builds by constraining IO at build time. One way that Nix enforces this (I assume Bazel too?) is by only allowing builds to perform network activity if the result has a fixed output hash. Unfortunately, a pull from Share does not result in a file with a fixed hash. I suspect that two culprits are timestamps (like in the reflog) and fetches happening in parallel, but for all I know it could be that SQLite is just completely incompatible with deterministic file hashes (unlike a git codebase).
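To illustrate the row-order suspicion (and only the suspicion, I have not confirmed this is what happens during a pull), here is a rough Python sketch. The single-column table and the `hash-a`/`hash-b`/`hash-c` values are hypothetical stand-ins and not the real codebase schema. It writes the same rows in two different orders and compares the resulting file hashes.

```python
import hashlib
import os
import sqlite3
import tempfile


def db_hash(rows):
    """Insert rows in the given order and return the SHA-256 of the db file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "t.sqlite")
        conn = sqlite3.connect(path)
        # Hypothetical table; rowids are assigned in insertion order.
        conn.execute("CREATE TABLE t (name TEXT)")
        conn.executemany("INSERT INTO t (name) VALUES (?)", [(r,) for r in rows])
        conn.commit()
        conn.close()
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()


rows = ["hash-a", "hash-b", "hash-c"]
in_order = db_hash(rows)
in_order_again = db_hash(rows)
reordered = db_hash(list(reversed(rows)))

print("same order, same file:", in_order == in_order_again)
print("different order, same file:", in_order == reordered)
```

The logical contents are the same in both cases, but the file bytes differ because each row gets a different rowid. If parallel fetches really do change insertion order, that alone would be enough to break a fixed output hash.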
So far in Nix builds I have gotten around this by only saving the result of `compile` and not the whole codebase. But this isn't an ideal solution for a couple of reasons:

Some notes on the properties I care about:
Related (but more helpful for Docker than Bazel/Nix): #3892
Side note: it seems a bit ironic that this is hard in Unison, a language premised on code being content-addressed, when it comes for free(ish) in just about any language that uses text files and traditional source control 😬.