Consider using random keys for incr. comp. hashing

michaelwoerister commented 4 weeks ago

There's been recent discussion about the problems of using unkeyed SipHash128 in the compiler and if that could be exploited by an attacker.

With respect to incremental compilation, it would be possible to generate random keys and cache them together with the dep-graph. These keys could then affect query result fingerprints and dep-node identifiers. Any new from-scratch compilation session would generate new keys, so finding stable collisions should be impossible.

The only downside is that it would be hard to reproduce an actual collision if we ever found one because the keys have to be known for that. However, reproducing collisions that are due to faulty HashStable impls (which is the much more likely case) should be reproducible independent of the keys being used.

Mark-Simulacrum commented 4 weeks ago

We should also consider the perf stability - my guess is that this would hurt our reproducibility there. Maybe we can have a -Z or environment variable opt out.

michaelwoerister commented 4 weeks ago

Yes, we'll want to have a way to explicitly set the keys in any case. I imagine that (for example) unstable fingerprint ICE messages would print the keys and how to invoke the compiler for reproducing.

briansmith commented 3 weeks ago

The only downside is [...]

Besides that downside, other downsides:

Every process would need to read the key from disk at least once, which costs at least an extra syscall for the read.
We'd have a secret key. Now we are doing key storage and key management. Usually this ends up being a lot more work than expected. We'd forever be bothered by people filing security bug reports saying we're not protecting the key good enough.
Any future attempt to distribute the work across multiple systems would require distributing the key or pinning shards of the inputs to the same worker systems. The key distribution approach would be complicated because you then have to have a security analysis of the security of sharing the key across the systems. My understanding is that the pinning approach tends be avoided because it results in unfair loading of systems.

michaelwoerister commented 3 weeks ago

Thanks for the comments, @briansmith!

Every process would need to read the key from disk at least once, which costs at least an extra syscall for the read.

I'm certain that's not an issue in practice. The compiler already reads many files for incr. comp.

We'd have a secret key. Now we are doing key storage and key management. Usually this ends up being a lot more work than expected. We'd forever be bothered by people filing security bug reports saying we're not protecting the key good enough.

Yes, I can imagine that that's a problem. Maybe the solution is to not call it a "key"? It's a salt to prevent being able craft a set of two commits that compiled one after the other would reliably resulting a collision.

Any future attempt to distribute the work across multiple systems would require distributing the key or pinning shards of the inputs to the same worker systems.

The compiler's approach to incr. comp. doesn't really support this scenario to begin with. But I would imagine that a build system trying to do something like that would explicitly set a single key for all agents involved.

rust-lang / rust

Consider using random keys for incr. comp. hashing #129272