Explain all the stable hashing shenanigans

rust-lang / rustc-dev-guide

A guide to how rustc works and how to contribute to it.

https://rustc-dev-guide.rust-lang.org

Explain all the stable hashing shenanigans #203

Open RalfJung opened 6 years ago

RalfJung commented 6 years ago

So rustc is full of these impl_stable_hash_for invocations. What are they for? I originally thought they were for FxHashMap, but that seems to be wrong (I still need to derive(Hash) to use FxHashMap). So now I am just confused. It would be great if the guide could explain this.

@eddyb said "incremental" but that on its own does not explain much of anything -- why is Hash not good enough? Why do I need a tcx to compute a "stable hash"?

RalfJung commented 6 years ago

I was told that @michaelwoerister knows all about this? :D

eddyb commented 6 years ago

Our Hash impls hash pointers, IDs, etc.; they're designed for efficiency, not stability.

Stability here means stability across compilations: the hash depends on the semantic data, not on the transient in-memory representation. The tcx is needed to, e.g., convert an ID into its "stable" representation or to get a cached hash.
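To make "our Hash impls hash pointers" concrete, here is a minimal sketch. `Interned` is a hypothetical handle type (not a rustc type) whose `Hash` impl hashes the address it points to, which is cheap but differs from one process to the next; `stable_hash` instead hashes the pointed-to contents. `DefaultHasher` from std stands in for rustc's own `StableHasher`; its algorithm is not guaranteed to stay the same across Rust releases, so it is only suitable for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical handle to a value that lives somewhere in memory (e.g. an arena).
/// Its `Hash` impl hashes the address, not the data.
#[derive(Clone, Copy)]
struct Interned<'a>(&'a String);

impl Hash for Interned<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Cheap, but the address is different in every compiler process.
        (self.0 as *const String).hash(state);
    }
}

/// Hashes the handle itself: fast, but depends on where the data happens to live.
fn unstable_hash(value: Interned<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hashes the data the handle points to: independent of its address.
fn stable_hash(value: Interned<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.0.as_str().hash(&mut hasher);
    hasher.finish()
}
```

Two equal strings at different addresses play the role of the same data in two compiler sessions: `stable_hash` agrees for them, while the address-based `Hash` does not.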

RalfJung commented 6 years ago

What kind of "ID" are you referring to?

RalfJung commented 6 years ago

Okay so "stable" here means "guaranteed not to change between rustc invocations". We must not hash pointers, for example. Good to know.

michaelwoerister commented 6 years ago

"Stable" here means stable across compilation sessions and crate boundaries. For example, if you Hash a Ty, you get a different value in two different compiler processes (because you are actually hashing a pointer to an interned data structure). If you hash it with HashStable instead, the hash value will be the same across different invocations of the compiler, and it will also be the same regardless of whether the type was defined in the crate currently being compiled or loaded from an upstream crate.
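The `Ty` case can be sketched with a toy interner, under the assumption that a type handle is just an index assigned in interning order (rustc's `Ty<'tcx>` is a pointer, but the effect is the same). `Ty`, `TyKind`, `Interner` and `hash_stable` here are hypothetical simplifications, not rustc's definitions. Hashing the handle with the derived `Hash` reflects the order in which types happened to be interned; hashing stably needs the interner (a "context") to look through the handle at the structure it names. `DefaultHasher` stands in for a real stable hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical type handle: an index into the interner, assigned in interning order.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Ty(usize);

/// Hypothetical, heavily simplified type structure.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum TyKind {
    Bool,
    U8,
    Ref(Ty),
    Tuple(Vec<Ty>),
}

#[derive(Default)]
struct Interner {
    kinds: Vec<TyKind>,
    map: HashMap<TyKind, Ty>,
}

impl Interner {
    /// Returns the existing handle for `kind`, or allocates the next index.
    fn intern(&mut self, kind: TyKind) -> Ty {
        if let Some(&ty) = self.map.get(&kind) {
            return ty;
        }
        let ty = Ty(self.kinds.len());
        self.kinds.push(kind.clone());
        self.map.insert(kind, ty);
        ty
    }

    /// Hashes the structure behind `ty`, recursing through nested handles,
    /// so the session-specific index values never reach the hasher.
    fn hash_stable<H: Hasher>(&self, ty: Ty, hasher: &mut H) {
        match &self.kinds[ty.0] {
            TyKind::Bool => 0u8.hash(hasher),
            TyKind::U8 => 1u8.hash(hasher),
            TyKind::Ref(inner) => {
                2u8.hash(hasher);
                self.hash_stable(*inner, hasher);
            }
            TyKind::Tuple(elems) => {
                3u8.hash(hasher);
                elems.len().hash(hasher);
                for &e in elems {
                    self.hash_stable(e, hasher);
                }
            }
        }
    }
}

/// Plain `Hash` of the handle: depends on interning order.
fn plain_hash(ty: Ty) -> u64 {
    let mut h = DefaultHasher::new();
    ty.hash(&mut h);
    h.finish()
}

/// Stable hash: needs the interner to resolve the handle.
fn stable_hash(interner: &Interner, ty: Ty) -> u64 {
    let mut h = DefaultHasher::new();
    interner.hash_stable(ty, &mut h);
    h.finish()
}
```

If session A interns `bool` before `u8` and session B does not, the type `&u8` gets different handles (and different plain hashes) in the two sessions, but the same stable hash. Because the stable hash never looks at the index, it also does not matter which crate's interner produced the handle.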

This is used to tell whether something has changed between two sessions (for incr. comp.) without having to store the value itself. Another example is the hash value at the end of every Rust symbol. This also needs to be stable across sessions and crate boundaries.
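A minimal sketch of the incremental use case, assuming a hypothetical `PreviousSession` that persisted only fingerprints (hashes) of earlier results, keyed by a made-up query label. rustc's real `Fingerprint` is 128 bits; a `u64` from `DefaultHasher` is used here only to keep the example short. The values being hashed are plain strings with no pointers or IDs, so ordinary `Hash` happens to be stable enough for this illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical 64-bit stand-in for rustc's 128-bit `Fingerprint`.
type Fingerprint = u64;

fn fingerprint<T: Hash>(value: &T) -> Fingerprint {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

/// Hypothetical persisted state of a previous session: hashes only, no values.
struct PreviousSession {
    fingerprints: HashMap<String, Fingerprint>,
}

impl PreviousSession {
    /// True if `key` was not seen last time, or if its result now hashes differently.
    fn has_changed<T: Hash>(&self, key: &str, new_value: &T) -> bool {
        match self.fingerprints.get(key) {
            Some(&old) => old != fingerprint(new_value),
            None => true,
        }
    }
}
```

Only the old hash has to be kept; the old value itself never needs to be stored or reloaded to detect a change.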

RalfJung commented 6 years ago

@michaelwoerister thanks, that helps! Why does this kind of stability require access to a "context" (StableHashingContext) though?

eddyb commented 6 years ago

@RalfJung To cache some kinds of more expensive hashes, and to look up IDs (NodeId, DefId, etc.): you can't hash the numerical value of an ID; you have to hash the "definition" it refers to.
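A sketch of the "look up IDs" and "cache" parts together. The simplified `DefId` here is a pair of a session-local crate number and an item index, and `HashingContext` is a hypothetical stand-in for `StableHashingContext` that knows which crate name and item path each number refers to. rustc has a dedicated `DefPathHash` type for this purpose; the code below is not that implementation, only an illustration of the idea.

```rust
use std::cell::RefCell;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical simplified ID: both numbers are only meaningful within one session.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct DefId {
    krate: u32,
    index: u32,
}

/// Hypothetical context mapping session-local numbers to what they denote.
struct HashingContext {
    crate_names: HashMap<u32, String>,
    def_paths: HashMap<DefId, String>,
    /// Stable hashes computed so far, so repeated lookups are cheap.
    cache: RefCell<HashMap<DefId, u64>>,
}

impl HashingContext {
    fn new(crates: &[(u32, &str)], defs: &[(DefId, &str)]) -> Self {
        HashingContext {
            crate_names: crates.iter().map(|&(n, name)| (n, name.to_string())).collect(),
            def_paths: defs.iter().map(|&(id, path)| (id, path.to_string())).collect(),
            cache: RefCell::new(HashMap::new()),
        }
    }

    /// Hashes the crate name and item path that `id` refers to, never the raw numbers.
    fn def_path_hash(&self, id: DefId) -> u64 {
        if let Some(&h) = self.cache.borrow().get(&id) {
            return h;
        }
        let mut hasher = DefaultHasher::new();
        self.crate_names[&id.krate].hash(&mut hasher);
        self.def_paths[&id].hash(&mut hasher);
        let h = hasher.finish();
        self.cache.borrow_mut().insert(id, h);
        h
    }
}
```

Two sessions may number the same upstream item differently (here `serde` is crate 1 in one session and crate 2 in the other), yet the hash of what the ID refers to comes out the same.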

michaelwoerister commented 6 years ago

It's not just caching. NodeId, DefId, Span, etc. are not stable things. The context provides the data needed to map them into a stable format, for example mapping a Span from a u32 to file:line:col.
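A sketch of the Span example, assuming a simplified span that is a single byte offset (`lo`) into one address space formed by concatenating all loaded source files. `Span`, `SourceFile`, `SourceMap`, `lookup` and `hash_span_stable` are hypothetical simplifications, not rustc's actual definitions. Because a file's offset depends on which files were loaded before it, the raw `u32` changes between sessions; the context turns it into file:line:col, and that is what gets hashed. Columns are counted in bytes, so the sketch assumes ASCII source.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical span: a byte offset into the concatenation of all loaded files.
#[derive(Clone, Copy, Debug)]
struct Span {
    lo: u32,
}

struct SourceFile {
    name: String,
    start: u32, // offset of this file's first byte in the global address space
    src: String,
}

struct SourceMap {
    files: Vec<SourceFile>,
}

impl SourceMap {
    /// Appends a file right after the previously loaded one.
    fn add_file(&mut self, name: &str, src: &str) {
        let start = self.files.last().map_or(0, |f| f.start + f.src.len() as u32);
        self.files.push(SourceFile { name: name.to_string(), start, src: src.to_string() });
    }

    /// Maps a span's start to (file name, 1-based line, 1-based byte column).
    fn lookup(&self, span: Span) -> (&str, usize, usize) {
        let file = self
            .files
            .iter()
            .rev()
            .find(|f| f.start <= span.lo)
            .expect("span outside any loaded file");
        let offset = (span.lo - file.start) as usize;
        let before = &file.src[..offset];
        let line = before.matches('\n').count() + 1;
        let col = offset - before.rfind('\n').map_or(0, |i| i + 1) + 1;
        (file.name.as_str(), line, col)
    }

    /// Hashes file:line:col instead of the session-dependent offset.
    fn hash_span_stable(&self, span: Span) -> u64 {
        let mut h = DefaultHasher::new();
        self.lookup(span).hash(&mut h);
        h.finish()
    }
}
```

Loading an extra `lib.rs` first shifts every offset in `main.rs`, but the `main.rs:2:5` position, and therefore its stable hash, stays the same.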

eddyb commented 6 years ago

That's what I meant by "looking up IDs".

michaelwoerister commented 6 years ago

Right, I wasn't reading your answer properly :)