rust-lang / miri

An interpreter for Rust's mid-level intermediate representation
Apache License 2.0
4.18k stars 323 forks source link

Add global string de-duplication cache #3470

Closed tiif closed 4 days ago

tiif commented 3 months ago

Currently, in localtime_r and also for some cases where we raise panics, we allocate a fresh string each time. These strings never get deallocated so they just keep accumulating. Conceptually these are static strings so we can deduplicate them with a global cache.

The cache should be used everywhere that allocate_str is called with MiriMemoryKind::Machine.

RalfJung commented 3 months ago

(I have updated the issue title and description.)

dev-ardi commented 3 months ago

There exist many string interning crates, are you leaning towards any specific one?

RalfJung commented 3 months ago

None of them are usable I think, since the strings need to be interned in machine memory.

This should be just a FxHashMap<Vec<u8>, Pointer<Option<Provenance>>>.

Cgettys commented 3 weeks ago

Isn't this also part of a more general gap in allocate_str? For example, certain string constants in rustc const evaluation? https://github.com/rust-lang/rust/blob/f944afe3807104803976484e7ee3aff49b6ac070/compiler/rustc_const_eval/src/util/caller_location.rs#L24 "See https://github.com/rust-lang/rust/pull/89920#discussion_r730012398" I'm guessing the performance characteristics required are different, as well as what the cache "value" should be. But thought it was worth mentioning.

Also, though currently I don't think there any current MiriMemoryKind::Machine allocate_str calls where this isn't the case, isn't an additional necessary check for interning these strings that mutability is Mutability::Not? That "seems" obvious to me, but wanted to call it out.

RalfJung commented 3 weeks ago

caller_location strings could also use the cache, yeah. In fact the entire caller_location could be cached, but it's harder to handle that with the same generic infrastructure.

And indeed, only immutable strings should be cached.