Closed Kmeakin closed 1 year ago
Yeah it does clean some stuff up. I imagine it might make things like a language server or interactive binary format editor/explorer harder to implement in the future? We’re a long way from that though, so saddling ourselves with this pain right now might be counterproductive?
How could a global interner make LSP or interactive features harder?
Ahh, I imagine it would mean you were allocating strings in an unbounded fashion without cleaning them up later? Ie. a memory leak? Probably not a massive issue though.
I think whatever the use case: batch compiler, LSP implementation, interactive editor, we would need to keep all strings live, since there is no way of knowing which strings are referenced and which are not
Yeah that makes sense. Come to think of it I think rust-analyzer uses some sort of reference counting for strings? Let’s do this then – I think the improved clarity will be a win.
With a batch compiler with a clear separation of passes, you might be able to all interned strings at some late stage (eg once rust code/LLVM IR has been emmited and you hand off to rustc/llvm). But an LSP has no clear separating of passes. You can always receive new input from the user. Trying to figure out a way of identifying unreferenced strings would probably be a lot of effort for a small reduction in peak memory usage
Yeah that makes sense. Come to think of it I think rust-analyzer uses some sort of reference counting for strings? Let’s do this then – I think the improved clarity will be a win.
Last time I checked they use smol-str
: basically Arc<str>
with a small string optimization. Saves on allocations and clones but no O(1)
comparison
Yeah… we can always switch to that sometime if that’s a better approach.
I think we should make the
StringInterner
global. Usually global variables are seen as a code smell, but I think it can be justified in this case:StringInterner
s in the same process.StringInterner
adds extra complexity and inconvenience for no benefit.StringId
s are an integer index, not a pointer, resolving can fail, but there isn't any good recovery strategy (and logically shouldn't be possible), so will always beunwrap
ed anyway