salsa-rs / salsa

A generic framework for on-demand, incrementalized computation. Inspired by adapton, glimmer, and rustc's query system.
https://salsa-rs.netlify.app/
Apache License 2.0
2.09k stars 142 forks source link

Revise approach to interning #497

Open nikomatsakis opened 2 months ago

nikomatsakis commented 2 months ago

As noted on Zulip, I think we ought to reconsider global interning because of its poor interaction with garbage collection.

davidbarsky commented 2 months ago

(copying the context from Zulip to avoid jumps...)

Lukas:

Since you are working on 3.0 salsa now it might be a good occasion to bring it up, but are there any plans on re-introducing some form of garbage collection for interned values? We are running into problems with that in rust-analyzer where long running sessions slowly leak memory due to interned slots not ever being reclaimed even if the backing values are never being used again.

Micha:

Micha: This is also something that's one of the main open questions for adopting Salsa. Although it isn't limited to interned data. I think a similar problem exists for other data when input data gets invalidated (e.g. a file gets deleted) but dependent queries never get pulled (because the file got deleted...). Salsa then never evicts the derived data (at least that's what I understand). Thinking more about this. I think it is partially solved in Salsa 2022 already. Salsa 2022 supports a new "tracked" type. Queries can create instances of tracked structs and salsa collects tracked structs if rerunning the same query produces fewer or different tracked structs. So what I understand is that FunctionData could be a tracked struct (that salsa interns but also collects when no longer used).

Niko:

I've been thinking about it One thing I've been wondering is how much we really need interned values I believe we can do a lot better in the salsa 3.0 model in general though when it comes to eventually reclaiming data

Niko, later a few days later:

I've been thinking about this I am wondering if interning is the problem :) I have this idea I've been thinking about to add the ability to make an "arena-carrying" type, vaguely rental-like, where you create an arena and can allocate data in it It could also carry interning tables and then we could leverage lifetimes so that when you are comparing things across functions, you have to explicitly "re-intern" the interned value into your new space I'll try to do a proof-of-concept to show what I mean it would mean a bit less centralization but the advantage would be that we can always just collect data aggressively and not worry about ref-counting etc another option I have considered in the past is basically to have "interning" return to you a "representative" Arc, rather like String in Java -- i.e., you can have two equivalent Arc<Foo> types, but if you execute intern, you get back a single Arc<Foo> and can use pointer comparison (i.e., pointer equality means they are the same (duh) but pointer inequality does not mean they are different). This way you can just remove thiings from your intern table without disturbing other aliases that are floating around. But this seems less good than the above, and requires cloning and reference counting.