Open stuhood opened 5 years ago
One place in which we could cheaply start to do more garbage collection would be to convert the interning `Key`s that the engine currently uses to a WeakRef map of some sort. Currently, `Get` inputs are held forever: see https://github.com/pantsbuild/pants/blob/5aebe766778917b4dca3f5fdc82f22049805103f/src/rust/engine/src/interning.rs#L11-L43
To accomplish garbage collection of `Node` outputs/values (but not of `Node`s themselves, which effectively act as their own keys), we could probably:

- Add a collection of roots with the time when they were requested to `InnerGraph`: `roots_by_age: HashMap<N, std::time::Instant>,` (confirm that your changes compile by running `./cargo check -p graph`)
- Adjust `Graph::create` and `Graph::poll` to record when roots were requested.
- [...] `previous_result`. Currently the method name is a bit of a misnomer: it forces a `Node` to be recomputed, but still keeps the previous value in order to try and compute a generation value. But in this case, we want to free the memory and not worry about its dependents needing to re-run.
- [...] `InnerGraph` that will walk the graph from "relevant" roots, and will then call `Entry::clear` on nodes which weren't reachable during the walk.
  - [...] `graph` crate isn't aware of `Node` sizes, so that will probably need to be a followup. But one approach might be to only consider `roots_by_age` which are newer than some window and/or account for some minimum number of roots that we want to keep.
- [...] `./cargo test -p graph` to check that it passes.
- [...] `./cargo test -p graph`.
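The walk-and-clear idea in the steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the `graph` crate's actual code: the `Entry` and `InnerGraph` types here are simplified hypothetical stand-ins, entries are plain indices into a `Vec`, `record_root` and `has_value` are made-up helper names, and "relevant" roots are simply those requested within a time window.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a graph entry: dependency edges plus a cached value.
struct Entry<V> {
    deps: Vec<usize>,
    value: Option<V>,
}

/// Hypothetical, heavily simplified stand-in for `InnerGraph`.
struct InnerGraph<N, V> {
    nodes: HashMap<N, usize>,
    entries: Vec<Entry<V>>,
    roots_by_age: HashMap<N, Instant>,
}

impl<N: Eq + Hash + Clone, V> InnerGraph<N, V> {
    fn new() -> Self {
        InnerGraph {
            nodes: HashMap::new(),
            entries: Vec::new(),
            roots_by_age: HashMap::new(),
        }
    }

    /// Adds a node with an already computed value and returns its entry id.
    fn add(&mut self, node: N, deps: Vec<usize>, value: V) -> usize {
        let id = self.entries.len();
        self.entries.push(Entry { deps, value: Some(value) });
        self.nodes.insert(node, id);
        id
    }

    /// Records that `node` was requested as a root at time `now`.
    fn record_root(&mut self, node: &N, now: Instant) {
        self.roots_by_age.insert(node.clone(), now);
    }

    /// Reports whether the entry still holds a value.
    fn has_value(&self, id: usize) -> bool {
        self.entries[id].value.is_some()
    }

    /// Frees the values of entries that are not reachable from any root
    /// requested within `window` of `now`. Returns the number of values freed.
    fn garbage_collect(&mut self, now: Instant, window: Duration) -> usize {
        // Start the walk from the roots that are recent enough to be "relevant".
        let mut stack: Vec<usize> = self
            .roots_by_age
            .iter()
            .filter(|(_, at)| now.saturating_duration_since(**at) <= window)
            .filter_map(|(node, _)| self.nodes.get(node).copied())
            .collect();

        // Depth-first walk over dependency edges.
        let mut reachable = HashSet::new();
        while let Some(id) = stack.pop() {
            if reachable.insert(id) {
                stack.extend(self.entries[id].deps.iter().copied());
            }
        }

        // Drop the values of everything the walk did not reach; nodes stay in place.
        let mut freed = 0;
        for (id, entry) in self.entries.iter_mut().enumerate() {
            if !reachable.contains(&id) && entry.value.take().is_some() {
                freed += 1;
            }
        }
        freed
    }
}
```

Note that the sketch only frees values and leaves the nodes and edges in place, which matches the goal of collecting `Node` outputs but not `Node`s themselves.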
> One place in which we could cheaply start to do more garbage collection would be to convert the interning `Key`s that the engine currently uses to a WeakRef map of some sort. Currently, `Get` inputs are held forever: see https://github.com/pantsbuild/pants/blob/5aebe766778917b4dca3f5fdc82f22049805103f/src/rust/engine/src/interning.rs#L11-L43

Is it worth going down this route? From my naive point of view it looks like you could do this as long as you could create a weakref to the object and implement a remove key callback on the interns struct.
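
For illustration only, here is a rough sketch of what a weak-reference interning map could look like in plain Rust. It is not the engine's `interning.rs`: `WeakInterns` is a hypothetical type, it interns `String`s rather than Python objects, and because `std::sync::Weak` has no removal callback it uses an explicit `purge` pass in place of the callback idea.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

/// Hypothetical interner that holds only weak references to interned values,
/// so a value can be freed once no caller holds a strong handle to it.
struct WeakInterns {
    by_value: HashMap<String, Weak<String>>,
}

impl WeakInterns {
    fn new() -> Self {
        WeakInterns { by_value: HashMap::new() }
    }

    /// Returns the shared handle for `value`, creating one if none is alive.
    fn intern(&mut self, value: &str) -> Arc<String> {
        if let Some(existing) = self.by_value.get(value).and_then(Weak::upgrade) {
            return existing;
        }
        let handle = Arc::new(value.to_owned());
        self.by_value.insert(value.to_owned(), Arc::downgrade(&handle));
        handle
    }

    /// Removes map entries whose values are no longer referenced elsewhere.
    fn purge(&mut self) {
        self.by_value.retain(|_, weak| weak.strong_count() > 0);
    }

    /// Number of entries currently in the map (live or not yet purged).
    fn len(&self) -> usize {
        self.by_value.len()
    }
}
```

In this sketch the map key is an owned copy of the value, so an entry keeps occupying memory until `purge` runs; a real implementation would need to decide when to run that pass.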
> - Add a collection of roots with the time when they were requested to `InnerGraph`: `roots_by_age: HashMap<N, std::time::Instant>,` (confirm that your changes compile by running `./pants check -p graph`)
> - Adjust `Graph::create` and `Graph::poll` to record when roots were requested.

Okay Rust newb question: it looks like I can't do `HashMap<N, Instant>` in there because that leads to two fields in the struct owning the same data. And I don't think structs can reference self-owned data? So IIUC that means we need to just expand the value of the original `nodes` map to `HashMap<N, (EntryId, Interval)>`.

Also is there a particular reason you're suggesting age as the discriminant? Simplicity of implementation? Seems like access time might be a better discriminant long term, which could turn this into something resembling an LRU cache. But I guess that could be a follow-up.
> Okay Rust newb question: it looks like I can't do `HashMap<N, Instant>` in there because that leads to two fields in the struct owning the same data. And I don't think structs can reference self-owned data? So IIUC that means we need to just expand the value of the original `nodes` map to `HashMap<N, (EntryId, Interval)>`.

When you run into cases like this early on, the answer will be to use `Clone`: i.e. `let node2 = node.clone()`. Types which are relatively cheaply copyable generally implement `Clone` (if they are very cheaply/simply copyable they implement `Copy`, which allows them to be copied automatically).
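
As a small illustration of that advice, the sketch below keeps the same node as a key in two maps by cloning it. The `Node` and `Maps` types and the `record` function are hypothetical and much simpler than the real `graph` crate types.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical node type: it implements `Clone` but not `Copy`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Node(String);

/// Two maps keyed by the same node; each map owns its own copy of the key.
struct Maps {
    nodes: HashMap<Node, usize>,
    roots_by_age: HashMap<Node, Instant>,
}

/// Registers `node` in both maps.
fn record(maps: &mut Maps, node: Node, entry_id: usize) {
    // The first map gets an explicit clone, so `node` can still be moved below.
    maps.roots_by_age.insert(node.clone(), Instant::now());
    // `entry_id` is a `usize`, which is `Copy`, so no explicit clone is needed for it.
    maps.nodes.insert(node, entry_id);
}
```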
> Also is there a particular reason you're suggesting age as the discriminant? Simplicity of implementation? Seems like access time might be a better discriminant long term, which could turn this into something resembling an LRU cache. But I guess that could be a follow-up.

The idea behind using a `HashMap` was for it actually to be access time: when you overwrite an entry in the hashmap, it gets a newer access time.
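
A tiny sketch of that overwrite behaviour, using a hypothetical `touch` helper and `String` keys in place of real nodes:

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Records a request for `root` at time `now` and returns the previously
/// recorded time, if there was one.
fn touch(
    roots_by_age: &mut HashMap<String, Instant>,
    root: &str,
    now: Instant,
) -> Option<Instant> {
    // `HashMap::insert` replaces the value of an existing key and returns the
    // old value, so the map always holds the most recent request time.
    roots_by_age.insert(root.to_owned(), now)
}
```

Calling `touch` on every request therefore makes the stored `Instant` behave like a last-access time rather than a creation time.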
Commented on https://github.com/pantsbuild/pants/issues/14676#issuecomment-1741446693: actually allowing `Key`s to be garbage collected would require that we fully delete `Node`s in the `graph` crate: described there.
The v2 `Graph` (implemented in rust) does not implement garbage collection, although it is definitely feasible.

As we fix other issues and `pantsd` instances are able to stay up longer and longer, we should consider implementing garbage collection based on walking from recently requested roots in the `Session`.