pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.27k stars 626 forks source link

Garbage collect the v2 engine Graph #7675

Open stuhood opened 5 years ago

stuhood commented 5 years ago

The v2 Graph (implemented in rust) does not implement garbage collection, although it is definitely feasible.

As we fix other issues and pantsd instances are able to stay up longer and longer, we should consider implementing garbage collection based on walking from recently requested roots in the Session.

stuhood commented 1 year ago

One place in which we could cheaply start to do more garbage collection would be to convert the interning Keys that the engine currently uses to a WeakRef map of some sort. Currently, Get inputs are held forever: see https://github.com/pantsbuild/pants/blob/5aebe766778917b4dca3f5fdc82f22049805103f/src/rust/engine/src/interning.rs#L11-L43

stuhood commented 1 year ago

To accomplish garbage collection of Node outputs/values (but not of Nodes themselves, which effectively act as their own keys), we could probably:

  1. Add a collection of roots with the time when they were requested to InnerGraph:
     roots_by_age: HashMap<N, std::time::Instant>,
  2. (confirm that your changes compile by running ./cargo check -p graph)
  3. Adjust Graph::create and Graph::poll to record when roots were requested.
  4. Adjust Entry::clear to optionally discard the previous_result. Currently the method name is a bit of a misnomer: it forces a Node to be recomputed, but still keeps the previous value in order to try and compute a generation value. But in this case, we want to free the memory and not worry about its dependents needing to re-run.
  5. Add a method to InnerGraph that will walk the graph from "relevant" roots, and will then call Entry::clear on nodes which weren't reachable during the walk.
    • How to define "relevant" is unclear: right now the graph crate isn't aware of Node sizes, so that will probably need to be a followup. But one approach might be to only consider roots_by_age which are newer than some window and/or account for some minimum number of roots that we want to keep.
  6. Add a test that calls your collection method in tests.rs: probably have the method return a value that indicates how many nodes were cleared (for the purposes of the test). Use ./cargo test -p graph to check that it passes.
  7. Add "periodic" calls to your collection method in cycle_check_task (and rename it to something more generic to maintenance).
    • "Periodic" needs defining. But depending how efficiently the method runs when there is nothing to do, you could run it more or less frequently.
  8. Confirm that the tests still pass by running ./cargo test -p graph.
jriddy commented 1 year ago

One place in which we could cheaply start to do more garbage collection would be to convert the interning Keys that the engine currently uses to a WeakRef map of some sort. Currently, Get inputs are held forever: see https://github.com/pantsbuild/pants/blob/5aebe766778917b4dca3f5fdc82f22049805103f/src/rust/engine/src/interning.rs#L11-L43

Is it worth going down this route? From my naive point of view it looks like you could do this as long as you could create a weakref to the object and implement a remove key callback on the interns struct.

jriddy commented 1 year ago
  • Add a collection of roots with the time when they were requested to InnerGraph:
    roots_by_age: HashMap<N, std::time::Instant>,
  • (confirm that your changes compile by running ./pants check -p graph)
  • Adjust Graph::create and Graph::poll to record when roots were requested.

Okay Rust newb question: it looks like I can't do HashMap<N, Instant> in there because that leads to two fields in the struct owning the same data. And I don't think structs can reference self-owned data? So IIUC that means we need to just expand the value of the original nodes map to HashMap<N, (EntryId, Interval)>.

Also is there are particular reason you're suggesting age as the discriminant? Simplicity of implementation? Seems like access time might be a better discriminant long term, which could turn this into something resembling an LRU cache. But I guess that could be a follow-up

stuhood commented 1 year ago

Okay Rust newb question: it looks like I can't do HashMap<N, Instant> in there because that leads to two fields in the struct owning the same data. And I don't think structs can reference self-owned data? So IIUC that means we need to just expand the value of the original nodes map to HashMap<N, (EntryId, Interval)>.

When you run into cases like this early on, the answer will be to use Clone: i.e. let node2 = node.clone(). Types which are relatively cheaply copyable generally implement Clone (if they are very cheaply/simply copyable they implement Copy, which allows them to be copied automatically).

Also is there are particular reason you're suggesting age as the discriminant? Simplicity of implementation? Seems like access time might be a better discriminant long term, which could turn this into something resembling an LRU cache. But I guess that could be a follow-up

The idea behind using a HashMap was for it actually to be access time: when you overwrite an entry in the hashmap, it gets a newer access time.

stuhood commented 11 months ago

Commented on https://github.com/pantsbuild/pants/issues/14676#issuecomment-1741446693: actually allowing Keys to be garbage collected would require that we fully delete Nodes in the graph crate: described there.