pnevyk / gryf

Graph data structure library aspiring to be convenient, versatile, correct and performant.
MIT License
69 stars 1 forks source link

State reuse between multiple runs of the same algorithm #45

Open pnevyk opened 1 year ago

pnevyk commented 1 year ago

Some algorithms in petgraph (toposort, has_path_connecting) allow to pass optional state. This allows reusing allocations that were done in a previous run of the algorithm. If the algorithm is executed repeatedly on the same or just slightly changed graph, reusing the internal state or at least some information can avoid potential reallocs for buffers growing. In general, any information from the past runs should help prepare the internal state such that it is more efficient for the run.

We use the builder pattern for specifying parameters for algorithms. A tempting solution would be to introduce state builder method which would accept &mut State (e.g., ShortestPathsState) and use it instead of a default in the run. A disadvantage is that it introduces a lifetime to the builders' type signature. On the other hand, it is expected that builders have many generic parameters, so it might be worthy.

Another way would be to introduce flavors of run methods (e.g., run_with) that would have an additional parameter for the &mut State.

As for the contents of such state, we often cannot reuse some allocations because they are moved to the result (e.g., dist map in the shortest paths algorithm). But we could at least store the final size, so it can be used for initializing the collections using with_capacity constructor. Other allocations can be actually reused because they are indeed internal state.

We also need to have the state such that it is available for all algorithms for a problem. That is, the state must be as general as to support Dijkstra, Bellman-Ford and BFS (and later others) for shortest path problem, for instance. For Bellman-Ford, there seems to be nothing useful to share between runs. For Dijkstra and BFS, we could share size of maps for distances and predecessors as well as the queue (or at least its size). Using just "metadata" (size) instead of real data (queue) might simplify the type signature, which is probably worth an extra allocation, which is insignificant, because the main goal is to avoid reallocations on large graphs.

For ShortestPaths therefore, there could be this type representing the internal state:

#[derive(Default, Clone)]
pub struct ShortestPathsState {
    dist_len: usize,
    pred_len: usize,
    queue_len: usize,
}

And the usage would then be:

struct MyGraphWrapper {
    graph: Graph<String, u32, Directed>,
    state: ShortestPathsState,
}

impl MyGraphWrapper {
    pub fn new() -> Self {
        Self {
            graph: Graph::new(),
            state: ShortestPathsState::default(),
        }
    }

    // graph manipulating API

    pub fn shortest_path(&mut self, from: VertexIndex, to: VertexIndex) -> u32 {
        ShortestPaths::on(&self.graph)
            .goal(to)
            // Alternative: `&self` + a synchronization primitive over state + cloning and storing back
            .state(&mut self.state)
            .run(from)
            .unwrap()[to]
    }
}

For problems where the algorithms differ significantly and their internal state is non-trivial, it will be probably beneficial to wrap the algorithm-specific state into options:

pub struct State {
    algo1: Option<Algo1State>,
    algo2: Option<Algo2State>,
    shared: usize,
}