Data.Graph.bcc is not efficient - Githubissues

haskell / containers

Assorted concrete container types

https://hackage.haskell.org/package/containers

315 stars 177 forks source link

Data.Graph.bcc is not efficient #900

Open meooow25 opened 1 year ago

meooow25 commented 1 year ago

The collect function recursively concats on a tree, making the time complexity quadratic. This can be avoided using difference lists.
The do_label step builds some unnecessary intermediate trees, similar to the dfs we had (#882).

I'm not aware of common use cases for bcc, so I'm not sure if this affects anyone. But as long we have it, we should make it efficient.

I can send a PR.

treeowl commented 1 year ago

Everything should be efficient if it can be, yes! Thanks.

treeowl commented 1 year ago

I can't make head or tail of the current algorithm. Is it explained in the paper? Please comment your version liberally; this doesn't seem likely to be obvious.

meooow25 commented 1 year ago

Agreed that the code is not descriptive at all. It is explained the paper though. I'll make sure to add comments to make things a bit clearer 👍

treeowl commented 1 year ago

The new bcc code still uses forest twice, successively (i.e., not interleaved). This strikes me as rather bad. King and Launchbury pushed for lazy dfs; Tarjan's paper does ... something else. I think the question you raise is a good one: what can we do to help manage complexity without realizing too much forest? Lazy dfs can help in some cases, but for bcc we have to make sure the lazily produced depth-first forest isn't shared with what's used to build an array.

meooow25 commented 1 year ago

I'm not sure I follow your comment. We do not have lazy dfs, so we get the full forest when we run dff. If we traverse it twice there is no extra memory cost. But if the question is whether we can avoid the time cost of traversing it twice, then we can think about it. It's possible to traverse it once but we need arbitrary dnum lookups when collecting, which means mutable arrays to keep the same complexity. If we go one step further and combine the dfs with collecting the result, that makes it very close to Tarjan's algorithm.

treeowl commented 1 year ago

We can achieve semi-lazy dfs (lazy in preorder) using lazy ST.

treeowl commented 1 year ago

Here's another thought for a place to start: can we get rid of the dnum array? Everything's a bit tangled, but it looks to me like we're almost certainly traversing the tree in depth-first order. So instead of checking an array to figure out where we are, we can keep track as we go. Something like

bicomps :: Int -> Tree Vertex -> (Int, Forest [Vertex])
collect :: Int -> Tree Vertex -> (Int, Int, ...)

Each function takes the current preorder index and returns the new one.

This is obviously ... unpleasant. That brings us back to the general DFS abstraction question.