Optimizing DFS in Data.Graph

meooow25 commented 1 year ago

dfs is the core function in Data.Graph that all other algorithms (topSort, scc, bcc, etc) are based on. It takes a Graph and generates a [Tree Vertex].

dfs makes use of generate, prune and chop to generate a [Tree Vertex] and chop it into the returned [Tree Vertex].

Other functions further transform this [Tree Vertex], for example topSort flattens the [Tree Vertex] into a [Vertex].

These intermediate trees are unnecessary.

The trees generated by generate are immediately chopped into other trees.
For topSort, the trees returned by dfs are immediately turned into a list.

There is no reason we need to create these trees in memory.

I propose combining the existing generate, prune, and chop function into a new core function that eliminates the intermediate trees from generate and also allows us to choose what to build from the DFS.

dfsGraph :: (Vertex -> a -> b) -> (b -> a -> a) -> a -> Graph -> [Vertex] -> a

Then we can define, for example

dfs = dfsGraph Node (:) []
topSort = dfsGraph <build the list directly>

Benchmarks on a random graph with 10,000 vertices and 100,000 edges:

  dfs:            OK (0.90s)
    3.51 ms ±  79 μs, 57% less than baseline
  topSort:        OK (0.47s)
    3.60 ms ± 107 μs, 59% less than baseline

Other function such as scc and bcc should also benefit, I've not implemented them yet.

Thoughts? I can clean up my code into a PR if this sounds good.

treeowl commented 1 year ago

Sounds great. Please write the most comprehensive comments you can about how everything works.

meooow25 commented 1 year ago

Looking into this further, removing the second set of trees doesn't appear to be very beneficial for topSort.

What I thought was happening: trees from dfs -> difference list -> list What actually happens after optimization: trees from dfs -> list The change I tested: difference list from dfsGraph -> list

So there is an intermediate structure either way. This also explains why topSort doesn't improve more than dfs in the benchmarks. I'm trying to think of a way to construct the list directly (without a dedicated implementation), which I feel should be possible, but I don't have it yet.

So I'll probably send a PR to only remove the first set of trees first, because that would be simpler with clear gains (as seen above), and we can think about the rest later.

meooow25 commented 1 year ago

Now that the obvious optimization is done, I want to explain the situation with the second set of trees because I haven't made much progress there.

We currently have:

dfs :: Graph -> [Vertex] -> [Tree Vertex]
dfs g vs0 = run (bounds g) $ go vs0
  where
    go :: [Vertex] -> SetM s (Forest Vertex)
    go [] = pure []
    go (v:vs) = do
      visited <- contains v
      if visited
      then go vs
      else do
        include v
        as <- go (g!v)
        bs <- go vs
        pure $ Node v as : bs

Which is nice when [Tree Vertex] is exactly what we want.

But suppose we only want the preorder of this list of trees. Currently we create this [Tree Vertex] and flatten it into a list. This [Tree Vertex] is not constructed lazily, since we build it through ST, so we have the entire structure in memory.

Why should we settle for this, when it's possible to build the preorder list directly via dfs?

dfs_preOrder :: Graph -> [Vertex] -> [Vertex]
dfs_preOrder g vs0 = run (bounds g) $ go vs0 (pure [])
  where
    go :: [Vertex] -> SetM s [Vertex] -> SetM s [Vertex]
    go [] acc = acc
    go (v:vs) acc = do
      visited <- contains v
      if visited
      then go vs acc
      else do
        include v
        as <- go (g!v) (go vs acc)
        pure $ v : as

This is also true for reverse postorder, which is the topological sort of a graph.

dfs_revPostOrder :: Graph -> [Vertex] -> [Vertex]
dfs_revPostOrder g vs0 = run (bounds g) $ go vs0 []
  where
    go :: [Vertex] -> [Vertex] -> SetM s [Vertex]
    go [] acc = pure acc
    go (v:vs) acc = do
      visited <- contains v
      if visited
      then go vs acc
      else do
        include v
        acc' <- go (g!v) acc
        go vs (v : acc')

There are likely other useful direct constructions too.

In general, it is possible to construct any result as long as we obey the dfs rules of marking the current vertex as visited, visiting its subtrees, and then visiting the rest.

But so far I've been unable to come up with a general dfs function that can be used for all of these purposes. Any ideas will be appreciated!

treeowl commented 1 year ago

None of those are lazy either. Unlike dfs (whose result can be explored in various orders), dfs_preOrder could be made lazy in a sensible way:

dfs_preOrder :: Graph -> [Vertex] -> [Vertex]
dfs_preOrder g vs0 = run (bounds g) $ go vs0 (pure [])
  where
    go :: [Vertex] -> SetM s [Vertex] -> SetM s [Vertex]
    go [] acc = acc
    go (v:vs) acc = do
      visited <- contains v
      if visited
      then go vs acc
      else do
        include v
        as <- SetM $ unsafeInterleaveST . runSetM (go (g!v) (go vs acc))
        pure $ v : as

I don't know how well that will perform, however.

Your question is still a good one: can we expose tools that will let the user determine the shape of a strict computation with marking? I suspect so. One good first step would be to expose SetM (with a better named API).

meooow25 commented 1 year ago

None of those are lazy either.

That's right, but my concern has been just with avoiding the extra trees. Even if we construct the full output list, it is what was asked for being directly computed. Of course, if we can make it lazy that would be useful. I'm not too familiar with unsafe magic, so I'll have to figure out your code above.

can we expose tools that will let the user determine the shape of a strict computation...

The primary benefit would be internal, for topSort and scc for example, but sure we could also expose it if it's general enough.

haskell / containers

Optimizing DFS in Data.Graph #882