haskell / containers

Assorted concrete container types
https://hackage.haskell.org/package/containers
315 stars 178 forks source link

Optimize scc as hard as possible #929

Open treeowl opened 1 year ago

treeowl commented 1 year ago

I don't know what can be done here, but GHC uses it, and that makes it a top Data.Graph priority.

jwaldmann commented 1 year ago

The implementation is literally (?) from (e.g.,) Cormen, Leiserson, Rivest: Introduction to Algorithms, Section 23.5 (ninth edition), and O(vertices + edges) is optimal. So this is about the hidden constant factor?

Is there reason to believe that this is critical for GHC? (does scc show up in profiles?)

Does GHC really use Data.Graph? E.g., https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Data/Graph/Directed.hs seems like a copy/re-implementation. But, yes:

import qualified Data.Graph as G
...
scc :: IntGraph -> [SCC Vertex]
scc graph = map decode forest
  where
    forest = {-# SCC "Digraph.scc" #-} G.scc graph
    ...

So, let's look at these profiles...

treeowl commented 1 year ago

Good point.

meooow25 commented 1 year ago

Yes it would be good to know how much time GHC spends in scc.

If we want to improve it we could implement and see if Tarjan's or path-based algorithm (from Wikipedia: https://en.wikipedia.org/wiki/Strongly_connected_component#Algorithms) have a lower constant factor.