snowleopard / alga

Algebraic graphs
MIT License
715 stars 68 forks source link

Explore alternative implementations for AdjacencyMap #141

Open snowleopard opened 5 years ago

snowleopard commented 5 years ago

We currently store vertices of type a directly in an AdjacencyMap a, which may lead to poor performance if the corresponding Ord a instance is slow. One example, which motivated me to record this issue, is the SCC algorithm developed in #128 that produces AdjacencyMap (NonEmpty.AdjacencyMap a), i.e. graphs whose vertices are graphs (and graphs are expensive to compare, see #126).

An alternative would be to define something like

data AdjacencyMap a = AdjacencyMap
    { graph :: AdjacencyIntMap
    , value :: Int -> a
    , index :: a -> Maybe Int }

This is similar to Data.Graph.graphFromEdges:

http://hackage.haskell.org/package/containers-0.6.0.1/docs/Data-Graph.html#v:graphFromEdges

In this way, we can benefit from the high performance of the underlying AdjacencyIntMap yet provide a typed high-level interface AdjacencyMap a to the user.

michaelpj commented 5 years ago

Perhaps more generally, it would be nice to be able to provide a bijective map between the "real" node type and a "key" type which is cheaper to store, or has a cheaper (or existent) Ord instance.

michaelpj commented 5 years ago

Note that this can be a blocker for moving from Data.Graph, which allows the alternative key type. I think I just rediscovered why past-me made that comment ;)

Avasil commented 5 years ago

I will create a prototype

Avasil commented 5 years ago

@snowleopard I finally found some time to start working on this and I'm wondering about operations that combine graphs, e.g. overlay.

If we define value :: Int -> a and index :: a -> Maybe Int on graph itself, there could be two graphs with different mappings, right?

I guess we could combine index functions but I don't know what to do with value which seems to be total.

This is of course if I'm understanding the issue correctly which is to check if this representation of AdjacencyMap could improve anything and if so, rewrite API to forward to AdjacencyIntMap

snowleopard commented 5 years ago

there could be two graphs with different mappings, right?

Yes.

I guess we could combine index functions but I don't know what to do with value which seems to be total.

Actually, I'm not entirely sure my draft above is correct. What we really need to represent here is an isomorphism between vertices a that appear in the graph and their integer counterparts Int. Do you have an idea of how we could do this both efficiently and in a compositional way to permit operations like overlay?

This is of course if I'm understanding the issue correctly which is to check if this representation of AdjacencyMap could improve anything and if so, rewrite API to forward to AdjacencyIntMap

That's right. We know that AdjacencyIntMap is fast and we'd like to benefit from its efficiency when working with more complex vertex types a.

snowleopard commented 5 years ago

That's right. We know that AdjacencyIntMap is fast and we'd like to benefit from its efficiency when working with more complex vertex types a.

Re-reading what I wrote, I think this goal sounds a bit naive: if it were possible, then we could similarly implement Map k a on top of IntMap a :-)

Avasil commented 5 years ago

Do you have an idea of how we could do this both efficiently and in a compositional way to permit operations like overlay?

No clue but I will try to figure something out.

Do you have anything else that I could help with in the meantime?

michaelpj commented 5 years ago

Brainstorming a few ideas:

snowleopard commented 5 years ago

@michaelpj Thanks for the ideas! Most likely, we don't really need to make operations like overlay and connect really fast, because constructing large adjacency maps using these operations is going to be terribly slow anyway. I think it's worth keeping a "dense" mapping to Ints so that we don't run out of them in practice and could rely on IntSets and IntMaps internally.

Do you have anything else that I could help with in the meantime?

@Avasil I think it would be great to prototype the most straightforward implementation of this idea that you can come up with and see how well it performs in benchmarks.

(As a completely separate issue, you might look into #90 which came up in your recent PR #222.)

michaelpj commented 5 years ago

Yes, I agree that querying performance is usually much more relevant. So the relabeling approach might be fine in practice.

Avasil commented 5 years ago

@snowleopard I will try validating @michaelpj ideas first, maybe it can lead us somewhere. :D

  • Redo the indices for one side:

    • For each mapping therein:

    • If it coincides with the mapping on the other side, keep it.

    • If it maps to an index not used on the other side, keep it.

    • Othewise map to a fresh integer.

    • Complexity seems likely terrible.

It's not clear to me how to modify a value :: Int -> a function. My first attempt was going through vertices of one of the graphs (let's say G2), applying value of both graphs and checking if the result is different. If it is different, I call index G1 and see if it's Nothing. If it's not, I don't know how to find new integer. Should I go through all vertices of G1 and choose different one?

It feels like each overlay and connect would add extra "layer" on top of these functions which would be expensive anytime we call them, even during queries. Hopefully I'm misunderstanding or not familiar with a technique to achieve this in different way

snowleopard commented 5 years ago

@Avasil Why don't you start with something naive but very simple: just renumber everything (always keeping the indices starting at 0) whenever you do something like overlay or connect. Then find the resulting complexity and see where we can go from there. It's always best to start with something that works and then iterate & optimise!

Avasil commented 5 years ago

@snowleopard I managed to implement and benchmark few functions.

I plan to add tests now so I can get rid of bugs and then keep trying to optimize. Though any suggestions would be very helpful, currently it's very slow

I changed representation a bit:

data AdjacencyMapPoc a = AM
    { graph :: AIM.AdjacencyIntMap
    , valueMap :: IntMap a
    , indexMap :: Map a Int } deriving (Generic)

value :: AdjacencyMapPoc a -> Int -> Maybe a
value g i = IntMap.lookup i (valueMap g)

index :: Ord a => AdjacencyMapPoc a -> a -> Maybe Int
index g a = Map.lookup a (indexMap g)

I didn't know how to create value :: Int -> a so I was using value :: Int -> Maybe a. For overlay / connect I keep one side and then fold vertices of the other side and map them to new indices which I add to the old mappings. I don't know how to update functions without Map

Comparison to AdjacencyMap: (with a = Int)

addEdge: 26.16 (bad)
addVertex: 17.67 (bad)
creation: 3.28 (bad)
edgeCount: 2.93 (bad)
edgeList: 4.59 (bad)
hasEdge: 1.48 (bad)
hasVertex: 1.27 (bad)
isEmpty: 1.08 (OK)
vertexCount: 33.87 (bad)
vertexList: 3.57 (bad)

Here's my branch https://github.com/Avasil/alga/blob/adjMap-alternative-impl/src/Algebra/Graph/AdjacencyMapPoc.hs Although the code is very messy right now :D

Avasil commented 5 years ago

I added implementations for removeVertex, removeEdge and improved hasVertex:

hasVertex: 1.04 (OK)
removeEdge: 0.80 (good)
removeVertex: 0.55 (good)

I don't really understand why removing is so fast and vertexCount so terribly slow.

BTW which functions are the most interesting to us in this experiment? I could focus on implementing those next

snowleopard commented 5 years ago

@Avasil Thanks for the experiment! I don't have time to look at your implementation in detail at the moment, but it's indeed strange that vertexCount is so slow, e.g. there is a 10x difference compared to vertexList which should be at least as hard! It would be nice to bring all overheads down to 2-3x.

BTW which functions are the most interesting to us in this experiment?

I think your current selection is great. Perhaps you could also add preSet and postSet, since they are often needed in algorithms.

Avasil commented 5 years ago

I think the issue with vertexCount was caused by a bug I've had in overlay, latest results are:

addEdge: 27.98 (bad)
addVertex: 20.13 (bad)
creation: 3.36 (bad)
edgeCount: 3.31 (bad)
edgeList: 4.51 (bad)
equality: 14.01 (bad)
hasEdge: 6.67 (bad)
hasVertex: 0.98 (OK)
isEmpty: 1.04 (OK)
removeEdge: 0.91 (OK)
removeVertex: 0.61 (good)
vertexCount: 1.04 (OK)
vertexList: 0.93 (OK)

equality is quite bad but I started with simple implementation:

instance Ord a => Eq (AdjacencyMapPoc a) where
    (==) x y = and
        [ (==) (vertexCount x) (vertexCount  y)
        , (==) (vertexSet   x) (vertexSet    y)
        , (==) (edgeCount   x) (edgeCount    y)
        , (==) (edgeSet     x) (edgeSet      y) ]

which doesn't leverage AdjacencyIntMap so it's expected.

I also tried to compare underlying AdjacencyIntMap but the problem is that I need to convert values to the same mappings. My first try (gmap over one AIM) was 1000x slower than AdjacencyMap equality. :D

snowleopard commented 5 years ago

@Avasil Is there a chance to bring the overhead in addVertex and addEdge down to 3-5x?

Avasil commented 5 years ago

@snowleopard Current implementation of overlay and connect is quite involved so I assume it is possible to simplify it https://github.com/Avasil/alga/blob/adjMap-alternative-impl/src/Algebra/Graph/AdjacencyMapPoc.hs#L282

I haven't looked into it yet