snowleopard / alga

Algebraic graphs
MIT License
717 stars 68 forks source link

Implement complement for undirected graphs #237

Closed gabrielelana closed 4 years ago

gabrielelana commented 5 years ago

Regarding the last tests we don't know if it's ok for Arbitrary to generate graphs with loops

Fixes #21

Proudly made by @sphaso and @gabrielelana at "Open Source Saturday Milano"

Open Source Saturday

snowleopard commented 5 years ago

@gabrielelana @sphaso Thank you! :)

Could you estimate the complexity of your implementation in terms of time and memory, as well as the size of the resulting expression? I suspect it can only handle very small graphs because of the repeated use of expensive removeEdge, but I may be wrong.

Of course, having an expensive algorithm is much better than having none, so I'm still happy to merge your PR, but I'd like to document the complexity.

we don't know if it's ok for Arbitrary to generate graphs with loops

I think the answer is Yes because we don't rule out self-loops.

It is not entirely clear how complement should handle self-loops. What about just not touching them at all? This allows you to keep the nice property complement . complement = id.

sphaso commented 5 years ago

Hi @snowleopard , thanks for getting back :)

I suspect it can only handle very small graphs because of the repeated use of expensive removeEdge, but I may be wrong.

If you deem removeEdge too expensive there are alternative strategies we considered: simply using \\ on the edgeList was one, going through the symmetricRelation sets was another. Our first idea was to XOR the adjancency matrix with a 1 mask, but we went for what we believed was the most "idiomatic" solution

we don't know if it's ok for Arbitrary to generate graphs with loops

I think the answer is Yes because we don't rule out self-loops.

We were indeed a bit puzzled by loops, because then one might start thinking about multigraphs, and the property complement . complement = id won't pass for those either. The only way to handle loops I can think of, would be to re-apply them on the clique, but that can't work for other types of multigraphs.

As you said, an expensive algorithm is better than none, so we're happy to analyze the complexity of complement and keep these as food for thought depending on how you feel about it.

snowleopard commented 5 years ago

If you deem removeEdge too expensive

It's linear w.r.t. to the size of the graph expression, which is quite expensive:

http://hackage.haskell.org/package/algebraic-graphs-0.4/docs/Algebra-Graph.html#v:removeEdge

simply using \\ on the edgeList

I think filtering existing edges may indeed be faster, although it would be nice to do some benchmarking. Could you compare your approach with \\?

We were indeed a bit puzzled by loops, because then one might start thinking about multigraphs

The basic Graph data type corresponds to "classic" graphs represented by a pair (V, E), where self-loops are allowed, so we should be able to handle self-loops.

The only way to handle loops I can think of, would be to re-apply them on the clique, but that can't work for other types of multigraphs.

Why don't you simply leave self-loops as is, without removing them? As far as I can see, this is the simplest option.

As you said, an expensive algorithm is better than none, so we're happy to analyze the complexity of complement

Great, please do! And some benchmarking would be nice too: I'm curious how your current implementation based on removeEdge compares to the one based on \\.

sphaso commented 5 years ago

Why don't you simply leave self-loops as is, without removing them? As far as I can see, this is the simplest option.

I feel there might be a critical piece of information regarding this library I'm missing. The problem I see is that:

So the situation is: I want complement . complement = id to be a valid property even for graphs with self-loops. However, if I use clique as the reference graph for the complement, I don't have the self-loops in the first place. That's what I meant by "re-apply". If the pseudo code for the complement function is clique edges \\ current edges, I would have to take the self-loops from current edges and re-apply them after the difference operation.

We'll work on the benchmark ASAP.

sphaso commented 5 years ago

Hi @snowleopard , we finally ran the benchmark and you were right! Data.List.\\ is at least one order of magnitude faster than removeEdge.

Here you can find the benchmark code. Here you can find the results.

For easier reference, these are the two differing implementations. We took some time to calculate the complexity of the faster one as well as handling loops:

complement :: Ord a => Graph a -> Graph a
complement g@(UG _) = overlay (edges loops) $ foldr (uncurry removeEdge) (cliqueG g) previousEdges
  where cliqueG = clique . vertexList
        previousEdges = edgeList g
        loops = filter (uncurry (==)) previousEdges

-- Complexity: /O(E^2+V)/ time, /O(E+V)/ memory where
-- E is the number of edges and V is the number of vertices
-- The quadratic bound is due to `edges`.
complement' :: Ord a => Graph a -> Graph a
complement' g@(UG _) = overlay (vertices allVertices) (edges complementEdges)
  where cliqueG = clique . vertexList
        allVertices = vertexList g
        previousEdges = edgeList g
        loops = filter (uncurry (==)) previousEdges
        complementEdges = loops ++ (edgeList (cliqueG g) \\ previousEdges)

We were curious about the time complexity of edges and we ran a second benchmark on it. The benchmark results seem to show that edges is actually linear with regard to time. Here the benchmark code. Here the results.

What do you think?

If you like our implementation we'll move it to the current PR. Would you like to keep the benchmarks as well?

snowleopard commented 5 years ago

we finally ran the benchmark and you were right! Data.List.\ is at least one order of magnitude faster than removeEdge.

@sphaso Great, thank you for doing this!

We were curious about the time complexity of edges and we ran a second benchmark on it. The benchmark results seem to show that edges is actually linear with regard to time.

Yes, edges is indeed linear with respect to the length of the given list, as stated in the documentation:

http://hackage.haskell.org/package/algebraic-graphs-0.4/docs/Algebra-Graph.html#v:edges

Did you expect its complexity to be non-linear?

If you like our implementation we'll move it to the current PR.

Yes, let's use the faster implementation based on \\.

Responding to your earlier comment: yes, I think what you call "re-applying" will work. Conceptually, it's not very complex: you complement all non-loop edges and make sure to keep the original self-loops in the final graph.

snowleopard commented 5 years ago

Would you like to keep the benchmarks as well?

Regarding benchmarks: there is a separate repository for benchmarks, which you can find here:

https://github.com/haskell-perf/graphs

sphaso commented 5 years ago

Yes, edges is indeed linear with respect to the length of the given list [...] Did you expect its complexity to be non-linear?

When estimating the complexity of complement we looked at the docs of edges inside the Undirected module where it's stated:

Complexity: /O(L^2)/ time, /O(L)/ memory and size

We'll adjust our estimates of complement accordingly. Would you like us to amend this comment as well?

snowleopard commented 5 years ago

@sphaso Oh, that's just wrong. If you correct it to "Complexity: O(L) time, memory and size, where L is the length of the given list." right in this PR that would be great!

gabrielelana commented 5 years ago

@snowleopard we ported the faster implementation and amended the complexity of edges, can you check if everything is ok?

snowleopard commented 5 years ago

@gabrielelana Many thanks! The implementation looks good, but see my comments regarding the docs; they should be easy to address.

sphaso commented 5 years ago

@snowleopard I should have addressed all the points you mentioned in the latest comments. Good catch on the quadratic complexity!

snowleopard commented 5 years ago

@sphaso Thanks for the fixes! The result doesn't seem completely right though, please see my comments.

sphaso commented 4 years ago

@snowleopard I pushed a new commit that takes care of at least one concern, see my answer above for the other (time \ memory estimates, for which I already updated it as m^2). Regarding tests, the fact that we're now preserving self-loops rendered two properties false. Good thing we swapped those fixed values :) Namely: complement (edge x y) == (vertices [x, y] :: UGI) is no longer true if x == y.

snowleopard commented 4 years ago

Namely: complement (edge x y) == (vertices [x, y] :: UGI) is no longer true if x == y.

@sphaso Indeed :-) Thanks!

sphaso commented 4 years ago

@snowleopard hopefully I got everything right this time :) thanks!

snowleopard commented 4 years ago

@sphaso Thank you, merged! :)

By the way, feel free to add yourself and @gabrielelana to the list of contributors:

https://github.com/snowleopard/alga/blob/master/AUTHORS.md