Closed gabrielelana closed 4 years ago
@gabrielelana @sphaso Thank you! :)
Could you estimate the complexity of your implementation in terms of time and memory, as well as the size of the resulting expression? I suspect it can only handle very small graphs because of the repeated use of expensive removeEdge
, but I may be wrong.
Of course, having an expensive algorithm is much better than having none, so I'm still happy to merge your PR, but I'd like to document the complexity.
we don't know if it's ok for Arbitrary to generate graphs with loops
I think the answer is Yes because we don't rule out self-loops.
It is not entirely clear how complement
should handle self-loops. What about just not touching them at all? This allows you to keep the nice property complement . complement = id
.
Hi @snowleopard , thanks for getting back :)
I suspect it can only handle very small graphs because of the repeated use of expensive removeEdge, but I may be wrong.
If you deem removeEdge
too expensive there are alternative strategies we considered: simply using \\
on the edgeList was one, going through the symmetricRelation sets was another. Our first idea was to XOR the adjancency matrix with a 1 mask, but we went for what we believed was the most "idiomatic" solution
we don't know if it's ok for Arbitrary to generate graphs with loops
I think the answer is Yes because we don't rule out self-loops.
We were indeed a bit puzzled by loops, because then one might start thinking about multigraphs, and the property complement . complement = id
won't pass for those either. The only way to handle loops I can think of, would be to re-apply them on the clique, but that can't work for other types of multigraphs.
As you said, an expensive algorithm is better than none, so we're happy to analyze the complexity of complement
and keep these as food for thought depending on how you feel about it.
If you deem removeEdge too expensive
It's linear w.r.t. to the size of the graph expression, which is quite expensive:
http://hackage.haskell.org/package/algebraic-graphs-0.4/docs/Algebra-Graph.html#v:removeEdge
simply using
\\
on the edgeList
I think filtering existing edges may indeed be faster, although it would be nice to do some benchmarking. Could you compare your approach with \\
?
We were indeed a bit puzzled by loops, because then one might start thinking about multigraphs
The basic Graph
data type corresponds to "classic" graphs represented by a pair (V, E), where self-loops are allowed, so we should be able to handle self-loops.
The only way to handle loops I can think of, would be to re-apply them on the clique, but that can't work for other types of multigraphs.
Why don't you simply leave self-loops as is, without removing them? As far as I can see, this is the simplest option.
As you said, an expensive algorithm is better than none, so we're happy to analyze the complexity of
complement
Great, please do! And some benchmarking would be nice too: I'm curious how your current implementation based on removeEdge
compares to the one based on \\
.
Why don't you simply leave self-loops as is, without removing them? As far as I can see, this is the simplest option.
I feel there might be a critical piece of information regarding this library I'm missing. The problem I see is that:
Arbitrary Graph
produces graphs with self-loopsclique
does not (e.g. clique [x,y,z] == 'edges' [(x,y), (x,z), (y,z)]
)So the situation is: I want complement . complement = id
to be a valid property even for graphs with self-loops. However, if I use clique
as the reference graph for the complement, I don't have the self-loops in the first place.
That's what I meant by "re-apply". If the pseudo code for the complement
function is clique edges \\ current edges
, I would have to take the self-loops from current edges
and re-apply them after the difference operation.
We'll work on the benchmark ASAP.
Hi @snowleopard ,
we finally ran the benchmark and you were right! Data.List.\\
is at least one order of magnitude faster than removeEdge
.
Here you can find the benchmark code. Here you can find the results.
For easier reference, these are the two differing implementations. We took some time to calculate the complexity of the faster one as well as handling loops:
complement :: Ord a => Graph a -> Graph a
complement g@(UG _) = overlay (edges loops) $ foldr (uncurry removeEdge) (cliqueG g) previousEdges
where cliqueG = clique . vertexList
previousEdges = edgeList g
loops = filter (uncurry (==)) previousEdges
-- Complexity: /O(E^2+V)/ time, /O(E+V)/ memory where
-- E is the number of edges and V is the number of vertices
-- The quadratic bound is due to `edges`.
complement' :: Ord a => Graph a -> Graph a
complement' g@(UG _) = overlay (vertices allVertices) (edges complementEdges)
where cliqueG = clique . vertexList
allVertices = vertexList g
previousEdges = edgeList g
loops = filter (uncurry (==)) previousEdges
complementEdges = loops ++ (edgeList (cliqueG g) \\ previousEdges)
We were curious about the time complexity of edges
and we ran a second benchmark on it. The benchmark results seem to show that edges
is actually linear with regard to time.
Here the benchmark code.
Here the results.
What do you think?
If you like our implementation we'll move it to the current PR. Would you like to keep the benchmarks as well?
we finally ran the benchmark and you were right! Data.List.\ is at least one order of magnitude faster than removeEdge.
@sphaso Great, thank you for doing this!
We were curious about the time complexity of
edges
and we ran a second benchmark on it. The benchmark results seem to show thatedges
is actually linear with regard to time.
Yes, edges
is indeed linear with respect to the length of the given list, as stated in the documentation:
http://hackage.haskell.org/package/algebraic-graphs-0.4/docs/Algebra-Graph.html#v:edges
Did you expect its complexity to be non-linear?
If you like our implementation we'll move it to the current PR.
Yes, let's use the faster implementation based on \\
.
Responding to your earlier comment: yes, I think what you call "re-applying" will work. Conceptually, it's not very complex: you complement all non-loop edges and make sure to keep the original self-loops in the final graph.
Would you like to keep the benchmarks as well?
Regarding benchmarks: there is a separate repository for benchmarks, which you can find here:
Yes,
edges
is indeed linear with respect to the length of the given list [...] Did you expect its complexity to be non-linear?
When estimating the complexity of complement
we looked at the docs of edges
inside the Undirected
module where it's stated:
Complexity: /O(L^2)/ time, /O(L)/ memory and size
We'll adjust our estimates of complement
accordingly. Would you like us to amend this comment as well?
@sphaso Oh, that's just wrong. If you correct it to "Complexity: O(L) time, memory and size, where L is the length of the given list." right in this PR that would be great!
@snowleopard we ported the faster implementation and amended the complexity of edges
, can you check if everything is ok?
@gabrielelana Many thanks! The implementation looks good, but see my comments regarding the docs; they should be easy to address.
@snowleopard I should have addressed all the points you mentioned in the latest comments. Good catch on the quadratic complexity!
@sphaso Thanks for the fixes! The result doesn't seem completely right though, please see my comments.
@snowleopard I pushed a new commit that takes care of at least one concern, see my answer above for the other (time \ memory estimates, for which I already updated it as m^2
). Regarding tests, the fact that we're now preserving self-loops rendered two properties false. Good thing we swapped those fixed values :)
Namely:
complement (edge x y) == (vertices [x, y] :: UGI)
is no longer true if x == y
.
Namely:
complement (edge x y) == (vertices [x, y] :: UGI)
is no longer true if x == y.
@sphaso Indeed :-) Thanks!
@snowleopard hopefully I got everything right this time :) thanks!
@sphaso Thank you, merged! :)
By the way, feel free to add yourself and @gabrielelana to the list of contributors:
Regarding the last tests we don't know if it's ok for Arbitrary to generate graphs with loops
Fixes #21
Proudly made by @sphaso and @gabrielelana at "Open Source Saturday Milano"