sbromberger / LightGraphs.jl

An optimized graphs package for the Julia programming language
Other
672 stars 185 forks source link

Edges are not necessarily Arcs #401

Closed IssamT closed 8 years ago

IssamT commented 8 years ago

In simple graphs :

An arc (directed edge) is an ordered pair of vertices u-->v A line (undirected edge) is a set of two vertices u---v

Directed graphs have arcs, while undirected graphs have lines(often called edges).

https://en.wikipedia.org/wiki/Glossary_of_graph_theory#edge

These two objects are not equal u-->v , v-->u But these two are equal u---v , v---u

In both Graphs.jl and LightGraphs.jl there is only the notion of Arcs that are called Edges and are used for both directed and undirected graphs with a confusing equality test of edges in undirected graphs.

At least in Graphs.jl it is possible to do a work around by redefining the operator == of Edges in parts of the code where only undirected edges are used. But it becomes more problematic in LightGraphs since the edges are just pairs and thus the operator to redefine is this one ==(p::Pair, q::Pair) = (p.first==q.first) & (p.second==q.second)

IssamT commented 8 years ago

Inconsistency example :

The following code

g = Graph(2)
add_edge!(g,Edge(1,2))
has_edge(g,Edge(2,1))

returns true

but this one

for e in edges(g)
    e == Edge(2,1) && print("Edge found!")
end

prints nothing

CarloLucibello commented 8 years ago

for undirected graps edges are always returned such that x=>y has always x<y. This could be of some help

I'm in favor of having a type

type Edge
  first::Int
  second::Int
end

also because I don't like for example having Base.show(p::Pair) overwritten and displaying "Edge 1 2" for hany pair.

I fail to see though how can this solve the problem of Arc/Edge comparison

IssamT commented 8 years ago

Well i don't mind having two types that are exactly the same from a structural point of view.

type Arc
  first::Int
  second::Int
end

type Edge
  first::Int
  second::Int
end

As far as they are different types we can have different function specialization for both. For example

 ==(p::Arc, q::Arc) = (p.first==q.first) & (p.second==q.second)

 ==(p::Edge, q::Edge) = ((p.first==q.first) & (p.second==q.second)) | (p.first==q.second) & (p.second==q.first)
CarloLucibello commented 8 years ago

we should check what impact this would have thoughout the library, in particular regarding code duplication and type instability

IssamT commented 8 years ago

Sure and it would be better to keep all methods that do not make a difference between directed and undirected edges unique :

abstract Edge

type Arc <: Edge
  first::Int
  second::Int
end

type UEdge <: Edge
  first::Int
  second::Int
end
sbromberger commented 8 years ago

It was an explicit design decision to keep the structure of the graph independent of the structure of the edges. That is, edges do not contain information relative to their graph. There are a few reasons for this, but the main one is that it really doesn't matter. We have an accepted interface for checking edge membership, and e in edges is not it. (This used to work, incidentally, before we moved to edge iterators, but it is not the accepted interface.)

Sometimes the lack of directedness on edges makes things more complex, but in general, it simplifies things a great deal.

That is: I'm not in favor of creating two different edge types based on directedness. We have a history with this issue where basic operations required two separate functions depending on directedness of edges, and it was unnecessarily complex.

Also: Edges used to just be a type with two Ints, as you're suggesting. It made sense to move to a primitive structure just for convenience, but if there's a compelling reason to move back, we can do it.

IssamT commented 8 years ago

@sbromberger There are a few reasons for this, but the main one is that it really doesn't matter

but for me it really does matter.

I wanted to write a mip that finds the shortest cycle in a graph using JuMP.jl

function shortest_cycle(g::Graph)
    m = Model(solver=CbcSolver())   

    @variable(m, b[v in vertices(g)], Bin)
    @variable(m, c[e in edges(g)], Bin)
    @constraint(m, sum{b[v] , v in vertices(g)} >= 1)
    @constraint(m , adjacency[v in vertices(g)],  sum{c[e] , e in out_edges(g,v)} == 2b[v])
    @objective(m, Min, sum{b[v] , v in vertices(g)})

    solve(m)   
    println("The shortest cycle has $(getobjectivevalue(m)) elements")
    getvalue(b)
end

In JuMP when you declare your variable, you also declare the set of its indexes then you can do mathematic expressions over subsets in this set.

Since out_edges(v,g) is not a subset of edges(g) there are a lot of mathematical programs that can not be written in a simple way anymore.

sbromberger commented 8 years ago

the easiest ways to do this for undirected graphs are 1) to just ignore edges that are not in vertex order (use the unexported LightGraphs.is_ordered() to do this), or 2) explicitly "or" with reverse(e).

IssamT commented 8 years ago

Yes I saw noticed these solutions. but they have three drawbacks.

1) they aren't intuitive and require the user to know about the implementation details. 2) they can lead to bad performance. 3) they aren't useful to write concise code.

After thinking half an hour one way I found to fix my model is to rewrite this

@constraint(m , adjacency[v in vertices(g)], sum{c[e] , e in out_edges(g,v)} == 2b[v])

into that

    @constraintref adjacency[vertices(g)]    
    for v in vertices(g)
        out_edges_subset_of_edges = []       
        for(e in out_edges(g,v))            
            e in edges(g) && push!(out_edges_subset_of_edges, e)
            reverse(e) in edges(g) && push!(out_edges_subset_of_edges, reverse(e))
        end               
        adjacency[v] = @constraint(m, sum{c[e] , e in out_edges_subset_of_edges} == 2b[v] )
    end

I think this illustrate the 3 points I mentioned.

Another way would be to transform pairs into sets that contain two elements and let the keys be Sets of vertices instead of Edges. but this would be even uglier in my opinion

I am pretty sure you have reasons behind your decision of making Edges always Pairs, and I would be grateful if you could share some of them.

sbromberger commented 8 years ago

Collecting / set-testing edges like that is expensive, though it may be unavoidable given what you need to pass in to JuMP. If that's the case, then just union the edgeset and its reverse:

vec(hcat([[e,reverse(e)] for e in edges(g)]...))

(there's an equivalent for out_edges but I'll leave that as an exercise for the reader.)See below for out_edges.

As far as the reasons for making Edge a typealias of Pair, you can check the commit history for core.jl for some of it, but again - the tendency is to prefer primitives over other data structures wherever possible. But I'm sensing that's not the question you want to ask - my guess is that you want to ask why we're not differentiating between arcs and edges. The answer there is as I said above: this was a conscious decision to keep LG simple and memory-light; having two different edge types depending on the directedness of the graph leads to complications of its own. You might even go back into some of the commit history to see the machinations we had to perform when we kept full edge sets for undirected graphs. I don't really have the time to do any more right now except to say "this proposal was considered previously and we [I, since I was the primary contributor at the time] decided that it wasn't worth the complexity".

Bottom line: yes, you've run into a case where you have to make something more complex due to the "light" nature of LightGraphs, and ultimately, this package may not be well-suited for your application. The alternative, to reengineer LG so that your case is simple, will have the effect of making other stuff more complex. It's not (even) a net-zero tradeoff. The fact that you've got a few reasonable workarounds and we don't have to upend / reengineer all the native data structures to accommodate this request means that, absent any demonstration that this proposed change improves LG with respect to 1) correctness, 2) memory utilization, 3) code simplicity, and/or 4) performance, it's probably not going to happen.

If you want to submit a PR with appropriate benchmarks showing (at a minimum) no regressions in terms of those four goals, then you would then be presenting a more compelling reason to switch.

I hope that helps.

sbromberger commented 8 years ago

Another option is to use in_edges as well as out_edges instead of reverse:

union(in_edges(g,1),out_edges(g,1))

in_edges and out_edges are very efficient.

IssamT commented 8 years ago

Thank you for providing such a long explanation for something that was already discussed in the past (I m sorry i didn't know about that ). I actually did check all the issues posted even the closed ones before submitting this issue but i didn't check the commit history. I will go over it before eventually trying to submit a PR.

sbromberger commented 8 years ago

To be fair, you wouldn't have known where to search in the commit history, nor what to look for, so no worries.

I look forward to your PR. When you get closer to finalizing it, reach out to us to make sure you've got the proper benchmarking information included - this will represent a MAJOR change to the codebase and will require a ton of scrutiny.

CarloLucibello commented 8 years ago

I changed the return type of in_edges in #403 , let me know what you think about it.

Another option is to define and export an order function in LG so that

@constraint(m , adjacency[v in vertices(g)], sum{c[order(e)] , e in out_edges(g,v)} == 2b[v])

would work correctly

IssamT commented 8 years ago

@CarloLucibello I thought also about the same modification as what you suggested in #403 , but I didn't want to request it because, although it solves my issues, it could break some algorithms if they were relying on the fact that when looping thought e in out_edges(g,v), dest(e) is a neighbor of v.

However I really like your idea of adding the function order. It is pretty safe and unlikely to have a negative impact on any code using LightGraphs.jl. Also, out of the 3 drawbacks I have mentioned in a previous comment concerning the current implementation of edges, only one (arguably the least important) would partially persist:

1) they aren't intuitive and require the user to know about the implementation details.

IssamT commented 8 years ago

What do you think about leaving out_edges as it is and addingincident_edges(g,v) that would return edges as ordered Pairs ?

sbromberger commented 8 years ago

wait. What does order() do? Does it rewrite an edge so that the lowest vid is first? If so, I am totally against this since there's no way to prevent it being used with directed graphs.

If you need this, then use is_ordered() and do it yourself so that it's at least explicit for your use case. It doesn't belong in LG.

CarloLucibello commented 8 years ago

I don't see any problem with order, why preventing to use it on directed graphs since we can't prevent also to use reverse? The naming is bad though, a better one would be sort. Consider that reverse and issorted are already defined in Base for Pair, hence for Edge. These side effects advocate for Edge being its own type

IssamT commented 8 years ago

@sbromberger,

You have asked about order() because the name was not appropriate. This is the same problem a user of LightGraphs.jl faces since objects and functions do not have an intuitive behavior (respecting mathematical definitions) and require the user to refer to the doc/implementation continuously.

By the way is_ordered() is another exemple, it should be named is_sorted() in my opinion. All pairs are ordered (they have a "first" and "second" element)

Additionally I don't understand why you want to prevent users from sorting Edges when they are coming from directed graphs since you don t make a difference between edges of directed and undirected graphs. These are just "Pairs" so what you are saying is that you want to prevent a user from sorting a Pair. Example for why sorting Pairs of vertices in directed graph can be useful: In telecommunication networks, sometimes you can have different directed graphs (virtual networks) running over a physical network. One may want to get the underlying undirected subgraph on which all the directed graphs rely. An easy way to do that is to get all edges in directed graphs, sort them, and push them into a set (avoiding hence the addition of "similar" unordered Pairs)

I am just trying to help improve the library in which you guys have done really a great job. You probably know better about it than me and I will find it normal if my suggestions get refused. I will have sort() and incident_edges() defined in my code only and that wouldn't be a big deal for me :)

sbromberger commented 8 years ago

I don't see any problem with order, why preventing to use it on directed graphs since we can't prevent also to use reverse?

For the extremely simple reason that order has nondeterministic behavior on directed graphs only, while reverse has defined, consistent behavior on either type.

Additionally I don't understand why you want to prevent users from sorting Edges when they are coming from directed graphs since you don t make a difference between edges of directed and undirected graphs. These are just "Pairs" so what you are saying is that you want to prevent a user from sorting a Pair.

NOTHING is preventing you from doing this yourself. I have a stated objection to including this functionality in the public LG API because IT WILL CONFUSE PEOPLE.

Bottom line: you can continue to debate the merits (or demerits) of having arcs, edges, other functionality - but in the absence of a PR of the type I described above, I will not take further part.