sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.43k stars 479 forks source link

igraph_max_flow #19003

Closed a8f3419b-6383-42be-b93c-ecaa87929754 closed 9 years ago

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Include the igraph algorithm for max flow / min cut.

Depends on #18929

CC: @nathanncohen @dcoudert

Component: graph theory

Keywords: igraph, maximum flow, minimum cut

Author: Michele Borassi

Branch/Commit: 0fc03ce

Reviewer: Nathann Cohen

Issue created by migration from https://trac.sagemath.org/ticket/19003

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Dependencies: #18929

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Description changed:

--- 
+++ 
@@ -1 +1 @@
-
+Include the igraph algorithm for max flow / min cut.
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Author: Michele Borassi

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Changed keywords from none to igraph, maximum flow, minimum cut

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Branch: u/borassi/igraph_max_flow

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Commit: f2f97d5

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:3

Hellooooo!

I have implemented max flow/min cut through igraph, which is faster than the old Python-based implmentation of Ford-Fulkerson algorithm. I hope you like it!

Michele

The usual benchmarks:

sage: g = graphs.RandomGNM(2000,20000)
sage: for u,v in g.edges(labels=False):
....:     g.set_edge_label(u,v,random())
....:     
sage: g.flow(0,1,method='LP')
8.742625820571515
sage: g.flow(0,1,method='FF')
8.742625820571511
sage: g.flow(0,1,method='igraph')
8.742625820571515

sage: %timeit g.flow(0,1, method='FF')
1 loops, best of 3: 256 ms per loop
sage: %timeit g.flow(0,1, method='LP')
1 loops, best of 3: 1.8 s per loop
sage: %timeit g.flow(0,1, method='igraph')
10 loops, best of 3: 66.8 ms per loop

sage: g = digraphs.RandomDirectedGNM(5000,60000)
sage: for u,v in g.edges(labels=False):
    g.set_edge_label(u,v,random())
....:     

sage: g.flow(0,1, method='FF')
4.122947644386753
sage: g.flow(0,1, method='LP')
4.122947644386753
sage: g.flow(0,1, method='igraph')
4.122947644386753

sage: %timeit g.flow(0,1, method='FF')
1 loops, best of 3: 404 ms per loop
sage: %timeit g.flow(0,1, method='LP')
1 loops, best of 3: 2.86 s per loop
sage: %timeit g.flow(0,1, method='igraph')
1 loops, best of 3: 226 ms per loop

sage: g.flow(0,1,use_edge_labels=False, method='FF')
10
sage: g.flow(0,1,use_edge_labels=False, method='LP')
10.0
sage: g.flow(0,1,use_edge_labels=False, method='igraph')
10.0

sage: %timeit g.flow(0,1,use_edge_labels=False, method='FF')
1 loops, best of 3: 324 ms per loop
sage: %timeit g.flow(0,1,use_edge_labels=False, method='LP')
1 loops, best of 3: 2.49 s per loop
sage: %timeit g.flow(0,1,use_edge_labels=False, method='igraph')
10 loops, best of 3: 42.6 ms per loop

In conclusion, igraph is always the fastest, and the improvement is maximum when all edges have capacity 1.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:4

Helloooooo Michele,

Thank you very much for those speedups :-P

Here are the results of my review:

Sorry for being rather absent these days. The "main" excuse is that I am on vacation (worse, on vacations with people around!), but the truth is that I spend all the time I have alone trying to build in Sage all strongly regular graphs promised by Brouwer's database http://www.win.tue.nl/~aeb/graphs/srg/srgtab.html. Which requires a LOT of digging work in old books ;-)

Good luuuck !

Nathann

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:5

Helloooooo!

Thank you very much for your review! The solutions/answers follow!

See you,

Michele

Replying to @nathanncohen:

Helloooooo Michele,

Thank you very much for those speedups :-P

Here are the results of my review:

  • They run a max flow algorithm and silently ignore the weight if they do not understand it? Crazy O_o Sometimes, Sage integer/floats are not compatible with igraph::

P.S.: I just noticed that we do exactly the same: capacity=lambda x: x if x in RR else 1. This is out of the scope of this ticket, but what would you think of getting rid of this 'default' and use 'x' at all times (which would raise an exception when the weights are obviously wrong)?

I agree: I have replaced that line with capacity=lambda x: x. However, this way, some tests were wrong, because they provided a graph with no edge label, implying that the capacity would be 1 for each edge. To solve this issue, I decided to set use_edge_labels to False by default (as it is already done in the min-cut algorithm).

  • "None"? Or None? if method in ["FF", "igraph", "None"]:

Done!

  • The code imports 'igraph' even when not necessary, i.e. when method=None. I tried to figure out how to rewrite it better, but many patterns lead me to write the import/except block twice, which is a bit unpleasant. Instead of importing igraph in order to guess if it is available, what would you think of using is_package_installed("python_igraph")? This way, you do not have to define igraph_available at all, and you can use the output of is_package_installed to decide whether method should be set to igraph. Then, in the elif (method == 'igraph'): block, you can actually import igraph and raise the exception otherwise. What do you think?

I like it! I was also looking for a better solution, but I didn't know the routine is_package_installed.

  • Is there a reason why you import floor from sage.functions.other and not from math (i.e. from python)? Usually 'our' functions can handle symbolics and are much slower as a result.

Yes, there is: I didn't know the function floor :-P... Corrected!

  • Instead of differentiating between 'if integer' and 'else' in the igraph code, wouldn't it be easier to replace the weight function (in all cases) by 'floor'?

Hmmm, could you explain a bit more what you mean? In my opininon, the user should decide whether (s)he wants integer-valued flow or real-valued flow. For instance:

sage: g = Graph([(0,1,0.5)])
sage: g.flow(0,1,use_edge_labels=True)
0.5
sage: g.flow(0,1,use_edge_labels=True,integer=True)
0.0
sage: g = Graph([(0,1,0.5),(0,2,1.5),(1,2,1.5)])
sage: g.flow(0,1,use_edge_labels=True)
2.0
sage: g.flow(0,1,use_edge_labels=True,integer=True)
1.0
  • It worries me everytime I see a line like g_igraph = self.igraph_graph(edge_attrs={'capacity':[float(capacity(e[2])) for e in self.edge_iterator()]}) I always think: are we even sure that the right weight will be associated with the right edge? O_o

In this case, yes: the edges are added in order, and they are assigned an integer, as "explained" by the igraph tutorial [1]. I tried to stick as much as possible with the examples in the tutorial, but if you have a better idea, please let me know! To me, it seems that this is the standard in igraph.

  • Wow. That's even scarier: for e in self.edge_iterator(): f =next(igraph_flow) Could you iterate on the igraph graph's edges instead? If the order of edges changes in their data structure, we would output gibberish, won't we? O_o By the way, you may be interested by zip and izip:
sage: a=[1,2,3]
sage: b=[4,5,6]
sage: a=iter([1,2,3])
sage: b=iter([4,5,6])
sage: from itertools import izip
sage: for x,y in izip(a,b):
....:     print x,y
1 4
2 5
3 6
  • I did not do any profiling so I may be wrong, but I expect that you are making your life a bit too difficult, and the code a bit slower, by working with those Sage edges. It seems that you "mix" Sage's edge objects with igraph flow values in order to build a labelled graph while igraph is integer-labelled. In particular, in the code which builds the flow graph of an undirected graph, you relabel all the edges on the fly. As you use a dictionary, this means 2m dict queries, which can take some time (probably negligible compared to everything else, of course). Instead, you could work to build an integer Sage graph (which would simplify the relabelling) and relabel it (using the relabel function) once that it is built. Relabelling a graph is much faster than doing 2m dict queries. Theoretically, it only needs to query each value once, i.e. n queries.

I have added an attribute 'name' to igraph vertices, and now I iterate on igraph edges, using the attribute when I create the new graph. Do you like it as it is?

[1] !https://python-igraph.readthedocs.org/en/latest/tutorial.html

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

6282386Reviewer's suggestions
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from f2f97d5 to 6282386

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:8

Hello,

P.S.: I just noticed that we do exactly the same: capacity=lambda x: x if x in RR else 1. This is out of the scope of this ticket, but what would you think of getting rid of this 'default' and use 'x' at all times (which would raise an exception when the weights are obviously wrong)?

I agree: I have replaced that line with capacity=lambda x: x. However, this way, some tests were wrong, because they provided a graph with no edge label, implying that the capacity would be 1 for each edge. To solve this issue, I decided to set use_edge_labels to False by default (as it is already done in the min-cut algorithm).

No, please. This is out of the scope of this ticket. This should not be done here. There are many functions that handle labels this way and we should not only change the behaviour of one function (i.e. flow), plus if the default behaviour changes we have to check all functions that call flow, and perhaps think twice whether this should be done. Perhaps only use '1' if the label is None? But please, let us not touch it in this ticket.

  • Instead of differentiating between 'if integer' and 'else' in the igraph code, wouldn't it be easier to replace the weight function (in all cases) by 'floor'?

Hmmm, could you explain a bit more what you mean? In my opininon, the user should decide whether (s)he wants integer-valued flow or real-valued flow.

Sorry, my sentence was highly confusing. When I said 'in all cases', I meant "for all values of 'method' so that there is no need for this block of code

if integer:
    g_igraph = self.igraph_graph(vertex_attrs={'name':self.vertices()}, edge_attrs={'capacity':[int(floor(capacity(e[2]))) for e in self.edge_iterator()]})
else:
    g_igraph = self.igraph_graph(vertex_attrs={'name':self.vertices()}, edge_attrs={'capacity':[float(capacity(e[2])) for e in self.edge_iterator()]})

In this case, yes: the edges are added in order, and they are assigned an integer, as "explained" by the igraph tutorial [1].

Crazy. I don't even want to know how it behaves when you add/remove edges, and the time complexity of it.

I have added an attribute 'name' to igraph vertices, and now I iterate on igraph edges, using the attribute when I create the new graph. Do you like it as it is?

It is not exactly what I had in mind, but do not worry about that: went the other points will be fixed I will add a commit, and we will either take it if you like it or keep with your code if you do not.

Nathann

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:9

Hello!

P.S.: I just noticed that we do exactly the same: capacity=lambda x: x if x in RR else 1. This is out of the scope of this ticket, but what would you think of getting rid of this 'default' and use 'x' at all times (which would raise an exception when the weights are obviously wrong)?

I agree: I have replaced that line with capacity=lambda x: x. However, this way, some tests were wrong, because they provided a graph with no edge label, implying that the capacity would be 1 for each edge. To solve this issue, I decided to set use_edge_labels to False by default (as it is already done in the min-cut algorithm).

No, please. This is out of the scope of this ticket. This should not be done here. There are many functions that handle labels this way and we should not only change the behaviour of one function (i.e. flow), plus if the default behaviour changes we have to check all functions that call flow, and perhaps think twice whether this should be done. Perhaps only use '1' if the label is None? But please, let us not touch it in this ticket.

Oh, sorry, actually you wrote that it was out of the scope of this ticket. I reverted everything.

  • Instead of differentiating between 'if integer' and 'else' in the igraph code, wouldn't it be easier to replace the weight function (in all cases) by 'floor'?

Hmmm, could you explain a bit more what you mean? In my opininon, the user should decide whether (s)he wants integer-valued flow or real-valued flow.

Sorry, my sentence was highly confusing. When I said 'in all cases', I meant "for all values of 'method' so that there is no need for this block of code

if integer:
g_igraph = self.igraph_graph(vertex_attrs={'name':self.vertices()}, edge_attrs={'capacity':[int(floor(capacity(e[2]))) for e in self.edge_iterator()]})
else:
g_igraph = self.igraph_graph(vertex_attrs={'name':self.vertices()}, edge_attrs={'capacity':[float(capacity(e[2])) for e in self.edge_iterator()]})

Done!

In this case, yes: the edges are added in order, and they are assigned an integer, as "explained" by the igraph tutorial [1].

Crazy. I don't even want to know how it behaves when you add/remove edges, and the time complexity of it.

I have added an attribute 'name' to igraph vertices, and now I iterate on igraph edges, using the attribute when I create the new graph. Do you like it as it is?

It is not exactly what I had in mind, but do not worry about that: went the other points will be fixed I will add a commit, and we will either take it if you like it or keep with your code if you do not.

Great! I will wait for your code, then!

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

d363158Reviewer's commit
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 6282386 to d363158

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:11

Oh, sorry, actually you wrote that it was out of the scope of this ticket. I reverted everything.

Thanks. I know that it is unpleasant to do, sorry :-/

Done!

Thanks.

Great! I will wait for your code, then!

I just pushed a commit at public/19003. It does not do much, but I hope that it simplifies the code "a bit". Don't hesitate to tell me if you do not like it. In theory it should also "do less computations", though it probably cannot be measured.

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

68053d3trac #19003: build an integer-labelled graph, then relabel it
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from d363158 to 68053d3

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 68053d3 to 0fc03ce

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

0fc03ceBroken link in documentation
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:14

Thank you Nathann for your work. I have included it, and I have corrected a small mistake in the documentation. All long tests on graphs are successful, documentation is ok, so I think this patch is good to go!

Michele

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:15

Arg. Thanks for the fix ^^;

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

Reviewer: Nathann Cohen

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:16

Hellooooooo,

Thank you Nathann for your work. I have included it, and I have corrected a small mistake in the documentation.

Yepyepyep, thank you for that.

All long tests on graphs are successful, documentation is ok, so I think this patch is good to go!

Same here. Good to go :-)

Nathann

vbraun commented 9 years ago

Changed branch from u/borassi/igraph_max_flow to 0fc03ce