Compute diameter using 2sweep, 4sweep and iFUB

dcoudert commented 9 years ago

This patch implements:

2sweep and 4sweep lower bounds on the diameter of a graph
iFUB, an exact and fast algorithm for computing the diameter of a graph

sage: G = graphs.RandomBarabasiAlbert(1000,2)
sage: %time G.diameter('standard')
CPU times: user 44 ms, sys: 0 ns, total: 44 ms
Wall time: 42.7 ms
7
sage: %time G.diameter('iFUB')
CPU times: user 24 ms, sys: 1 ms, total: 25 ms
Wall time: 23.6 ms
7

sage: G = graphs.RandomBarabasiAlbert(10000,2)
sage: %time G.diameter('standard')
CPU times: user 3.49 s, sys: 1 ms, total: 3.49 s
Wall time: 3.49 s
9
sage: %time G.diameter('iFUB')
CPU times: user 336 ms, sys: 2 ms, total: 338 ms
Wall time: 333 ms
9

The running time of the iFUB method depends on many parameters but is in general faster than previous methods.

sage: G = graphs.Grid2dGraph(50,50)
sage: %time G.diameter('standard')
CPU times: user 138 ms, sys: 2 ms, total: 140 ms
Wall time: 139 ms
98
sage: %time G.diameter('iFUB')
CPU times: user 80 ms, sys: 2 ms, total: 82 ms
Wall time: 80.5 ms
98
sage: V = G.vertices()
sage: shuffle(V)
sage: G.relabel(inplace=True, perm=V)
sage: %time G.diameter('iFUB')
CPU times: user 35 ms, sys: 1 ms, total: 36 ms
Wall time: 33.4 ms
98
sage: shuffle(V)
sage: G.relabel(inplace=True, perm=V)
sage: %time G.diameter('iFUB')
CPU times: user 35 ms, sys: 1 ms, total: 36 ms
Wall time: 34.2 ms
98

CC: @nathanncohen

Component: graph theory

Author: David Coudert

Branch/Commit: 69cb552

Reviewer: Nathann Cohen

Issue created by migration from https://trac.sagemath.org/ticket/17135

dcoudert commented 9 years ago

Author: David Coudert

dcoudert commented 9 years ago

Description changed:

--- 
+++ 
@@ -1 +1,3 @@
-
+This patch implements:
+- 2sweep and 4sweep lower bounds on the diameter of a graph
+- iFUB, an exact and fast algorithm for computing the diameter of a graph

dcoudert commented 9 years ago

Branch: u/dcoudert/help

dcoudert commented 9 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,53 @@
 This patch implements:
 - 2sweep and 4sweep lower bounds on the diameter of a graph
 - iFUB, an exact and fast algorithm for computing the diameter of a graph
+
+```
+sage: G = graphs.RandomBarabasiAlbert(1000,2)
+sage: %time G.diameter('standard')
+CPU times: user 44 ms, sys: 0 ns, total: 44 ms
+Wall time: 42.7 ms
+7
+sage: %time G.diameter('iFUB')
+CPU times: user 24 ms, sys: 1 ms, total: 25 ms
+Wall time: 23.6 ms
+7
+
+sage: G = graphs.RandomBarabasiAlbert(10000,2)
+sage: %time G.diameter('standard')
+CPU times: user 3.49 s, sys: 1 ms, total: 3.49 s
+Wall time: 3.49 s
+9
+sage: %time G.diameter('iFUB')
+CPU times: user 336 ms, sys: 2 ms, total: 338 ms
+Wall time: 333 ms
+9
+```
+
+The running time of the iFUB method depends on many parameters but is in general faster than previous methods.
+
+```
+sage: G = graphs.Grid2dGraph(50,50)
+sage: %time G.diameter('standard')
+CPU times: user 138 ms, sys: 2 ms, total: 140 ms
+Wall time: 139 ms
+98
+sage: %time G.diameter('iFUB')
+CPU times: user 80 ms, sys: 2 ms, total: 82 ms
+Wall time: 80.5 ms
+98
+sage: V = G.vertices()
+sage: shuffle(V)
+sage: G.relabel(inplace=True, perm=V)
+sage: %time G.diameter('iFUB')
+CPU times: user 35 ms, sys: 1 ms, total: 36 ms
+Wall time: 33.4 ms
+98
+sage: shuffle(V)
+sage: G.relabel(inplace=True, perm=V)
+sage: %time G.diameter('iFUB')
+CPU times: user 35 ms, sys: 1 ms, total: 36 ms
+Wall time: 34.2 ms
+98
+```
+

dcoudert commented 9 years ago

Commit: b449129

dcoudert commented 9 years ago

New commits:

`3dd6911`	`add references`
`d082687`	`add 2sweep, 4sweep, iFUB`
`6c788dc`	`fixed bug with Infinity`
`fb27cf7`	`fixed bug with DiGraph`
`3458a0a`	`fixed bug`
`b449129`	`fixed bugs`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`de15170`	`improve documentation`
`c1a7c95`	`improve documentation`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from b449129 to c1a7c95

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:5

Yo !!

A second pass on this code. With this I guess that the 'technical review' is done, but I haven't checked the correction yet, i.e. looked at the papers and what they prove. I don't really like to make a non-obvious algorithm the default one when we have a much simpler way to compute the same value, but well, if it is faster ...

About seen. "It must be initialized, but there is no need to initialize it".

+    - ``seen`` -- bitset of size ``n`` that must be initialized before calling
+      this method. owever, there is no nead to initialize it.

this method check is all --> checkS iF all
this helps saving --> save
when we already known --> know
About your check for connected graphs: I find your if test_is_connected and len(seen)<n: return big_number a bit weird given that you just ran a BFS which is supposed to return the eccentricity of a vertex (which, in particular, tests if the graph is connected}}}. And actually, while you say that the simple_BFS function returns the eccentricity of source, it only does so when the graph is connected. If it is not, it returns the (finite) eccentricity of source in its connected component. Is there any reason why you should not do that simple_BFS ?

-return distances[waiting_list[waiting_end]]
+return distances[waiting_list[waiting_end]] if waiting_end == waiting_list+n-1 else UINT32_MAX

It would make the simple_BFS function more correct, and it also means that you do not need to define this is_connected parameter as ... checking this is totally free !

This method search for -> searches
diameter_lower_bound_4sweep_select_dichotomy: really the dichotomy should be done through bsearch (http://www.cplusplus.com/reference/cstdlib/bsearch/) but I don't exactly know how to provide distances as an argument. Too bad :-/

By the way, do you really need the if i==j-1 block ? Or would it be sufficient to just replace i=k with i=k+1 ?
In lower_bound_4sweep: you do not need a improved variable. Use a break with a while True. And you can even have a while tmp>LB and no break if you compute tmp at the end of the loop :-P
Is it only me or is the 4sweep function doing much more that 4 sweeps ?
diameter_iFUB frees memory that it did not allocate itself. This feels wrong O_o
I do not think that the comments about the data structure in diameter() are necessary. All that this function does is call others.
There are useless variables in this function

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from c1a7c95 to 6eb88ea

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

6eb88ea multiple corrections and simplifications

dcoudert commented 9 years ago

comment:7

Replying to @nathanncohen:

Yo !!

A second pass on this code. With this I guess that the 'technical review' is done, but I haven't checked the correction yet, i.e. looked at the papers and what they prove. I don't really like to make a non-obvious algorithm the default one when we have a much simpler way to compute the same value, but well, if it is faster ...

About seen. "It must be initialized, but there is no need to initialize it".
+    - ``seen`` -- bitset of size ``n`` that must be initialized before calling
+      this method. owever, there is no nead to initialize it.

I have revised the text as follows:

+    - ``seen`` -- bitset of size ``n`` that must be initialized before calling 
       this method (i.e., bitset_init(seen, n)). However, there is no need to 
       clear it.

this method check is all --> checkS iF all

this helps saving --> save

when we already known --> know

About your check for connected graphs: I find your if test_is_connected and len(seen)<n: return big_number a bit weird given that you just ran a BFS which is supposed to return the eccentricity of a vertex (which, in particular, tests if the graph is connected}}}. And actually, while you say that the simple_BFS function returns the eccentricity of source, it only does so when the graph is connected. If it is not, it returns the (finite) eccentricity of source in its connected component. Is there any reason why you should not do that simple_BFS ?
-return distances[waiting_list[waiting_end]]
+return distances[waiting_list[waiting_end]] if waiting_end == waiting_list+n-1 else UINT32_MAX
It would make the simple_BFS function more correct, and it also means that you do not need to define this is_connected parameter as ... checking this is totally free !

You are perfectly right! I have changed the code. I have removed the useless parameter and now return the eccentricity of the source in the graph:

return distances[waiting_list[waiting_end]] if waiting_end==n-1 else UINT32_MAX

This method search for -> searches

ok

diameter_lower_bound_4sweep_select_dichotomy: really the dichotomy should be done through bsearch (http://www.cplusplus.com/reference/cstdlib/bsearch/) but I don't exactly know how to provide distances as an argument. Too bad :-/

By the way, do you really need the if i==j-1 block ? Or would it be sufficient to just replace i=k with i=k+1 ?

In lower_bound_4sweep: you do not need a improved variable. Use a break with a while True. And you can even have a while tmp>LB and no break if you compute tmp at the end of the loop :-P

I agree that calling bsearch would be better, but I don't know how to do it without introducing extra data structure (e.g., array of pairs (vertex, distance)). I have simplified the code and removed the diameter_lower_bound_4sweep_select_dichotomy method.

Is it only me or is the 4sweep function doing much more that 4 sweeps ?

You are right. I don't know why they call it 4sweep instead of multisweep. I can change the name if you think it is more natural.

diameter_iFUB frees memory that it did not allocate itself. This feels wrong O_o

I have removed these instructions.

I do not think that the comments about the data structure in diameter() are necessary. All that this function does is call others.

There are useless variables in this function

I have shortened the comments and remove useless variables. It should be cleaner now.

Nathann

Thanks a lot for all the comments.

New commits:

6eb88ea multiple corrections and simplifications

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:8

You say in the documentation of the 4sweep method that the vertex w should be at the same distance from u and v, but your dichotomy search does not do that. You only check that the vertex you take is at distance LB/2 from u, but you do not know anything about v. For all you know w may be a leaf.

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

7b6bfd2 #17135: use correct method for choosing vertex m in 4sweep

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 6eb88ea to 7b6bfd2

dcoudert commented 9 years ago

comment:10

You are perfectly right.

I found no other way than storing the predecessors and then following the path from a destination to the source to find that vertex. Let me know if you have a smarter approach.

Thanks.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:11

Yo !

I found no other way than storing the predecessors and then following the path from a destination to the source to find that vertex. Let me know if you have a smarter approach.

No I don't see anything better :-/

Too bad for the fast search.

Nathann

dcoudert commented 9 years ago

comment:12

Since most of the computation time is in iFUB, this is not so bad.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 7b6bfd2 to 299c323

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

299c323 rename 4sweep into multi_sweep

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:14

Hello again !

Okay, I finally understood how this thing works and it is quite simple and clear. Cool !

Aaaaaand so I added a paragraph of doc that explains how the algorithm works.

I also replace a couple of 4sweeps->multisweep, and remove the references from GenericGraph.diameter as Sphinx complained of having them twice as they are in distances_all_pairs too.

Now, while I updated the doc of distances_all_pairs.diameter I did not update the doc of GenericGraph.diameter because I have a question: why on earth do we define GenericGraph.diameter instead of just importing into GenericGraph the function from distances_all_pairs ? All the documentation is mostly a copy/paste !?

Also in multi-sweep/diameter_iFUB: the documentation says that the int * store the result of a BFS from source, while actually not much can be claimed about their content when the function returns. Just say that it is memory space used by the function while it runs unless you can say something more ?..

I added a commit with my changes at public/17135, tell me what you think !

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`b83f306`	`trac #17135: Merged with beta6`
`eb0e4c9`	`trac #17135: Doc`
`2e58f6e`	`17135: remove useless parameters of some methods`
`69cb552`	`17135: import directly diameter method into GenericGraph`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 299c323 to 69cb552

dcoudert commented 9 years ago

comment:16

Hello,

thanks for the improvements. It is much better now.

Following your comments, I'm now importing the new method as GenericGraph.diameter. I have also removed some int * parameters from iFUB and multi-sweep. It avoids confusion. Also, since we only need to allocate tables (no need to initialize them), the extra cost of allocating some tables in 2 different methods is negligible.

David.

PS: git is powerful but not intuitive/easy for non experts.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:17

Hello !

Following your comments, I'm now importing the new method as GenericGraph.diameter. I have also removed some int * parameters from iFUB and multi-sweep. It avoids confusion. Also, since we only need to allocate tables (no need to initialize them), the extra cost of allocating some tables in 2 different methods is negligible.

Well.

1) Looks ready to go 2) Good job 3) Thanks for this patch !

PS: git is powerful but not intuitive/easy for non experts.

Yep. Contributing to Sage is really hard at first :-/

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

Reviewer: Nathann Cohen

dcoudert commented 9 years ago

comment:19

Thank you Nathann.

vbraun commented 9 years ago

Changed branch from u/dcoudert/help to 69cb552

sagemath / sage

Compute diameter using 2sweep, 4sweep and iFUB #17135