sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.47k stars 487 forks source link

Refactor Closeness Centrality #19007

Closed a8f3419b-6383-42be-b93c-ecaa87929754 closed 9 years ago

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Now, the closeness centrality algorithm simply calls the corresponding NetworkX routine (only for undirected graphs).

This ticket will move this routine to file generic_graph (closeness centrality is defined also in this case ![1]), and it will modify input/output variables as in routine shortest_path. Then, it will call our shortest path new routines instead of NetworkX routines.

This ticket is a first step towards the inclusion of the algorithm in ![2].

![1] !http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6816651&tag=1

![2] !http://arxiv.org/pdf/1507.01490v1.pdf

Depends on #18931 Depends on #18876 Depends on #18910

CC: @nathanncohen @dcoudert

Component: graph theory

Keywords: Closeness centrality

Author: Michele Borassi

Branch/Commit: 1551c6f

Reviewer: David Coudert

Issue created by migration from https://trac.sagemath.org/ticket/19007

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Author: Michele Borassi

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Changed keywords from none to Closeness centrality

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Description changed:

--- 
+++ 
@@ -1 +1,9 @@
+Now, the closeness centrality algorithm simply calls the corresponding NetworkX routine (only for undirected graphs).

+This ticket will move this routine to file generic_graph (closeness centrality is defined also in this case ![1]), and it will modify input/output variables as in routine shortest_path. Then, it will call our shortest path new routines instead of NetworkX routines.
+
+This ticket is a first step towards the inclusion of the algorithm in ![2].
+
+![1] !http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6816651&tag=1
+
+![2] !http://arxiv.org/pdf/1507.01490v1.pdf
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Branch: u/borassi/refactor_centrality_closeness

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Dependencies: #18931, #18876, #18910

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:3

Hello!

This is the first version of the refactoring of closeness_centrality. Let me know if you like it!

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Commit: 5c85793

dcoudert commented 9 years ago
comment:4

I don't really understand what you are doing in this commit https://github.com/sagemath/sagetrac-mirror/commits/5c85793ccb8234ac668173620523f45dbae4b985

In the ticket description you say that you will move everything in generic_graph, but it might be more readable/clean to create a new file for centrality measures, no? Also, you moved only references to the methods, right? (or the modifications are hidden in the merge with #18931).

+            - :meth:`~sage.graphs.graph.Graph.centrality_degree`
+            - :meth:`~centrality_betweenness`

David.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5853eb7Merged with beta2
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 5c85793 to 5853eb7

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 5853eb7 to a19addb

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a19addbRewrote closeness centrality according to new standards
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:7

Hello!

Thank you for the comment, I will try to address the main issues raised.

I have put everything in a single commit, which simply removes the closeness centrality routine from graph.py and rewrites it in generic_graph.py. Now the ticket should be much easier to review. Since closeness centrality highly depends on shortest paths, the new routine follows all conventions in ticket #18931.

About the creation of a new file: we already have file centrality.pyx, containing fast algorithms for betweenness centrality. However, these routines are called from a centrality_betweenness routine in file generic_graph.py, which should be used by the user. In this ticket, I just copied this policy.

In my opinion, if we want to create a new file, we should at least move also betweenness centrality and degree centrality: in my opinion, this goes beyond the scope of this ticket. Do you agree?

Best,

Michele

Replying to @dcoudert:

I don't really understand what you are doing in this commit https://github.com/sagemath/sagetrac-mirror/commits/5c85793ccb8234ac668173620523f45dbae4b985

In the ticket description you say that you will move everything in generic_graph, but it might be more readable/clean to create a new file for centrality measures, no? Also, you moved only references to the methods, right? (or the modifications are hidden in the merge with #18931).

+            - :meth:`~sage.graphs.graph.Graph.centrality_degree`
+            - :meth:`~centrality_betweenness`

David.

dcoudert commented 9 years ago
comment:8

Something is wrong here

+            try:
+                v_iter = iter(vert)
+            except TypeError:
+                v_iter = [v_iter]
+                v_iter = iter(vert)

I assume you want first to have `v_vert = iter(list(vert))

The default of calling distances = self.shortest_path_all_pairs(by_weight,algorithm, weight_function, check_weight)[0] is that you get a dictionary of dictionary, that is a huge data structure for large graphs (lost of memory and long to create). At least the second part of this method should be cythonized to get the array of distances, as is done for centrality_betweenness.

David.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:9

Something is wrong here

+            try:
+                v_iter = iter(vert)
+            except TypeError:
+                v_iter = [v_iter]
+                v_iter = iter(vert)

+1.

In some cases a vertex can be a pair, and thus is iterable. This would react badly with this code.

sage: g=graphs.KneserGraph(5,2)
sage: v=g.vertices()[0]
sage: g.centrality_closeness(v)
<crash>

Nathann

dcoudert commented 9 years ago
comment:10

Right. My previous proposal is not a good solution. Something like that might also be unsafe if some labels in vert are not in G, but this can be checked when trying to associate computed values and vertices.

v_vert = iter([vert]) if vert in G else iter(vert)
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

7dfa2d2Corrected v_iter, cythonized Johnson
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from a19addb to 7dfa2d2

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:12

Hello!

I hope you spent a great Assumption day!

With this new commit, (hopefully) I solved the problems with v_vert and I cythonized the Johnson algorithm. In my opinion, it is not worth cythonizing Floyd-Warshall, because it is much slower than Johnson, especially on sparse graphs.

Thank you very much!

Michele

dcoudert commented 9 years ago
comment:13

Hello,

I'm trying to understand the behavior of the method on digraphs. You wrote that the centrality of vertices of degree 0 is not reported. Does it means that for digraphs you don't report the centrality of vertices with out-degree 0? Also, what's the expected behavior on strongly connected digraphs?

David.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

7a4059aImproved documentation
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 7dfa2d2 to 7a4059a

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5c67f79Small mistake
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 7a4059a to 5c67f79

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:16

Hello!

I tried to improve the documentation, answering your questions.

Michele

Replying to @dcoudert:

You wrote that the centrality of vertices of degree 0 is not reported. Does it means that for digraphs you don't report the centrality of vertices with out-degree 0?

Yes, I changed in the documentation "degree" with "(out)degree"

Also, what's the expected behavior on strongly connected digraphs?

The closeness centrality of a vertex v is 1/farn(v), where farn(v) is the average distance between v and a random vertex w different from v (as in the case of connected graphs). I added an example about this.

dcoudert commented 9 years ago
comment:17

Hello,

This is much better now. The method works well, passes all tests and the doc build properly.

One possible modification is to move the method to file centrality.pyx which is made for gathering centrality methods. Let me know.

Best, David.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 5c67f79 to 48514ad

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

48514adMoved centrality_closeness to centrality.pyx
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:19

Done!

However, I had to do some "magic" to avoid duplicate references... Hope it is ok!

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 48514ad to cf9cf87

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

cf9cf87Trailing whitespaces
dcoudert commented 9 years ago
comment:21

Yep, I add to make doc-clean && make doc to get it compiled.

I just saw in the doc of centrality_degree in graph.py, in section See also, that the link to centrality_closeness is broken. In fact, the code is :meth:`centrality_closeness`. Could you check that? In fact we should also move this methode to centrality.pyx, but in another ticket.

After that I think it will be ok.

David.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from cf9cf87 to 90638aa

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

90638aaCorrected link
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:23

Done!

dcoudert commented 9 years ago
comment:24

Hello,

do you understand why the patchbot reports some errors? I have no error when testing this patch on my computer (install, long tests, trials, docbuild html et pdf,...).

Best, David.

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:25

What is the patchbot? How can I find these errors?

dcoudert commented 9 years ago
comment:26

on the top of this page, you can see a blue disk (or red or green, etc.) with a link to http://patchbot.sagemath.org/ticket/19007/ It reports on automatic tests of your patch. Tests are performed on all modules.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

73dd6d5Solve patchbot - first attempt
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 90638aa to 73dd6d5

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

f1c7fd5Solve patchbot - second attempt
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 73dd6d5 to f1c7fd5

dcoudert commented 9 years ago
comment:29

Note that the patchbot needs a lot of time: it is not tested instantaneously and it takes some hours of computations...

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:30

Ok, I will wait until it answers... In my opinion, the problem is that we need to import centrality.pyx in generic_graph at startup, and this slows down the startup time. I tried to change the imports, hoping the total time will decrease. If any of the two attempts work, we are fine, otherwise I probably have to mail sage-devel...

dcoudert commented 9 years ago
comment:31

Well, generic graph is becoming really really big with lots of tests. For instance, do we really need the tests with for i in range(100):. Can't we have only one test we randomly chosen number of edges? That should be enough.

David.

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:32

I'm afraid that all the attempts I have made were unsuccessful. The point is that importing module centrality.pyx at startup is expensive, and I need it to import it to add method centrality_closeness to GenericGraph. Hence, the startup time becomes 2 seconds (too much).

In my opinion, this might be a reason why it is better to leave method centrality_closeness in generic_graph, and put only Cython-related methods inside centrality.pyx, in order to use lazy imports. May I move method centrality_closeness back?

If we still want to move centrality_closeness and related methods to centrality.pyx, I think we should open another ticket, in order to find a way to use lazy imports (I have no idea how, though).

dcoudert commented 9 years ago
comment:33

ok, move it back to generic graph. David.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from f1c7fd5 to 1551c6f

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

1551c6fUndo commit moving closeness_centrality to centrality.pyx. Corrected links.
dcoudert commented 9 years ago

Reviewer: David Coudert