sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.38k stars 471 forks source link

Boost Dominator Tree #18839

Closed a8f3419b-6383-42be-b93c-ecaa87929754 closed 9 years ago

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Use Boost to compute the dominator tree of a digraph.

This function is not available in Sagemath.

Depends on #18811 Depends on #18564

CC: @nathanncohen @dcoudert

Component: graph theory

Keywords: Dominator tree, Boost

Author: Michele Borassi

Branch/Commit: b42de69

Reviewer: Nathann Cohen, David Coudert

Issue created by migration from https://trac.sagemath.org/ticket/18839

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Dependencies: 18811, 18564

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Author: Michele Borassi

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Changed keywords from none to Dominator tree, Boost

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Description changed:

--- 
+++ 
@@ -1 +1,3 @@
+Use Boost to compute the dominator tree of a digraph.

+This function is not available in Sagemath.
6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

Changed dependencies from 18811, 18564 to #18811, #18564

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Branch: u/borassi/boost_dominator_tree

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:5

This is the first draft of the Boost algorithm for computing dominator trees (see https://en.wikipedia.org/wiki/Dominator_%28graph_theory%29 for the definition of dominator tree).


New commits:

283c61b First draft
18bd01a Applied Nathann's suggestions
99bd994 Few corrections in the reference manual
40df844 trac #18811: Review
6c8d49b More reviewer comments
c365d02 Removed "vertices is None" from boost_clustering_coeff
582ce34 Changed for v in G.vertices() => for v in G
6eb8a83 Added a typedef for vertex index (before, it was int).
337726e First draft
4d1b5ac Removed trailing whitespaces
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago

Commit: 4d1b5ac

dcoudert commented 9 years ago
comment:6

Nicely done.

May be you could combine methods dominator_tree_dictionary and dominator_tree using an optional parameter (e.g., return_dict=False) ?

Actually I don't know which is the most useful: the tree or the dictionary.

Also, I suggest

- edges = [[v,dom_tree_dict[v]] for v in dom_tree_dict.keys() if dom_tree_dict[v] is not None]
+ edges = [[v,dom_v] for v,dom_v in dom_tree_dict.iteritems() if not dom_v is None]
6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:7

May be you could combine methods dominator_tree_dictionary and dominator_tree using an optional parameter (e.g., return_dict=False) ?

+1

Also, you do not need to write two functions (with doc+test) when everything it does is call another function defined outside, in a module. You will find many such examples at the bottom of generic_graph.py.

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 4d1b5ac to a6f3f9a

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

a6f3f9aReviewer's suggestions
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:9

Done everything!

dcoudert commented 9 years ago
comment:10

Install OK, docbuild OK, doc looks good.

However, I have questions on the expected behavior:

sage: G = graphs.PathGraph(1)
sage: G.dominator_tree(0)
{0: None}
sage: G.dominator_tree(0, return_dict=False)
Graph on 0 vertices
sage: G = 2 * graphs.PathGraph(1)
sage: G.dominator_tree(0)
{0: None, 1: None}
sage: G = 2 * graphs.PathGraph(2)
sage: G.dominator_tree(0)
{0: None, 1: 0, 2: None, 3: None}

If this is effectively what we expect, then I will finalize the review.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

81e4ee1Added support if dominator tree is empty
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from a6f3f9a to 81e4ee1

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:12

Helloooooo again,

Is there any reason why you kept two functions with the same name (dominator_tree)? If you moved the 'graph output' to the one defined in boost_graph.pyx, you would only have to import it in the GenericGraph object. No duplication of doc/doctest.

Nathann

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:13

However, I have questions on the expected behavior:

  • with 1-vertex graph:
sage: G = graphs.[wiki:PathGraph](1)
sage: G.dominator_tree(0)
{0: None}
sage: G.dominator_tree(0, return_dict=False)
Graph on 0 vertices

You are right, if the dominator tree is empty (that is, if from the root we cannot reach any other vertex) it is better to output a 1-vertex graph containing the root, not an empty graph. For the dictionary, I think the behavior is correct.

  • With disconnected graphs
sage: G = 2 * graphs.[wiki:PathGraph](1)
sage: G.dominator_tree(0)
{0: None, 1: None}
sage: G = 2 * graphs.[wiki:PathGraph](2)
sage: G.dominator_tree(0)
{0: None, 1: 0, 2: None, 3: None}

If this is effectively what we expect, then I will finalize the review.

Here, the behavior is correct, in my opinion. I have changed the doc to better explain this case, by adding an "OUTPUT" block.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 81e4ee1 to 7b72043

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

7b72043Improved doc (unreachable vertices)
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:15

You mean that I should move all the code in boost_graph.pyx, and then import this function through something like

GenericGraph.dominator_tree = types.MethodType(sage.graphs.base.boost_graph.dominator_tree(GenericGraph, ???))

Could you tell me the exact command? The three examples I have found in generic_graph.py do not have arguments...

Replying to @nathanncohen:

Helloooooo again,

Is there any reason why you kept two functions with the same name (dominator_tree)? If you moved the 'graph output' to the one defined in boost_graph.pyx, you would only have to import it in the GenericGraph object. No duplication of doc/doctest.

Nathann

dcoudert commented 9 years ago
comment:16

A good example is GenericGraph.diameter = types.MethodType(sage.graphs.distances_all_pairs.diameter, None, GenericGraph). It has several arguments (e.g. method='iFUB').

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:17

Hellooooooo!

I have done it, but there is a problem: in order to obtain this result, generic_graph should import boost_graph, which imports directed_graph, which imports generic_graph, and I have cyclic dependencies. I tried to put

import types
from sage.graphs.generic_graph import GenericGraph
GenericGraph.dominator_tree = types.MethodType(dominator_tree, None, GenericGraph)

inside boost_graph, but it is not imported when Sage starts, so I have to import boost_graph before running dominator_tree (and I don't like it). Any suggestion?

Replying to @dcoudert:

A good example is GenericGraph.diameter = types.MethodType(sage.graphs.distances_all_pairs.diameter, None, GenericGraph). It has several arguments (e.g. method='iFUB').

dcoudert commented 9 years ago

Reviewer: Nathann Cohen, David Coudert

dcoudert commented 9 years ago
comment:18

This is weird since for instance the distances_all_pairs module imports Graph many times.

Could you try the following: in the boost_graph.pyx file, move the from sage.graphs.graph import Graph statement inside all the methods using it. Same for DiGraph etc.

David.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 7b72043 to ec55207

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

344e41bBoost dominator tree
ec55207Removed trailing whitespaces
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:20

Helloooooo!

Thank you very much: it worked! Now, I have rebased all my commits, so that the history of generic_graph is correct, and I have moved everything to boost_graph.

Replying to @dcoudert:

This is weird since for instance the distances_all_pairs module imports Graph many times.

Could you try the following: in the boost_graph.pyx file, move the from sage.graphs.graph import Graph statement inside all the methods using it. Same for DiGraph etc.

David.

dcoudert commented 9 years ago
comment:21

Something goes wrong with your last commits, some conflicts. I don't know how to solve that (not git expert).

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:22

Strange, in my computer it works correctly... Maybe you have to perform a forced pull, since I had to perform a forced push, or try to remove your local branch and re-download it... Nathann?

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:23

I also see conflicts. And the branch's name of thi ticket appears in red, which is a sign too.

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:24

Michele: are you running beta7? If not, that's the reason.

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

20a8625Dominator tree through Boost
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from ec55207 to 20a8625

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:26

Now it should work: I have downloaded beta7, and I have added the missing parts.

dcoudert commented 9 years ago
comment:27

Now it's working and I'm trying to understand the expected behavior of the method.

sage: G = graphs.RandomBarabasiAlbert(1000,2)
sage: H = G.dominator_tree(0, return_dict=False)
sage: H.is_tree() and H.diameter()<=2
True
sage: H = G.dominator_tree(193, return_dict=False)
sage: H.is_tree() and H.diameter()<=2
True
sage: G = graphs.Grid2dGraph(4,4)
sage: G.dominator_tree((0,0), return_dict=False).vertices()
[0,
 (0, 1),
 (0, 2),
 (0, 3),
 (1, 0),
 (1, 1),
 (1, 2),
 (1, 3),
 (2, 0),
 (2, 1),
 (2, 2),
 (2, 3),
 (3, 0),
 (3, 1),
 (3, 2),
 (3, 3)]
sage: G.dominator_tree((1,1), return_dict=False).vertices()
[5,
 (0, 0),
 (0, 1),
 (0, 2),
 (0, 3),
 (1, 0),
 (1, 2),
 (1, 3),
 (2, 0),
 (2, 1),
 (2, 2),
 (2, 3),
 (3, 0),
 (3, 1),
 (3, 2),
 (3, 3)]
sage: D = digraphs.DeBruijn(2,4)
sage: D.dominator_tree('0000', return_dict=False).show()

David.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 20a8625 to 512cfdf

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

512cfdfReviewer's comments
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:29

Replying to @dcoudert:

Now it's working and I'm trying to understand the expected behavior of the method.

  • is it normal that the resulting tree is always a star ?
sage: G = graphs.[wiki:RandomBarabasiAlbert](1000,2)
sage: H = G.dominator_tree(0, return_dict=False)
sage: H.is_tree() and H.diameter()<=2
True
sage: H = G.dominator_tree(193, return_dict=False)
sage: H.is_tree() and H.diameter()<=2
True

Yes, it is: if the input is undirected, the dominator tree is a star if and only the if the graph is biconnected. In a Barabasi-Albert graph, the minimum degree is 2, so I believe it is.

  • I suggest to set parameter return_dict=False by default. Well I don't know which is the most useful

Done!

  • There is a labeling problem with the root.
sage: G = graphs.Grid2dGraph(4,4)
sage: G.dominator_tree((0,0), return_dict=False).vertices()
[0,
(0, 1),
(0, 2),
(0, 3),
(1, 0),
(1, 1),
(1, 2),
(1, 3),
(2, 0),
(2, 1),
(2, 2),
(2, 3),
(3, 0),
(3, 1),
(3, 2),
(3, 3)]
sage: G.dominator_tree((1,1), return_dict=False).vertices()
[5,
(0, 0),
(0, 1),
(0, 2),
(0, 3),
(1, 0),
(1, 2),
(1, 3),
(2, 0),
(2, 1),
(2, 2),
(2, 3),
(3, 0),
(3, 1),
(3, 2),
(3, 3)]
  • Try also this. The output digraph is not connected and you have some labeling problems.
sage: D = digraphs.[wiki:DeBruijn](2,4)
sage: D.dominator_tree('0000', return_dict=False).show()

Yes, you are right, I did not "translate" properly the Boost vertices into Sage labels. I have corrected that line and I have added a test to check that the translation is correct. The new line is:

edges = [[int_to_vertex[result[vertex_to_int[v]]], v] for v in g.vertices() if result[vertex_to_int[v]] != no_parent]

Thank you very much!

dcoudert commented 9 years ago
comment:30

Yes, it is: if the input is undirected, the dominator tree is a star if and only the if the graph is biconnected. In a Barabasi-Albert graph, the minimum degree is 2, so I believe it is.

So for undirected graphs, we can first test if is it connected. If true, then we can directly construct and return the star (or the dict) without using boost, right? If False, then what's the output ?

What for directed graphs?

David.

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:31

Hellooooo!

Replying to @dcoudert:

So for undirected graphs, we can first test if is it connected. If true, then we can directly construct and return the star (or the dict) without using boost, right? If False, then what's the output ?

Well, not connected but biconnected. In any case, probably the dominator tree in the undirected case can be found by computing biconnected components and cut vertices. There is a linear algorithm for biconnected components, but it is not very easy to implement, and the Boost algorithm is O(m log m), which is not much worse than O(m). Is the linear algorithm for BCC implemented in the hyperbolicity algorithm?

What for directed graphs?

Well, it is more complicated... I heard that finding linear algorithms for this problem is an open issue, and the Boost algorithm should be optimal. For more information, see the Boost help page http://www.boost.org/doc/libs/1_58_0/libs/graph/doc/lengauer_tarjan_dominator.htm.

Michele

dcoudert commented 9 years ago
comment:32

Replying to @sagetrac-borassi:

Hellooooo!

Replying to @dcoudert:

So for undirected graphs, we can first test if is it connected. If true, then we can directly construct and return the star (or the dict) without using boost, right? If False, then what's the output ?

Well, not connected but biconnected.

OK, I see. Could you mention this in the description of the method? and perhaps add a simple example of a graph with 2 biconnected components. For Instance

sage: G = 2*graphs.CycleGraph(3)
sage: G.add_edge(0,3)
sage: G.dominator_tree(0, return_dict=True)
{0: None, 1: 0, 2: 0, 3: 0, 4: 3, 5: 3}

In any case, probably the dominator tree in the undirected case can be found by computing biconnected components and cut vertices. There is a linear algorithm for biconnected components, but it is not very easy to implement, and the Boost algorithm is O(m log m), which is not much worse than O(m). Is the linear algorithm for BCC implemented in the hyperbolicity algorithm?

For undirected graphs, we have B,C = G.blocks_and_cut_vertices(), where B is the list of blocks (list of vertices of each biconnected component) and C is the list of cut vertices. It's a Python implementation, but it's efficient. We use it for the hyperbolicity and many other algorithms.

However, now that I have a better understanding of the method, I think it is better to let it as it is. So improve the doc and it should be ok.

David.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

23b1952Improved documentation
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 512cfdf to 23b1952

a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:34

However, now that I have a better understanding of the method, I think it is better to let it as it is. So improve the doc and it should be ok.

Done!

dcoudert commented 9 years ago
comment:35

For me the patch is now good to go!

vbraun commented 9 years ago
comment:36

Fails on 32-bit

sage -t --long src/sage/graphs/base/boost_graph.pyx
**********************************************************************
File "src/sage/graphs/base/boost_graph.pyx", line 101, in sage.graphs.base.boost_graph.edge_connectivity
Failed example:
    edge_connectivity(g)
Expected:
    [4, [(0, 1), (0, 2), (0, 3), (0, 4)]]
Got:
    [4L, [(0L, 1L), (0L, 2L), (0L, 3L), (0L, 4L)]]
**********************************************************************
1 item had failures:
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 23b1952 to 010f213

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

010f213Changed default vertex type to int
a8f3419b-6383-42be-b93c-ecaa87929754 commented 9 years ago
comment:38

I think the problem is that on 32bit the C++ unsigned int is converted to a long. I replaced unsigned int with int for vertices, and now it should work. How can I test it on 32bit?