Add is_partial_cube - Githubissues

jaanos commented 8 years ago

This ticket adds a method is_partial_cube to the Graph class, which checks whether a graph is a partial cube. A partial cube is a graph that can be isometrically embedded into a hypercube, i.e., its vertices can be labelled with (0,1)-vectors of some fixed length such that the distance between any two vertices in the graph equals the Hamming distance of their labels. If requested, this labelling can also be returned by the method.

The code for this method comes from the PADS library by David Eppstein, which is available at http://www.ics.uci.edu/~eppstein/PADS/. The library is available under the MIT license, and I have also obtained permission from the author via email. The algorithm in question has been described in a paper available at http://arxiv.org/pdf/0705.1025.pdf.

The code has been changed to utilize Sage's structures (namely graphs, digraphs and disjoint sets). However, specialized breadth-first and depth-first search methods for graphs from the PADS library have also been added.

CC: @nathanncohen

Component: graph theory

Keywords: graphs partial cubes

Author: Janoš Vidali

Branch: 5a054a6

Reviewer: Nathann Cohen

Issue created by migration from https://trac.sagemath.org/ticket/19985

jaanos commented 8 years ago

Branch: u/jaanos/add_is_partial_cube

jaanos commented 8 years ago

Commit: 35072c3

jaanos commented 8 years ago

comment:2

Before this goes into review, here are some notes and questions:

Currently, there are no docstrings for the new methods - of course I will add them, but the current commit was intended just for testing whether the method works as expected.
Should I mention in the doc that the code has originally been made available under the MIT license?
The new BFS method is named breadth_first_level_search. It generates dictionaries whose keys are vertices at each level of the search, and values are sets of neighbours on the next level for each key vertex. This way, a vertex may appear in multiple sets on the same level - this is needed to correctly produce the labellings.
The new DFS method is named depth_first_traversal. It generates the edges followed on both ways of the depth-first search, together with the information on the direction taken (True for forward, False for backward). The method is used in the final verification step of the algorithm. Although the existing DFS could be adapted to work this way if requested, I chose to keep this a separate method to avoid needlessly degrading the performance of algorithms utilizing the generic DFS.
If a certificate is requested, the method raises an EmptySetError describing what went wrong in case the graph is not a partial cube, and returns a dictionary mapping vertices to (0,1)-strings otherwise. Is this OK, or should I change to returning pairs, as with other methods returning a certificate? Should I use some exception other than EmptySetError? I chose this as it is also used in traveling_salesman_problem.

Janoš

New commits:

`9b38105`	`Add D. Eppstein's algorithm for recognizing partial cubes`
`35072c3`	`Merge branch 'develop' into is_partial_cube`

jaanos commented 8 years ago

Author: Janoš Vidali

jaanos commented 8 years ago

Changed keywords from none to graphs partial cubes

jaanos commented 8 years ago

Description changed:

--- 
+++ 
@@ -1 +1,5 @@
+This ticket adds a method `is_partial_cube` to the `Graph` class, which checks whether a graph is a partial cube. A partial cube is a graph that can be isometrically embedded into a hypercube, i.e., its vertices can be labelled with (0,1)-vectors of some fixed length such that the distance between any two vertices in the graph equals the Hamming distance of their labels. If requested, this labelling can also be returned by the method.

+The code for this method comes from the PADS library by David Eppstein, which is available at http://www.ics.uci.edu/~eppstein/PADS/. The library is available under the MIT license, and I have also obtained permission from the author via email. The algorithm in question has been described in a paper available at http://arxiv.org/pdf/0705.1025.pdf.
+
+The code has been changed to utilize Sage's structures (namely graphs, digraphs and disjoint sets). However, specialized breadth-first and depth-first search methods for graphs from the PADS library have also been added.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:3

Hello,

Should I mention in the doc that the code has originally been made available under the MIT license?

I do not think that you must mention the original license. You can say somewhere that this implementation is an adaptation of his code, however. Though you can ask on sage-devel if you prefer.

The new BFS method is named breadth_first_level_search. It generates dictionaries whose keys are vertices at each level of the search, and values are sets of neighbours on the next level for each key vertex. This way, a vertex may appear in multiple sets on the same level - this is needed to correctly produce the labellings.

This method is very specific to you use, and cannot be a public function of GenericGraph with a name like that. You can turn it into a private function if you prefer.

I have not looked at the code inside, but depending on your performance needs, you could prefer an implementation that does not require you to rewrite the BFS. Something like that:

sage: bfs = list(g.breadth_first_search(0,report_distance=True))
sage: layers = [set() for _ in range(bfs[-1][1]+2)]
sage: for x,d in bfs:
....:     layers[d].add(x)
sage: next_level_neighbors = {v:layers[d+1].intersection(g.neighbors(v)) for d,L in enumerate(layers) for v in L}

The new DFS method is named depth_first_traversal. It generates the edges followed on both ways of the depth-first search, together with the information on the direction taken (True for forward, False for backward). The method is used in the final verification step of the algorithm. Although the existing DFS could be adapted to work this way if requested, I chose to keep this a separate method to avoid needlessly degrading the performance of algorithms utilizing the generic DFS.

Same remark here for the name. We can't have .depth_first_search and .depth_first_traversal simultaneously, and doing different things.

If a certificate is requested, the method raises an EmptySetError describing what went wrong in case the graph is not a partial cube, and returns a dictionary mapping vertices to (0,1)-strings otherwise.

EmptySetError? Isn't the certificate usually set to None or to the empty dictionary when it goes wrong? Your code at #19586 raises no exception.

Is this OK, or should I change to returning pairs, as with other methods returning a certificate?

You don't have to return a list of pairs, a certificate can be whatever you like.

Should I use some exception other than EmptySetError? I chose this as it is also used in traveling_salesman_problem.

Oh. Well, this EmptySetError is a recent trend to mean that "a search occured, and found nothing". Which 'can' make sense for a hamiltonian cycle. Here, an EmptySetError would make me thing that the algorithm is doing something enumerative like trying all labellings.

About the implementation: this code looks long, and perhaps tricky: if you prefer you can also move it to an independent file (like it's done for 'graphs/graph_decompositions/graph_products.pyx'). The other advantage of that is that it give you more room to write doc, if you need it.

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:4

Also: please only put the bare minimum in a try/except.

Though I know I can't convince you of the wisdom of that until you fought for hours with an exception that was raised where you did not expect it.

jaanos commented 8 years ago

comment:5

Hi!

Thanks for the suggestions and comments, I will address them when I have time (not today). As I said, the current code closely follows the original (including exceptions) and was mainly meant for testing, so I'm willing to change it to a more acceptable form.

About the implementation: this code looks long, and perhaps tricky: if you prefer you can also move it to an independent file (like it's done for 'graphs/graph_decompositions/graph_products.pyx'). The other advantage of that is that it give you more room to write doc, if you need it.

I agree that this will probably be the best solution - then the file can also contain the search functions. Are there any additional steps needed when adding a new file?

Janoš

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:6

Hello,

I agree that this will probably be the best solution - then the file can also contain the search functions. Are there any additional steps needed when adding a new file?

Not really for the code itself. For the doc, however, there is that:

http://doc.sagemath.org/html/en/developer/sage_manuals.html#adding-a-new-file

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 35072c3 to 28cbf3a

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`d85cdc6`	`Add D. Eppstein's algorithm for recognizing partial cubes in a new file`
`28cbf3a`	`Add documentation for partial cubes`

jaanos commented 8 years ago

comment:8

Hi!

I've moved the functions to a new file and added the documentation. However, the last test in the docstring of is_partial_cube is not formatted properly in graph.html when I compile the documentation (but it looks fine in partial_cube.html). Any idea why?

Janoš

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:9

I cannot tell: could you have run the compilation but missed an error? When I compiled your doc, I got something like that:

OSError: [graphs   ] /home/ncohen/.Sage/local/lib/python2.7/site-packages/sage/graphs/partial_cube.py:docstring of sage.graphs.partial_cube.is_partial_cube:19: WARNING: duplicate citation Eppstein2008, other instance in /home/ncohen/.Sage/src/doc/en/reference/graphs/sage/graphs/graph.rst

Perhaps the doc does not compile, and so the page you are looking at is never refreshed? I don't see anything wrong in its code.

By the way: the 'trick' here is to move the 'reference' section into the module's doc. This way, sphinx will not see it defined in two functions: the one in partial_cube, and the one (equal to it) in graph.py.

"Welcome to Sage development. If it's your first night here: you can't trust sphinx"

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5c2284f Move the reference section into the module documentation

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 28cbf3a to 5c2284f

jaanos commented 8 years ago

comment:11

OK, I fixed that, and now it looks fine.

f29946bc-ee7b-48cd-9abc-3445948c551d commented 8 years ago

comment:12

Propably a stupid question, as I know very little graph theory, but: Isn't the definition "isometrically embedded into a hypercube" defined also for non-connected graphs? I.e. it equals to definition by Hamming distance only when the graph is connected.

And a kind of bikeshedding: A user writing G.is_partial_cube? will get help that does not define partial cube but says what is the complexity and who invented algorithm.

jaanos commented 8 years ago

comment:13

Hi!

Replying to @jm58660:

Propably a stupid question, as I know very little graph theory, but: Isn't the definition "isometrically embedded into a hypercube" defined also for non-connected graphs? I.e. it equals to definition by Hamming distance only when the graph is connected.

It is, but clearly no disconnected graph G can be isometrically embeded into a connected graph H, as two vertices in distinct connected components of G will not be connected by any shortest path in H.

And a kind of bikeshedding: A user writing G.is_partial_cube? will get help that does not define partial cube but says what is the complexity and who invented algorithm.

Yes, I should change that.

By the way, should I be worried about the failing plugin on the buildbot? The log is not very helpful in telling me what exactly went wrong.

Janoš

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 5c2284f to d6fc5a3

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

d6fc5a3 Move description of partial cubes to the docstring of the main function

f29946bc-ee7b-48cd-9abc-3445948c551d commented 8 years ago

comment:15

Fails on empty graph:

sage: G = Graph()
sage: G.is_partial_cube()
ZeroDivisionError: integer division or modulo by zero

(Empty set, zero, graph without any edge etc. are corner cases that are almost always good to check.)

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

6b81097 A graph without vertices is a partial cube

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from d6fc5a3 to 6b81097

jaanos commented 8 years ago

comment:17

Hi!

Fails on empty graph:

Whoops, fixed. Thanks for pointing it out!

(Empty set, zero, graph without any edge etc. are corner cases that are almost always good to check.)

Edgeless graphs work - a graph with a single vertex is already a hypercube, while larger empty graphs are disconnected and therefore are not partial cubes.

Janoš

f29946bc-ee7b-48cd-9abc-3445948c551d commented 8 years ago

comment:18

I have no more comments.

I did some tests, like testing that cartesian products of partial cubes are partial cubes, and found no errors.

I read the code. It seems complicated due to nature of algorithm using bitwise operators. However, it has good comments inside.

I am unsure if I can not put this to positive review. Nathann, how rigorously should a reviewer read the code?

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:19

Yo,

I am unsure if I can not put this to positive review. Nathann, how rigorously should a reviewer read the code?

There is no rule about that, as you can expect. The same way that paper reviewers sometimes do a serious job and sometimes only browse through the text. Both are called 'reviews'. Some tickets are also positively reviewed mostly because the reviewer and authors are good friends, and don't want to bother with details.

Here the 'form' of the code seems good (doc formatting, new module, imports, index of functions), and from what I read of Janoš' code I'm the one who needs to double-check when I am about to make a comment.

I would not give this ticket a positive review before I understand the algorithm and its implementation, however. And perhaps outline what it does in the module's doc while I am at it. If you give this patch a positive review, however, that's fine by me and there is nothing I can/will do against it. Everybody has his own way of reviewing things.

We have very strict rules about where double columns should be put, and a ticket is easily set to needs_work because of that. But there is no way to check how honestly a reviewer inspects the mathematical code, so there we have to trust each other.

Life.

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:20

Hello,

Here is a first-pass review.

Has a simpler expression:

-    CG = DiGraph({v: {w: (v, w) for w in G[v]} for v in G})
+    CG = DiGraph(G)

That's a full graph copy, right? O_o
```
+        if not Graph(CG).is_bipartite():
```
Wouldn't it be better to maintain the undirected graph too? At this level, I would almost be tempted by GenericGraph.is_bipartite(CG) O_o
CG/UF/NL I saw that they were the names from the original source code, but would you mind 'expanding' them?
Don't know if that interests you, but by relabelling the graph on 0,...,n you could use lists instead of dictionaries, e.g. for bitvec.
Computing the number of connected components is not necessary to initialize the graphs, and has a linear-time cost.
```
-        NG = DiGraph(labeled.connected_components_number())
+        NG = DiGraph()
```

What about NG.has_edge(vi,wi)?

+                if wi in NG.neighbors_out(vi):

As you do not seem to do anything with the dictionaries stored in action, could you turn them into sets instead?
```
-    action = {v: {} for v in g}
+    action = {v: set() for v in g}
```
What is the purpose of all the code that appears after this line?
```
+    g = DiGraph({v: {w: UF.find((v, w)) for w in G[v]} for v in G})
```
It does not appear in partial_cube.py but in medium.py, and I do not see what you need it for.

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 6b81097 to 618ab73

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

618ab73 Rename variables and use sets for actions

jaanos commented 8 years ago

comment:22

Hi!

Has a simpler expression:

-    CG = DiGraph({v: {w: (v, w) for w in G[v]} for v in G})
+    CG = DiGraph(G)

The point here is to set edge labels. Simply constructing a DiGraph from G will keep the old edge labels (if any).

That's a full graph copy, right? O_o
+        if not Graph(CG).is_bipartite():
Wouldn't it be better to maintain the undirected graph too? At this level, I would almost be tempted by GenericGraph.is_bipartite(CG) O_o

I think it may be actually enough to check that the original graph is bipartite, and then also all contractions must be bipartite. I will ask the author, just to be sure.

CG/UF/NL I saw that they were the names from the original source code, but would you mind 'expanding' them?

OK, they are now contracted, unionfind and available now, and LG and NG have been changed to level and newgraph, respectively.

Don't know if that interests you, but by relabelling the graph on 0,...,n you could use lists instead of dictionaries, e.g. for bitvec.

I could, but since we might be building a dictionary as a certificate anyway, I don't think there's much gain here.

Computing the number of connected components is not necessary to initialize the graphs, and has a linear-time cost.
-        NG = DiGraph(labeled.connected_components_number())
+        NG = DiGraph()
What about NG.has_edge(vi,wi)?
+                if wi in NG.neighbors_out(vi):

OK, both changed.

As you do not seem to do anything with the dictionaries stored in action, could you turn them into sets instead?
-    action = {v: {} for v in g}
+    action = {v: set() for v in g}

The original algorithm used the values in the dictionaries to reconstruct the graph. Indeed, here we don't use them, so I changed the dictionaries to sets.

What is the purpose of all the code that appears after this line?
+    g = DiGraph({v: {w: UF.find((v, w)) for w in G[v]} for v in G})
It does not appear in partial_cube.py but in medium.py, and I do not see what you need it for.

In the case the graph is not a partial cube, the first part of the algorithm may still produce a labelling, which then needs to be checked (see for example Lemma 11 in the paper). An example would be a graph with graph6 string Fs_Wo - this fails the check that no two edges on the same vertex should have the same label.

Janoš

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:23

Hellooooo,

The point here is to set edge labels. Simply constructing a DiGraph from G will keep the old edge labels (if any).

Dead right, my mistake.

I think it may be actually enough to check that the original graph is bipartite, and then also all contractions must be bipartite. I will ask the author, just to be sure.

Okay !

OK, they are now contracted, unionfind and available now, and LG and NG have been changed to level and newgraph, respectively.

Thanks,

I could, but since we might be building a dictionary as a certificate anyway, I don't think there's much gain here.

Okayokay. I thought a bit about it, and it would be much more troublesome than I first expected indeed.

In the case the graph is not a partial cube, the first part of the algorithm may still produce a labelling, which then needs to be checked (see for example Lemma 11 in the paper). An example would be a graph with graph6 string Fs_Wo - this fails the check that no two edges on the same vertex should have the same label.

I thought a bit about this, and it does not seem very hard to check that a vertex labelling is correct (a sample of code that does it follows) but perhaps this code is 'theoretically' n^2log(n) instead of n^2. Considering the efficiency of Python (the n^2 part) and the efficiency of the log(n) part (bitwise operations), however, I wonder if it could not be faster than another Python n^2 implementation. What do you think?

This code assumes what I believe we can assume: that after the labelling is picked, two adjacent vertices differ by one 'bit' and that this bit is the edge's label. This amounts to being sure that the graph is indeed a subgraph of a high-dimension cube, but not that it is isometric. In order to check that it is isometric, I believe that it is sufficient to run the following code.

def check_labelling(G):
    # Assumptions:
    #
    # - We assume that the vertices are integers, whose base-2 representation
    #   represents the coordinates
    #
    # - We also assume that every edge corresponds to the switching of one bit
    #   in that base-2 representation or a vertex' name

    # This dictionary associates to each vertex the set of bits whose switching
    # yields one of its neighbors:

    action = {v: set() for v in G}
    for u,v in G.edge_iterator(labels=False):
        diff = u^v
        action[u].add(diff)
        action[v].add(diff)

    action_bitset = {v: sum(action[v]) for v in G}

    # We now check that, for every pair u,v of vertices, there is a neighbor u'
    # of u such that:
    #     hamming_distance(u,v) = hammin_distance(u',v) + 1

    for u in G:
        for v in G:
            diff = u^v
            if (action_bitset[u] & diff == 0 and
                u != v):
                return False

    return True

Example in Sage:

sage: g = graphs.CubeGraph(5).relabel(lambda x:Integer(x,base=2),inplace=False)
sage: check_labelling(g)
True
sage: g.delete_edge(8,24)
sage: check_labelling(g)
False

Please tell me what you think,

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:24

Err.. My mistake, when the graph is a tree that's a n^3 algorithm :-/

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:25

That code was apparently a naive implementation of Lemma 13. Only it isn't easy to read through his fancy definitions. Re-defining an 'oriented rooted tree' is particularly bad move. Anyway. I have to read more.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:26

This being said, isn't this of complexity nm already?

+        for v, w in contracted.edge_iterator(labels = False):
+            diff = bitvec[v]^bitvec[w]

Another small thing: len(contracted[v]) should be contracted.out_degree(v) (does not build the list of neighbors).

jaanos commented 8 years ago

comment:27

Hi!

This being said, isn't this of complexity nm already?
+        for v, w in contracted.edge_iterator(labels = False):
+            diff = bitvec[v]^bitvec[w]

The paper assumes that a machine word can store log(n) bits, so bitwise operations can be done in O(d/log(n)) time, where d is the degree of the root (these sum up to at most n-1 over all iterations of the contraction loop). Since the number of edges is checked to be O(n log(n)), the total time needed for bitwise operations is O(n^2). Of course, failing this assumption, the time complexity here is indeed O(nm) = O(n^2 log(n)).

Another small thing: len(contracted[v]) should be contracted.out_degree(v) (does not build the list of neighbors).

Will take care of that.

Janoš

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 618ab73 to d0eaded

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

d0eaded Only check for bipartiteness once

jaanos commented 8 years ago

comment:29

OK, bipartiteness is now only checked once, without needing to make any copies.

Should we perhaps mention that the algorithm is really O(n^2 log(n)), only for all practical purposes it can be considered a quadratic algorithm?

Janoš

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:30

Hello,

OK, bipartiteness is now only checked once, without needing to make any copies.

Okay, cool !

Should we perhaps mention that the algorithm is really O(n^2 log(n)), only for all practical purposes it can be considered a quadratic algorithm?

Yeah probably, though that assumption is rather 'fair'. Don't bother for the moment. I am not done understanding the algorithm, and I will probably add some doc eventually.

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:31

Sorry but what do you mean by Pos in StatesForPos? All I can think of is position and I have no idea of what it means in this context O_o

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:32

Is this right?

activeForState -> state_to_smallest_active_token; and
statesForPos -> active_token_to_states

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 8 years ago

comment:33

Hello again !

I added a commit at public/19985 that contains some doc. I am not done understanding the 'checking' part of the algorithm. You are of course welcome to discuss any of the changes.

Nathann

jaanos commented 8 years ago

Changed commit from d0eaded to cb7b99a

jaanos commented 8 years ago

Changed branch from u/jaanos/add_is_partial_cube to public/19985

jaanos commented 8 years ago

comment:34

Hi!

Is this right?

activeForState -> state_to_smallest_active_token; and

statesForPos -> active_token_to_states

Yes, that seems right. Pos apparently refers to the position in the activeTokens list.

Your changes look fine, so I'm changing to your branch (Sage is still compiling for me though, so I haven't tested yet). Thanks for extending the documentation!

Janoš

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from cb7b99a to d1624e4

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

d1624e4 Rename two variables, small fixes in the documentation

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from d1624e4 to 372d658

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

372d658 trac #19985: Small changes

sagemath / sage

Add is_partial_cube #19985