Closed jaanos closed 8 years ago
Branch: u/jaanos/add_is_partial_cube
Before this goes into review, here are some notes and questions:
breadth_first_level_search
. It generates dictionaries whose keys are vertices at each level of the search, and values are sets of neighbours on the next level for each key vertex. This way, a vertex may appear in multiple sets on the same level - this is needed to correctly produce the labellings.depth_first_traversal
. It generates the edges followed on both ways of the depth-first search, together with the information on the direction taken (True
for forward, False
for backward). The method is used in the final verification step of the algorithm. Although the existing DFS could be adapted to work this way if requested, I chose to keep this a separate method to avoid needlessly degrading the performance of algorithms utilizing the generic DFS.certificate
is requested, the method raises an EmptySetError
describing what went wrong in case the graph is not a partial cube, and returns a dictionary mapping vertices to (0,1)-strings otherwise. Is this OK, or should I change to returning pairs, as with other methods returning a certificate? Should I use some exception other than EmptySetError
? I chose this as it is also used in traveling_salesman_problem
.Janoš
New commits:
9b38105 | Add D. Eppstein's algorithm for recognizing partial cubes |
35072c3 | Merge branch 'develop' into is_partial_cube |
Author: Janoš Vidali
Changed keywords from none to graphs partial cubes
Description changed:
---
+++
@@ -1 +1,5 @@
+This ticket adds a method `is_partial_cube` to the `Graph` class, which checks whether a graph is a partial cube. A partial cube is a graph that can be isometrically embedded into a hypercube, i.e., its vertices can be labelled with (0,1)-vectors of some fixed length such that the distance between any two vertices in the graph equals the Hamming distance of their labels. If requested, this labelling can also be returned by the method.
+The code for this method comes from the PADS library by David Eppstein, which is available at http://www.ics.uci.edu/~eppstein/PADS/. The library is available under the MIT license, and I have also obtained permission from the author via email. The algorithm in question has been described in a paper available at http://arxiv.org/pdf/0705.1025.pdf.
+
+The code has been changed to utilize Sage's structures (namely graphs, digraphs and disjoint sets). However, specialized breadth-first and depth-first search methods for graphs from the PADS library have also been added.
Hello,
- Should I mention in the doc that the code has originally been made available under the MIT license?
I do not think that you must mention the original license. You can say somewhere that this implementation is an adaptation of his code, however. Though you can ask on sage-devel if you prefer.
- The new BFS method is named
breadth_first_level_search
. It generates dictionaries whose keys are vertices at each level of the search, and values are sets of neighbours on the next level for each key vertex. This way, a vertex may appear in multiple sets on the same level - this is needed to correctly produce the labellings.
This method is very specific to you use, and cannot be a public function of GenericGraph with a name like that. You can turn it into a private function if you prefer.
I have not looked at the code inside, but depending on your performance needs, you could prefer an implementation that does not require you to rewrite the BFS. Something like that:
sage: bfs = list(g.breadth_first_search(0,report_distance=True))
sage: layers = [set() for _ in range(bfs[-1][1]+2)]
sage: for x,d in bfs:
....: layers[d].add(x)
sage: next_level_neighbors = {v:layers[d+1].intersection(g.neighbors(v)) for d,L in enumerate(layers) for v in L}
- The new DFS method is named
depth_first_traversal
. It generates the edges followed on both ways of the depth-first search, together with the information on the direction taken (True
for forward,False
for backward). The method is used in the final verification step of the algorithm. Although the existing DFS could be adapted to work this way if requested, I chose to keep this a separate method to avoid needlessly degrading the performance of algorithms utilizing the generic DFS.
Same remark here for the name. We can't have .depth_first_search
and .depth_first_traversal
simultaneously, and doing different things.
- If a
certificate
is requested, the method raises anEmptySetError
describing what went wrong in case the graph is not a partial cube, and returns a dictionary mapping vertices to (0,1)-strings otherwise.
EmptySetError
? Isn't the certificate usually set to None
or to the empty dictionary when it goes wrong? Your code at #19586 raises no exception.
Is this OK, or should I change to returning pairs, as with other methods returning a certificate?
You don't have to return a list of pairs, a certificate can be whatever you like.
Should I use some exception other than
EmptySetError
? I chose this as it is also used intraveling_salesman_problem
.
Oh. Well, this EmptySetError
is a recent trend to mean that "a search occured, and found nothing". Which 'can' make sense for a hamiltonian cycle. Here, an EmptySetError
would make me thing that the algorithm is doing something enumerative like trying all labellings.
About the implementation: this code looks long, and perhaps tricky: if you prefer you can also move it to an independent file (like it's done for 'graphs/graph_decompositions/graph_products.pyx'). The other advantage of that is that it give you more room to write doc, if you need it.
Nathann
Also: please only put the bare minimum in a try/except.
Though I know I can't convince you of the wisdom of that until you fought for hours with an exception that was raised where you did not expect it.
Hi!
Thanks for the suggestions and comments, I will address them when I have time (not today). As I said, the current code closely follows the original (including exceptions) and was mainly meant for testing, so I'm willing to change it to a more acceptable form.
About the implementation: this code looks long, and perhaps tricky: if you prefer you can also move it to an independent file (like it's done for 'graphs/graph_decompositions/graph_products.pyx'). The other advantage of that is that it give you more room to write doc, if you need it.
I agree that this will probably be the best solution - then the file can also contain the search functions. Are there any additional steps needed when adding a new file?
Janoš
Hello,
I agree that this will probably be the best solution - then the file can also contain the search functions. Are there any additional steps needed when adding a new file?
Not really for the code itself. For the doc, however, there is that:
http://doc.sagemath.org/html/en/developer/sage_manuals.html#adding-a-new-file
Nathann
Hi!
I've moved the functions to a new file and added the documentation. However, the last test in the docstring of is_partial_cube
is not formatted properly in graph.html
when I compile the documentation (but it looks fine in partial_cube.html
). Any idea why?
Janoš
I cannot tell: could you have run the compilation but missed an error? When I compiled your doc, I got something like that:
OSError: [graphs ] /home/ncohen/.Sage/local/lib/python2.7/site-packages/sage/graphs/partial_cube.py:docstring of sage.graphs.partial_cube.is_partial_cube:19: WARNING: duplicate citation Eppstein2008, other instance in /home/ncohen/.Sage/src/doc/en/reference/graphs/sage/graphs/graph.rst
Perhaps the doc does not compile, and so the page you are looking at is never refreshed? I don't see anything wrong in its code.
By the way: the 'trick' here is to move the 'reference' section into the module's doc. This way, sphinx will not see it defined in two functions: the one in partial_cube
, and the one (equal to it) in graph.py
.
"Welcome to Sage development. If it's your first night here: you can't trust sphinx"
Nathann
Nathann
Branch pushed to git repo; I updated commit sha1. New commits:
5c2284f | Move the reference section into the module documentation |
OK, I fixed that, and now it looks fine.
Propably a stupid question, as I know very little graph theory, but: Isn't the definition "isometrically embedded into a hypercube" defined also for non-connected graphs? I.e. it equals to definition by Hamming distance only when the graph is connected.
And a kind of bikeshedding: A user writing G.is_partial_cube?
will get help that does not define partial cube but says what is the complexity and who invented algorithm.
Hi!
Replying to @jm58660:
Propably a stupid question, as I know very little graph theory, but: Isn't the definition "isometrically embedded into a hypercube" defined also for non-connected graphs? I.e. it equals to definition by Hamming distance only when the graph is connected.
It is, but clearly no disconnected graph G can be isometrically embeded into a connected graph H, as two vertices in distinct connected components of G will not be connected by any shortest path in H.
And a kind of bikeshedding: A user writing
G.is_partial_cube?
will get help that does not define partial cube but says what is the complexity and who invented algorithm.
Yes, I should change that.
By the way, should I be worried about the failing plugin on the buildbot? The log is not very helpful in telling me what exactly went wrong.
Janoš
Branch pushed to git repo; I updated commit sha1. New commits:
d6fc5a3 | Move description of partial cubes to the docstring of the main function |
Fails on empty graph:
sage: G = Graph()
sage: G.is_partial_cube()
ZeroDivisionError: integer division or modulo by zero
(Empty set, zero, graph without any edge etc. are corner cases that are almost always good to check.)
Branch pushed to git repo; I updated commit sha1. New commits:
6b81097 | A graph without vertices is a partial cube |
Hi!
Fails on empty graph:
Whoops, fixed. Thanks for pointing it out!
(Empty set, zero, graph without any edge etc. are corner cases that are almost always good to check.)
Edgeless graphs work - a graph with a single vertex is already a hypercube, while larger empty graphs are disconnected and therefore are not partial cubes.
Janoš
I have no more comments.
I did some tests, like testing that cartesian products of partial cubes are partial cubes, and found no errors.
I read the code. It seems complicated due to nature of algorithm using bitwise operators. However, it has good comments inside.
I am unsure if I can not put this to positive review. Nathann, how rigorously should a reviewer read the code?
Yo,
I am unsure if I can not put this to positive review. Nathann, how rigorously should a reviewer read the code?
There is no rule about that, as you can expect. The same way that paper reviewers sometimes do a serious job and sometimes only browse through the text. Both are called 'reviews'. Some tickets are also positively reviewed
mostly because the reviewer and authors are good friends, and don't want to bother with details.
Here the 'form' of the code seems good (doc formatting, new module, imports, index of functions), and from what I read of Janoš' code I'm the one who needs to double-check when I am about to make a comment.
I would not give this ticket a positive review before I understand the algorithm and its implementation, however. And perhaps outline what it does in the module's doc while I am at it. If you give this patch a positive review, however, that's fine by me and there is nothing I can/will do against it. Everybody has his own way of reviewing things.
We have very strict rules about where double columns should be put, and a ticket is easily set to needs_work
because of that. But there is no way to check how honestly a reviewer inspects the mathematical code, so there we have to trust each other.
Life.
Nathann
Hello,
Here is a first-pass review.
Has a simpler expression:
- CG = DiGraph({v: {w: (v, w) for w in G[v]} for v in G})
+ CG = DiGraph(G)
That's a full graph copy, right? O_o
+ if not Graph(CG).is_bipartite():
Wouldn't it be better to maintain the undirected graph too? At this level, I
would almost be tempted by GenericGraph.is_bipartite(CG)
O_o
CG/UF/NL
I saw that they were the names from the original source code, but
would you mind 'expanding' them?
Don't know if that interests you, but by relabelling the graph on 0,...,n
you could use lists instead of dictionaries, e.g. for bitvec
.
Computing the number of connected components is not necessary to initialize the graphs, and has a linear-time cost.
- NG = DiGraph(labeled.connected_components_number())
+ NG = DiGraph()
What about NG.has_edge(vi,wi)
?
+ if wi in NG.neighbors_out(vi):
As you do not seem to do anything with the dictionaries stored in action
,
could you turn them into sets instead?
- action = {v: {} for v in g}
+ action = {v: set() for v in g}
What is the purpose of all the code that appears after this line?
+ g = DiGraph({v: {w: UF.find((v, w)) for w in G[v]} for v in G})
It does not appear in partial_cube.py
but in medium.py
, and I do not see
what you need it for.
Nathann
Branch pushed to git repo; I updated commit sha1. New commits:
618ab73 | Rename variables and use sets for actions |
Hi!
Has a simpler expression:
- CG = DiGraph({v: {w: (v, w) for w in G[v]} for v in G}) + CG = DiGraph(G)
The point here is to set edge labels. Simply constructing a DiGraph
from G
will keep the old edge labels (if any).
That's a full graph copy, right?
O_o
+ if not Graph(CG).is_bipartite():
Wouldn't it be better to maintain the undirected graph too? At this level, I would almost be tempted by
GenericGraph.is_bipartite(CG)
O_o
I think it may be actually enough to check that the original graph is bipartite, and then also all contractions must be bipartite. I will ask the author, just to be sure.
CG/UF/NL
I saw that they were the names from the original source code, but would you mind 'expanding' them?
OK, they are now contracted
, unionfind
and available
now, and LG
and NG
have been changed to level
and newgraph
, respectively.
- Don't know if that interests you, but by relabelling the graph on
0,...,n
you could use lists instead of dictionaries, e.g. forbitvec
.
I could, but since we might be building a dictionary as a certificate anyway, I don't think there's much gain here.
Computing the number of connected components is not necessary to initialize the graphs, and has a linear-time cost.
- NG = DiGraph(labeled.connected_components_number()) + NG = DiGraph()
What about
NG.has_edge(vi,wi)
?+ if wi in NG.neighbors_out(vi):
OK, both changed.
As you do not seem to do anything with the dictionaries stored in
action
, could you turn them into sets instead?- action = {v: {} for v in g} + action = {v: set() for v in g}
The original algorithm used the values in the dictionaries to reconstruct the graph. Indeed, here we don't use them, so I changed the dictionaries to sets.
What is the purpose of all the code that appears after this line?
+ g = DiGraph({v: {w: UF.find((v, w)) for w in G[v]} for v in G})
It does not appear in
partial_cube.py
but inmedium.py
, and I do not see what you need it for.
In the case the graph is not a partial cube, the first part of the algorithm may still produce a labelling, which then needs to be checked (see for example Lemma 11 in the paper). An example would be a graph with graph6 string Fs_Wo
- this fails the check that no two edges on the same vertex should have the same label.
Janoš
Hellooooo,
The point here is to set edge labels. Simply constructing a
DiGraph
fromG
will keep the old edge labels (if any).
Dead right, my mistake.
I think it may be actually enough to check that the original graph is bipartite, and then also all contractions must be bipartite. I will ask the author, just to be sure.
Okay !
OK, they are now
contracted
,unionfind
andavailable
now, andLG
andNG
have been changed tolevel
andnewgraph
, respectively.
Thanks,
I could, but since we might be building a dictionary as a certificate anyway, I don't think there's much gain here.
Okayokay. I thought a bit about it, and it would be much more troublesome than I first expected indeed.
In the case the graph is not a partial cube, the first part of the algorithm may still produce a labelling, which then needs to be checked (see for example Lemma 11 in the paper). An example would be a graph with graph6 string
Fs_Wo
- this fails the check that no two edges on the same vertex should have the same label.
I thought a bit about this, and it does not seem very hard to check that a vertex labelling is correct (a sample of code that does it follows) but perhaps this code is 'theoretically' n^2log(n)
instead of n^2
. Considering the efficiency of Python (the n^2
part) and the efficiency of the log(n)
part (bitwise operations), however, I wonder if it could not be faster than another Python n^2
implementation. What do you think?
This code assumes what I believe we can assume: that after the labelling is picked, two adjacent vertices differ by one 'bit' and that this bit is the edge's label. This amounts to being sure that the graph is indeed a subgraph of a high-dimension cube, but not that it is isometric. In order to check that it is isometric, I believe that it is sufficient to run the following code.
def check_labelling(G):
# Assumptions:
#
# - We assume that the vertices are integers, whose base-2 representation
# represents the coordinates
#
# - We also assume that every edge corresponds to the switching of one bit
# in that base-2 representation or a vertex' name
# This dictionary associates to each vertex the set of bits whose switching
# yields one of its neighbors:
action = {v: set() for v in G}
for u,v in G.edge_iterator(labels=False):
diff = u^v
action[u].add(diff)
action[v].add(diff)
action_bitset = {v: sum(action[v]) for v in G}
# We now check that, for every pair u,v of vertices, there is a neighbor u'
# of u such that:
# hamming_distance(u,v) = hammin_distance(u',v) + 1
for u in G:
for v in G:
diff = u^v
if (action_bitset[u] & diff == 0 and
u != v):
return False
return True
Example in Sage:
sage: g = graphs.CubeGraph(5).relabel(lambda x:Integer(x,base=2),inplace=False)
sage: check_labelling(g)
True
sage: g.delete_edge(8,24)
sage: check_labelling(g)
False
Please tell me what you think,
Nathann
Err.. My mistake, when the graph is a tree that's a n^3
algorithm :-/
Nathann
That code was apparently a naive implementation of Lemma 13. Only it isn't easy to read through his fancy definitions. Re-defining an 'oriented rooted tree' is particularly bad move. Anyway. I have to read more.
This being said, isn't this of complexity nm
already?
+ for v, w in contracted.edge_iterator(labels = False):
+ diff = bitvec[v]^bitvec[w]
Another small thing: len(contracted[v])
should be contracted.out_degree(v)
(does not build the list of neighbors).
Hi!
This being said, isn't this of complexity
nm
already?+ for v, w in contracted.edge_iterator(labels = False): + diff = bitvec[v]^bitvec[w]
The paper assumes that a machine word can store log(n)
bits, so bitwise operations can be done in O(d/log(n))
time, where d
is the degree of the root (these sum up to at most n-1
over all iterations of the contraction loop). Since the number of edges is checked to be O(n log(n))
, the total time needed for bitwise operations is O(n^2)
. Of course, failing this assumption, the time complexity here is indeed O(nm) = O(n^2 log(n))
.
Another small thing:
len(contracted[v])
should becontracted.out_degree(v)
(does not build the list of neighbors).
Will take care of that.
Janoš
Branch pushed to git repo; I updated commit sha1. New commits:
d0eaded | Only check for bipartiteness once |
OK, bipartiteness is now only checked once, without needing to make any copies.
Should we perhaps mention that the algorithm is really O(n^2 log(n))
, only for all practical purposes it can be considered a quadratic algorithm?
Janoš
Hello,
OK, bipartiteness is now only checked once, without needing to make any copies.
Okay, cool !
Should we perhaps mention that the algorithm is really
O(n^2 log(n))
, only for all practical purposes it can be considered a quadratic algorithm?
Yeah probably, though that assumption is rather 'fair'. Don't bother for the moment. I am not done understanding the algorithm, and I will probably add some doc eventually.
Nathann
Sorry but what do you mean by Pos
in StatesForPos
? All I can think of is position
and I have no idea of what it means in this context O_o
Is this right?
activeForState -> state_to_smallest_active_token
; andstatesForPos -> active_token_to_states
Hello again !
I added a commit at public/19985 that contains some doc. I am not done understanding the 'checking' part of the algorithm. You are of course welcome to discuss any of the changes.
Nathann
Changed branch from u/jaanos/add_is_partial_cube to public/19985
Hi!
Is this right?
activeForState -> state_to_smallest_active_token
; andstatesForPos -> active_token_to_states
Yes, that seems right. Pos
apparently refers to the position in the activeTokens
list.
Your changes look fine, so I'm changing to your branch (Sage is still compiling for me though, so I haven't tested yet). Thanks for extending the documentation!
Janoš
Branch pushed to git repo; I updated commit sha1. New commits:
d1624e4 | Rename two variables, small fixes in the documentation |
Branch pushed to git repo; I updated commit sha1. New commits:
372d658 | trac #19985: Small changes |
This ticket adds a method
is_partial_cube
to theGraph
class, which checks whether a graph is a partial cube. A partial cube is a graph that can be isometrically embedded into a hypercube, i.e., its vertices can be labelled with (0,1)-vectors of some fixed length such that the distance between any two vertices in the graph equals the Hamming distance of their labels. If requested, this labelling can also be returned by the method.The code for this method comes from the PADS library by David Eppstein, which is available at http://www.ics.uci.edu/~eppstein/PADS/. The library is available under the MIT license, and I have also obtained permission from the author via email. The algorithm in question has been described in a paper available at http://arxiv.org/pdf/0705.1025.pdf.
The code has been changed to utilize Sage's structures (namely graphs, digraphs and disjoint sets). However, specialized breadth-first and depth-first search methods for graphs from the PADS library have also been added.
CC: @nathanncohen
Component: graph theory
Keywords: graphs partial cubes
Author: Janoš Vidali
Branch:
5a054a6
Reviewer: Nathann Cohen
Issue created by migration from https://trac.sagemath.org/ticket/19985