Closed dcoudert closed 9 years ago
Author: David Coudert
Description changed:
---
+++
@@ -1 +1,4 @@
+This patch adds two cutting rules, one for the `'basic'` algorithm and one for the `'cuts'` algorithm.
+* the new `'basic+'` algorithm uses the new cutting rule and is always faster than the `'basic'` one.
+* the `'cuts+'` algorithm is now an exact algorithm using the notion of far-apart pairs to reduce the size of the search space. It is sometimes faster than the `'cuts'` algorithm although it has a longer pre-processing time. It is faster for instance on rectangular grids, petersen, and on graphs such that the hyperbolicity is small compared to the diameter/2.
Some examples
sage: from sage.graphs.hyperbolicity import hyperbolicity
sage: G = graphs.PetersenGraph()
sage: %time hyperbolicity(G, algorithm='basic')
CPU times: user 2 ms, sys: 0 ns, total: 2 ms
Wall time: 1.16 ms
(1/2, [0, 1, 2, 3], 1/2)
sage: %time hyperbolicity(G, algorithm='basic+')
CPU times: user 1 ms, sys: 0 ns, total: 1 ms
Wall time: 1.13 ms
(1/2, [0, 1, 2, 3], 1/2)
sage: %time hyperbolicity(G, algorithm='cuts')
CPU times: user 2 ms, sys: 0 ns, total: 2 ms
Wall time: 1.19 ms
(1/2, [0, 1, 2, 3], 1/2)
sage: %time hyperbolicity(G, algorithm='cuts+')
CPU times: user 1e+03 µs, sys: 0 ns, total: 1e+03 µs
Wall time: 776 µs
(1/2, [0, 1, 2, 3], 1/2)
sage: G = graphs.Grid2dGraph(20,10)
sage: %time hyperbolicity(G, algorithm='basic')
CPU times: user 681 ms, sys: 3 ms, total: 684 ms
Wall time: 682 ms
(9, [(0, 0), (0, 9), (9, 0), (9, 9)], 9)
sage: %time hyperbolicity(G, algorithm='basic+')
CPU times: user 52 ms, sys: 0 ns, total: 52 ms
Wall time: 49 ms
(9, [(0, 0), (0, 9), (9, 0), (9, 9)], 9)
sage: %time hyperbolicity(G, algorithm='cuts')
CPU times: user 22 ms, sys: 1 ms, total: 23 ms
Wall time: 21.5 ms
(9, [(0, 0), (0, 9), (19, 0), (19, 9)], 9)
sage: %time hyperbolicity(G, algorithm='cuts+')
CPU times: user 17 ms, sys: 2 ms, total: 19 ms
Wall time: 16.2 ms
(9, [(0, 0), (0, 9), (19, 0), (19, 9)], 9)
New commits:
ee216e2 | #17227: adds far apart pairs and cleanup the code |
Please split this commit into thematic changes. Reviewing code like that takes a lot of time and you make it more complicated by doing everything at once.
As I do not know what exactly you do in this patch, all I can do is look at the diff, and try to understand. Please look at this diff, you will see what the reviewer has to go through.
Thanks,
Nathann
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
f6ad07d | trac #17227: update module documentation |
b0c2006 | trac #17227: adds basic+ algorithm |
d7de77a | trac #17227: adds verbose parameter to greedy dominating set |
6b61e14 | trac #17227: add a method to compute both distances and far-apart pairs |
18dfcf2 | trac #17227: change the behavior of the cuts+ method that now uses far-apart pairs |
632bbb6 | trac #17227: remove methods that are no longer used |
329e8bf | trac #17227: remove other methods that are no longer used |
5529a33 | trac #17227: fix various indentation errors and small bugs |
Hello Nathann.
I hope this patch will be easier to review now. I tried to do more thematic commits.
I'm still having a hard time with git. I had to force the push... but it should be ok now.
Thanks.
Hello !
It is normal that you had to force the update, no problem there. Git only accepts "additional commits" without complaints, for everything else you have to push.
I am backpacking in Mumbai right now but I go home (=Chennai) tomorrow and I will be back to work next Monday, and on this review in particular.
Also, note that your branch does not merge with the latest release.
Nathann
I unsuccessfully tried to rebase on up-to-date develop branch :( How can I solve this problem??
The conflict is only an import statement at the beginning of distance_all_pairs
, so no problem, you can just merge your branch with the latest beta.
If you want to know how your rebasing problem can be solved I need more than "I unsuccesfully tried", though.
Nathann
Changed branch from u/dcoudert/add_new_cutting_rules_for_computing_hyperbolicity to public/17227
New commits:
70d5c4f | trac #17227: Merged with 6.5.beta0 |
Branch pushed to git repo; I updated commit sha1. New commits:
4b87328 | trac #17227: change include bitset to comply with patch 17196 |
Thanks a lot Nathann for resolving my problem. The terms used for git are not obvious for me, and so all actions are complicated.
David.
Yo David !
I just finished the first review. It is 16h38 right now, and I have been working on this code for the last 4h hours. It is too much. This ticket should have been implemented into several small tickets. It is also very very difficult to keep track of what is happening, as the main functions handle many different algorithms at the same time. Please think for a while about how you could make it more modular, so that it becomes easier to understand in the future.
Besides: the function hyperbolicity
contains the following algorithms:
"'basic'", "'basic+'", "'cuts'", "'cuts+'", "'dom'"
. Not only the names
are unclear, but that is probably too much. I do not expect that all will be
useful to users, or useful for us to keep, especially when handling them all
makes the code so complicated. Could you think about those we do not really
need, to remove them in another ticket ?
Besides making the comments below, I implemented some modifications to understand the code a bit better. The most important changes, in terms of lines of code, only move "malloc" together, so that memory allocation problems are all checked at the same time.
Here is the review:
For each of the three methods explained in the section "Algorithms and
complexity" of the module header, could you mention the corresponding flag of
hyperbolicity
?
basic+
: you should have kept this for a different patch. It is not
necessary to implement it with the rest and it makes the review of the big
thing harder.
You say in the doc that Soto proved that hyp(a,b,c,d)<= dist(a,b)/2
but in
the code we have
if 2*distances[a][b] <= h_LB:
continue
I expected distances[a][b]/2 <= h_LB
. Or actually, maybe the bound proved by
Soto was actually hyp(a,b,c,d)<= 2*dist(a,b)
(in which case the code is
correct and the doc is wrong), as otherwise his bound is better than ours.
It is more compact to make all memory allocations at the same place, and only
have one block of code to free everything if there was a problem. free(NULL)
is valid.
your unsigned short ** c_far_apart
only stores 0s and 1 (the amount of
unused memory is 15/16=93%
:-P
). We have something in mic/binary_matrix
that you may like (though it may not be worth the extra complexity in this
case).
The function which returns all far-apart pairs considers points in different connected components to be a far-away pair. Is that what you want ? Either way, please document it.
The documentation of __hyperbolicity__
claims that the function implements an
algorithm from CCL12. Also, the comment just above the function talks about a
decreasing pair ordering.
There are many variables that you specialise to uint32_t when actually nobody cares. The code would be easier to read if they were int variables.
__hyperbolicity__
is only meant for connected graphs (you ask for a
diameter) -- could you write this explicitly somewhere ?
... after something like two hours of review: this patch is too big.
What about renaming this "STOP" thing to "GOTO_RETURN" ?
Please do not write a commit that changes the indentation of a block of code. Modify the first commit itself.
Is there any reason why this block
# Termination if required approximation is found
Is not on the same level as
# If we cannot improve further, we stop
It feels like the two play the same role in the algorithm.
You define two values for 'STOP' but you never use them. Was that meant to only leave one loop instead or two, and remove one of the two copies of the blocks:
# If we cannot improve further, we stop
# Termination if required approximation is found
It would be a nice change indeed, the implementation is already sufficiently complicated.
h_UB if STOP==2 else h
: why don't you just set the right value of h_UB
right before setting STOP==2
?
In "hyperbolicity" a large portion of the code deals with the block-decomposition. Why couldn't you just call hyperbolicity on each block ? You would not have to allocate memory in a loop, and the is_clique test would be done automatically too (+the others)... You may lose a bit of time as the block_decomposition would be recomputed on each block, but I do not think that it is such a big problem compared to the cost of the algorithm. Actually, I doubt that many real instances have a cut-vertex. That would remove a big chunck of code, too.
while len(DOM)<4: DOM.append(H.random_vertex())
what if that new vertex
was in DOM
already ?
Nathann
Replying to @nathanncohen:
Yo David !
I just finished the first review. It is 16h38 right now, and I have been working on this code for the last 4h hours. It is too much. This ticket should have been implemented into several small tickets. It is also very very difficult to keep track of what is happening, as the main functions handle many different algorithms at the same time. Please think for a while about how you could make it more modular, so that it becomes easier to understand in the future.
Really sorry about that. Next time I will try to make small patches only, although it is not always easy to do.
Besides: the function
hyperbolicity
contains the following algorithms:"'basic'", "'basic+'", "'cuts'", "'cuts+'", "'dom'"
. Not only the names are unclear, but that is probably too much. I do not expect that all will be useful to users, or useful for us to keep, especially when handling them all makes the code so complicated. Could you think about those we do not really need, to remove them in another ticket ?
Well, since we have 'basic+', we don't need 'basic' anymore. We could also rename 'cuts' as 'CCL', but I don't know if it is more explicit... We can try in another ticket.
Besides making the comments below, I implemented some modifications to understand the code a bit better. The most important changes, in terms of lines of code, only move "malloc" together, so that memory allocation problems are all checked at the same time.
Here is the review:
- For each of the three methods explained in the section "Algorithms and complexity" of the module header, could you mention the corresponding flag of
hyperbolicity
?
done
You say in the doc that Soto proved that
hyp(a,b,c,d)<= dist(a,b)/2
but in the code we haveif 2*distances[a][b] <= h_LB: continue
I expected
distances[a][b]/2 <= h_LB
. Or actually, maybe the bound proved by Soto was actuallyhyp(a,b,c,d)<= 2*dist(a,b)
(in which case the code is correct and the doc is wrong), as otherwise his bound is better than ours.
I have corrected the documentation. Soto proved that hyp(a,b,c,d) <= 2*dist(a,b)
.
So we can skip 4-tuples such that 2*dist(a,b) <= h_LB
.
It is more compact to make all memory allocations at the same place, and only have one block of code to free everything if there was a problem.
free(NULL)
is valid.your
unsigned short ** c_far_apart
only stores 0s and 1 (the amount of unused memory is15/16=93%
:-P
). We have something inmic/binary_matrix
that you may like (though it may not be worth the extra complexity in this case).
I'm clearly consuming much more memory than necessary. If the extra cost is not too high, I may propose the modification in another patch.
- The function which returns all far-apart pairs considers points in different connected components to be a far-away pair. Is that what you want ? Either way, please document it.
far-apart pairs are defined only in connected graphs. I have updated the documentation of the method to precise that it assumes that the input graph is connected.
- The documentation of
__hyperbolicity__
claims that the function implements an algorithm from CCL12. Also, the comment just above the function talks about a decreasing pair ordering.
I have changed the comment to Compute the hyperbolicity using a the algorithm of [CCL12]_
.
There are many variables that you specialise to uint32_t when actually nobody cares. The code would be easier to read if they were int variables.
__hyperbolicity__
is only meant for connected graphs (you ask for a diameter) -- could you write this explicitly somewhere ?
I added a comment.
... after something like two hours of review: this patch is too big.
Really, really, really, sorry about that.
What about renaming this "STOP" thing to "GOTO_RETURN" ?
Please do not write a commit that changes the indentation of a block of code. Modify the first commit itself.
Is there any reason why this block
# Termination if required approximation is found
Is not on the same level as
# If we cannot improve further, we stop
It feels like the two play the same role in the algorithm.
You define two values for 'STOP' but you never use them. Was that meant to only leave one loop instead or two, and remove one of the two copies of the blocks:
# If we cannot improve further, we stop # Termination if required approximation is found
It would be a nice change indeed, the implementation is already sufficiently complicated.
h_UB if STOP==2 else h
: why don't you just set the right value of h_UB right before settingSTOP==2
?
Following your comments, I have:
STOP==2
test by setting h_UB = h
when optimality is proven.I don't see how to remove one of the two blocks of tests since one is used each time the upper bound is reduced and the other each time the lower bound is improved.
- In "hyperbolicity" a large portion of the code deals with the block-decomposition. Why couldn't you just call hyperbolicity on each block ? You would not have to allocate memory in a loop, and the is_clique test would be done automatically too (+the others)... You may lose a bit of time as the block_decomposition would be recomputed on each block, but I do not think that it is such a big problem compared to the cost of the algorithm. Actually, I doubt that many real instances have a cut-vertex. That would remove a big chunck of code, too.
Actually, many real instances have many cut-vertices! For instance, the graphs of the autonomous systems of the Internet collected by CAIDA have a large number of cut-vertices. The largest 2-connected component has 2/3 of the vertices only. The same hold for co-authors graphs and so on.
I have however modified the method. This is easier to read this way. It is also faster when the input graph is 2-connected since we avoid a useless copy of the graph.
while len(DOM)<4: DOM.append(H.random_vertex())
what if that new vertex was inDOM
already ?
I fixed this issue using a set instead of a list for DOM.
Nathann
Many, many thanks Nathann.
David.
PS: sorry for the multiple commits. I'm still lost with git. For instance, I don't know how to modify a commit without creating a new commit.
Reviewer: Nathann Cohen
Branch pushed to git repo; I updated commit sha1. New commits:
2e9e2b2 | trac #17227: improve module documentation. |
21b84e4 | trac #17227: clarify the inputs of the method computing far-apart pairs. |
9cf7f56 | trac #17227: simplify tests in __hyperbolicity__ |
40a7038 | trac #17227: reshape method hyperbolicity |
7ca92d1 | trac #17227: fix issue with greedy dominating set |
e8968ca | trac #17227: fix upper bound with cuts+ |
6f5df4a | trac #17227: fix another problem with approximations and DOM |
PS: sorry for the multiple commits. I'm still lost with git. For instance, I don't know how to modify a commit without creating a new commit.
1) If you want to merge the current (staged) modifications to the last commit, do "git commit --amend" (or "git commit -a --amend" to include all changes, as usual)
2) If you want to merge some commits on an existing branch, checkout this branch then "git rebase -i the_older_commit_of_the_branch"
Note that both do what is called "rewriting history". If you then want to push your changes to a Sage branch you will need to add a '-f' flag to 'git push'.
Nathann
Hello David !
Well, since we have 'basic+', we don't need 'basic' anymore.
+1
I have corrected the documentation. Soto proved that
hyp(a,b,c,d) <= 2*dist(a,b)
It seems that you only fixed the text and not the math formula that follows.
I'm clearly consuming much more memory than necessary. If the extra cost is not too high, I may propose the modification in another patch.
Okay. Perhaps the removal of 'basic' will make things slightly clearer already.
I have updated the documentation of the method to precise that it assumes that the input graph is connected.
Would you be willing to add a check right at the beginning of the function, just in case the graph is not connected ? The complexity of the algorithm will not be hurt by one more BFS ^^;
I have changed the comment to
Compute the hyperbolicity using a the algorithm of [CCL12]_
Sorry, my remark was not very clear: what I meant is that the function that follows does not only implement the algorithm from CCL12. It also deals with far-apart pairs, and unless I make a mistake there was no mention of them in that paper. The second line of the function's doc say the same.
Other comments:
In this block of code:
if hh > hyp or (hh==hyp and not certificate):
hyp = hh
hyp_UB = hh_UB
certificate = certif
Shouldn't we have hyp_UB = max(hyp_UB,hh_UB)
instead ? Or is it always
correct for some other reason ? Also, if the 'min' is necessary, could there
be a situation in which hh < hyp
but hh_UB > hyp_UB
? In that case the
upper bound would need to be updated, but not hyp
.
About return (h, certificate, h_UB if GOTO_RETURN else h)
: note that in
Python you can have a for/else:
sage: for i in range(40):
....: pass
....: else:
....: print "Hey"
....:
Hey
sage: for i in range(40):
....: break
....: else:
....: print "Hey"
....:
sage:
By the way, thank you for adressing all those remarks methodically and answering
on each point. While the branch is heavy it makes this review less painful than
others in which the authors "forget" some questions :-P
Cheers,
Nathann
Branch pushed to git repo; I updated commit sha1. New commits:
e44ca66 | trac #17227: small corrections for comment 18 |
Replying to @nathanncohen:
Hello David !
Well, since we have 'basic+', we don't need 'basic' anymore.
+1
I have corrected the documentation. Soto proved that
hyp(a,b,c,d) <= 2*dist(a,b)
It seems that you only fixed the text and not the math formula that follows.
Corrected.
I'm clearly consuming much more memory than necessary. If the extra cost is not too high, I may propose the modification in another patch.
Okay. Perhaps the removal of 'basic' will make things slightly clearer already.
I have updated the documentation of the method to precise that it assumes that the input graph is connected.
Would you be willing to add a check right at the beginning of the function, just in case the graph is not connected ? The complexity of the algorithm will not be hurt by one more BFS ^^;
Method __hyperbolicity__
takes as input the distance matrix but not the graph itself. So the only way to check that the graph is connected would be to check that all distances from node 0 in the distance matrix are in the range [0,N-1].
Should I add such a test?
I have changed the comment to
Compute the hyperbolicity using a the algorithm of [CCL12]_
Sorry, my remark was not very clear: what I meant is that the function that follows does not only implement the algorithm from CCL12. It also deals with far-apart pairs, and unless I make a mistake there was no mention of them in that paper. The second line of the function's doc say the same.
In fact all details are in a paper under submission. We can wait until the paper get accepted to replace the reference. This can be done in the patch removing the 'basic' method.
Other comments:
In this block of code:
if hh > hyp or (hh==hyp and not certificate): hyp = hh hyp_UB = hh_UB certificate = certif
Shouldn't we have
hyp_UB = max(hyp_UB,hh_UB)
instead ? Or is it always correct for some other reason ? Also, if the 'min' is necessary, could there be a situation in whichhh < hyp
buthh_UB > hyp_UB
? In that case the upper bound would need to be updated, but nothyp
.
Right. I have changed to hyp_UB = max(hyp_UB,hh_UB)
.
About
return (h, certificate, h_UB if GOTO_RETURN else h)
: note that in Python you can have a for/else:sage: for i in range(40): ....: pass ....: else: ....: print "Hey" ....: Hey sage: for i in range(40): ....: break ....: else: ....: print "Hey" ....: sage:
I know that, but I don't know how it can be used to simplify the code. Indeed, the else
part will always be executed, even without pass
or break
.
Thanks Nathann.
New commits:
e44ca66 | trac #17227: small corrections for comment 18 |
Helloooooo !
Method
__hyperbolicity__
takes as input the distance matrix but not the graph itself. So the only way to check that the graph is connected would be to check that all distances from node 0 in the distance matrix are in the range [0,N-1]. Should I add such a test?
Yes, please. It will only be a l-dimension loop, so it will really be cheap.
In fact all details are in a paper under submission. We can wait until the paper get accepted to replace the reference. This can be done in the patch removing the 'basic' method.
Okay, +1.
Shouldn't we have
hyp_UB = max(hyp_UB,hh_UB)
instead ? Or is it always correct for some other reason ? Also, if the 'min' is necessary, could there be a situation in whichhh < hyp
buthh_UB > hyp_UB
? In that case the upper bound would need to be updated, but nothyp
.Right. I have changed to
hyp_UB = max(hyp_UB,hh_UB)
.
Okay. My question had a second part, however: is it possible that even though hh < hyp
we have hh_UB > hyp_UB
? In this situation the upper bound would need to be updated, even though hyp
would stay the same.
I know that, but I don't know how it can be used to simplify the code. Indeed, the
else
part will always be executed, even withoutpass
orbreak
.
Sorry sorry, I just meant this as a general remark. While it could be used in this code, it is already sufficiently complicated to not use Python-specific instructions unless there is a good reason (that I do not see here).
I set the ticket to needs_work
, because of the connectivity test and the question related to the upper bound. We will be done soon.
Nathann
Replying to @nathanncohen:
Helloooooo !
Method
__hyperbolicity__
takes as input the distance matrix but not the graph itself. So the only way to check that the graph is connected would be to check that all distances from node 0 in the distance matrix are in the range [0,N-1]. Should I add such a test?Yes, please. It will only be a l-dimension loop, so it will really be cheap.
I have added a test at the beginning of method __hyperbolicity__
. I have also forced initialization of distances to largest possible value in method distances_and_far_apart_pairs
, although it is clearly written that this method ensures a connected graph. Now we should be on the safe side.
Okay. My question had a second part, however: is it possible that even though
hh < hyp
we havehh_UB > hyp_UB
? In this situation the upper bound would need to be updated, even thoughhyp
would stay the same.
You are right, such situation may happen. Suppose for instance that the graph has 2 biconnected components, a (k+1)(k+1)-grid and a k-hyperbolic graph with large diameter and plenty of far-apart pairs with various distances. The upper bound for the grid will be k, but for the other component the upper bound could be large (ck or k+c).
I have moved the update of the upper bound (hyp_UB = max(hyp_UB, hh_UB)
) outside of the if...
to address such case.
I set the ticket to
needs_work
, because of the connectivity test and the question related to the upper bound. We will be done soon.Nathann
I hope it's better now.
Branch pushed to git repo; I updated commit sha1. New commits:
1b35416 | trac #17227: small corrections for comment 21 |
I hope it's better now.
Yep ! At long last, this ticket is...positively reviewed :-P
Nathann
Thanks a lot Nathann!
Changed branch from public/17227 to 1b35416
This patch adds two cutting rules, one for the
'basic'
algorithm and one for the'cuts'
algorithm.'basic+'
algorithm uses the new cutting rule and is always faster than the'basic'
one.'cuts+'
algorithm is now an exact algorithm using the notion of far-apart pairs to reduce the size of the search space. It is sometimes faster than the'cuts'
algorithm although it has a longer pre-processing time. It is faster for instance on rectangular grids, petersen, and on graphs such that the hyperbolicity is small compared to the diameter/2.CC: @nathanncohen
Component: graph theory
Author: David Coudert
Branch/Commit:
1b35416
Reviewer: Nathann Cohen
Issue created by migration from https://trac.sagemath.org/ticket/17227