sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.42k stars 477 forks source link

Add new cutting rules for computing hyperbolicity #17227

Closed dcoudert closed 9 years ago

dcoudert commented 10 years ago

This patch adds two cutting rules, one for the 'basic' algorithm and one for the 'cuts' algorithm.

CC: @nathanncohen

Component: graph theory

Author: David Coudert

Branch/Commit: 1b35416

Reviewer: Nathann Cohen

Issue created by migration from https://trac.sagemath.org/ticket/17227

dcoudert commented 10 years ago

Author: David Coudert

dcoudert commented 10 years ago

Branch: u/dcoudert/add_new_cutting_rules_for_computing_hyperbolicity

dcoudert commented 10 years ago

Description changed:

--- 
+++ 
@@ -1 +1,4 @@
+This patch adds two cutting rules, one for the `'basic'` algorithm and one for the `'cuts'` algorithm.
+* the new `'basic+'` algorithm uses the new cutting rule and is always faster than the `'basic'` one.
+* the `'cuts+'` algorithm is now an exact algorithm using the notion of far-apart pairs to reduce the size of the search space. It is sometimes faster than the `'cuts'` algorithm although it has a longer pre-processing time. It is faster for instance on rectangular grids, petersen, and on graphs such that the hyperbolicity is small compared to the diameter/2.
dcoudert commented 10 years ago

Commit: ee216e2

dcoudert commented 10 years ago
comment:3

Some examples

sage: from sage.graphs.hyperbolicity import hyperbolicity
sage: G = graphs.PetersenGraph()
sage: %time hyperbolicity(G, algorithm='basic')
CPU times: user 2 ms, sys: 0 ns, total: 2 ms
Wall time: 1.16 ms
(1/2, [0, 1, 2, 3], 1/2)
sage: %time hyperbolicity(G, algorithm='basic+')
CPU times: user 1 ms, sys: 0 ns, total: 1 ms
Wall time: 1.13 ms
(1/2, [0, 1, 2, 3], 1/2)
sage: %time hyperbolicity(G, algorithm='cuts')
CPU times: user 2 ms, sys: 0 ns, total: 2 ms
Wall time: 1.19 ms
(1/2, [0, 1, 2, 3], 1/2)
sage: %time hyperbolicity(G, algorithm='cuts+')
CPU times: user 1e+03 µs, sys: 0 ns, total: 1e+03 µs
Wall time: 776 µs
(1/2, [0, 1, 2, 3], 1/2)
sage: G = graphs.Grid2dGraph(20,10)
sage: %time hyperbolicity(G, algorithm='basic')
CPU times: user 681 ms, sys: 3 ms, total: 684 ms
Wall time: 682 ms
(9, [(0, 0), (0, 9), (9, 0), (9, 9)], 9)
sage: %time hyperbolicity(G, algorithm='basic+')
CPU times: user 52 ms, sys: 0 ns, total: 52 ms
Wall time: 49 ms
(9, [(0, 0), (0, 9), (9, 0), (9, 9)], 9)
sage: %time hyperbolicity(G, algorithm='cuts')
CPU times: user 22 ms, sys: 1 ms, total: 23 ms
Wall time: 21.5 ms
(9, [(0, 0), (0, 9), (19, 0), (19, 9)], 9)
sage: %time hyperbolicity(G, algorithm='cuts+')
CPU times: user 17 ms, sys: 2 ms, total: 19 ms
Wall time: 16.2 ms
(9, [(0, 0), (0, 9), (19, 0), (19, 9)], 9)

New commits:

ee216e2#17227: adds far apart pairs and cleanup the code
6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 10 years ago
comment:4

Please split this commit into thematic changes. Reviewing code like that takes a lot of time and you make it more complicated by doing everything at once.

As I do not know what exactly you do in this patch, all I can do is look at the diff, and try to understand. Please look at this diff, you will see what the reviewer has to go through.

Thanks,

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from ee216e2 to 5529a33

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

f6ad07dtrac #17227: update module documentation
b0c2006trac #17227: adds basic+ algorithm
d7de77atrac #17227: adds verbose parameter to greedy dominating set
6b61e14trac #17227: add a method to compute both distances and far-apart pairs
18dfcf2trac #17227: change the behavior of the cuts+ method that now uses far-apart pairs
632bbb6trac #17227: remove methods that are no longer used
329e8bftrac #17227: remove other methods that are no longer used
5529a33trac #17227: fix various indentation errors and small bugs
dcoudert commented 9 years ago
comment:6

Hello Nathann.

I hope this patch will be easier to review now. I tried to do more thematic commits.

I'm still having a hard time with git. I had to force the push... but it should be ok now.

Thanks.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:7

Hello !

It is normal that you had to force the update, no problem there. Git only accepts "additional commits" without complaints, for everything else you have to push.

I am backpacking in Mumbai right now but I go home (=Chennai) tomorrow and I will be back to work next Monday, and on this review in particular.

Also, note that your branch does not merge with the latest release.

Nathann

dcoudert commented 9 years ago
comment:8

I unsuccessfully tried to rebase on up-to-date develop branch :( How can I solve this problem??

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:9

The conflict is only an import statement at the beginning of distance_all_pairs, so no problem, you can just merge your branch with the latest beta.

If you want to know how your rebasing problem can be solved I need more than "I unsuccesfully tried", though.

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

Changed branch from u/dcoudert/add_new_cutting_rules_for_computing_hyperbolicity to public/17227

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

Changed commit from 5529a33 to 70d5c4f

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

New commits:

70d5c4ftrac #17227: Merged with 6.5.beta0
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

4b87328trac #17227: change include bitset to comply with patch 17196
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 70d5c4f to 4b87328

dcoudert commented 9 years ago
comment:12

Thanks a lot Nathann for resolving my problem. The terms used for git are not obvious for me, and so all actions are complicated.

David.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 4b87328 to 4834573

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

1ffeb68trac #17227: Merged with 6.5.beta1
4834573trac #17227: Reviewer's commit
6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:14

Yo David !

I just finished the first review. It is 16h38 right now, and I have been working on this code for the last 4h hours. It is too much. This ticket should have been implemented into several small tickets. It is also very very difficult to keep track of what is happening, as the main functions handle many different algorithms at the same time. Please think for a while about how you could make it more modular, so that it becomes easier to understand in the future.

Besides: the function hyperbolicity contains the following algorithms: "'basic'", "'basic+'", "'cuts'", "'cuts+'", "'dom'". Not only the names are unclear, but that is probably too much. I do not expect that all will be useful to users, or useful for us to keep, especially when handling them all makes the code so complicated. Could you think about those we do not really need, to remove them in another ticket ?

Besides making the comments below, I implemented some modifications to understand the code a bit better. The most important changes, in terms of lines of code, only move "malloc" together, so that memory allocation problems are all checked at the same time.

Here is the review:

... after something like two hours of review: this patch is too big.

Nathann

dcoudert commented 9 years ago
comment:15

Replying to @nathanncohen:

Yo David !

I just finished the first review. It is 16h38 right now, and I have been working on this code for the last 4h hours. It is too much. This ticket should have been implemented into several small tickets. It is also very very difficult to keep track of what is happening, as the main functions handle many different algorithms at the same time. Please think for a while about how you could make it more modular, so that it becomes easier to understand in the future.

Really sorry about that. Next time I will try to make small patches only, although it is not always easy to do.

Besides: the function hyperbolicity contains the following algorithms: "'basic'", "'basic+'", "'cuts'", "'cuts+'", "'dom'". Not only the names are unclear, but that is probably too much. I do not expect that all will be useful to users, or useful for us to keep, especially when handling them all makes the code so complicated. Could you think about those we do not really need, to remove them in another ticket ?

Well, since we have 'basic+', we don't need 'basic' anymore. We could also rename 'cuts' as 'CCL', but I don't know if it is more explicit... We can try in another ticket.

Besides making the comments below, I implemented some modifications to understand the code a bit better. The most important changes, in terms of lines of code, only move "malloc" together, so that memory allocation problems are all checked at the same time.

Here is the review:

  • For each of the three methods explained in the section "Algorithms and complexity" of the module header, could you mention the corresponding flag of hyperbolicity ?

done

  • You say in the doc that Soto proved that hyp(a,b,c,d)<= dist(a,b)/2 but in the code we have

    if 2*distances[a][b] <= h_LB:
     continue

    I expected distances[a][b]/2 <= h_LB. Or actually, maybe the bound proved by Soto was actually hyp(a,b,c,d)<= 2*dist(a,b) (in which case the code is correct and the doc is wrong), as otherwise his bound is better than ours.

I have corrected the documentation. Soto proved that hyp(a,b,c,d) <= 2*dist(a,b). So we can skip 4-tuples such that 2*dist(a,b) <= h_LB.

  • It is more compact to make all memory allocations at the same place, and only have one block of code to free everything if there was a problem. free(NULL) is valid.

  • your unsigned short ** c_far_apart only stores 0s and 1 (the amount of unused memory is 15/16=93% :-P). We have something in mic/binary_matrix that you may like (though it may not be worth the extra complexity in this case).

I'm clearly consuming much more memory than necessary. If the extra cost is not too high, I may propose the modification in another patch.

  • The function which returns all far-apart pairs considers points in different connected components to be a far-away pair. Is that what you want ? Either way, please document it.

far-apart pairs are defined only in connected graphs. I have updated the documentation of the method to precise that it assumes that the input graph is connected.

  • The documentation of __hyperbolicity__ claims that the function implements an algorithm from CCL12. Also, the comment just above the function talks about a decreasing pair ordering.

I have changed the comment to Compute the hyperbolicity using a the algorithm of [CCL12]_.

  • There are many variables that you specialise to uint32_t when actually nobody cares. The code would be easier to read if they were int variables.

  • __hyperbolicity__ is only meant for connected graphs (you ask for a diameter) -- could you write this explicitly somewhere ?

I added a comment.

... after something like two hours of review: this patch is too big.

Really, really, really, sorry about that.

  • What about renaming this "STOP" thing to "GOTO_RETURN" ?

  • Please do not write a commit that changes the indentation of a block of code. Modify the first commit itself.

  • Is there any reason why this block

    # Termination if required approximation is found

    Is not on the same level as

    # If we cannot improve further, we stop

    It feels like the two play the same role in the algorithm.

  • You define two values for 'STOP' but you never use them. Was that meant to only leave one loop instead or two, and remove one of the two copies of the blocks:

    # If we cannot improve further, we stop
    # Termination if required approximation is found

    It would be a nice change indeed, the implementation is already sufficiently complicated.

  • h_UB if STOP==2 else h : why don't you just set the right value of h_UB right before setting STOP==2 ?

Following your comments, I have:

I don't see how to remove one of the two blocks of tests since one is used each time the upper bound is reduced and the other each time the lower bound is improved.

  • In "hyperbolicity" a large portion of the code deals with the block-decomposition. Why couldn't you just call hyperbolicity on each block ? You would not have to allocate memory in a loop, and the is_clique test would be done automatically too (+the others)... You may lose a bit of time as the block_decomposition would be recomputed on each block, but I do not think that it is such a big problem compared to the cost of the algorithm. Actually, I doubt that many real instances have a cut-vertex. That would remove a big chunck of code, too.

Actually, many real instances have many cut-vertices! For instance, the graphs of the autonomous systems of the Internet collected by CAIDA have a large number of cut-vertices. The largest 2-connected component has 2/3 of the vertices only. The same hold for co-authors graphs and so on.

I have however modified the method. This is easier to read this way. It is also faster when the input graph is 2-connected since we avoid a useless copy of the graph.

  • while len(DOM)<4: DOM.append(H.random_vertex()) what if that new vertex was in DOM already ?

I fixed this issue using a set instead of a list for DOM.

Nathann

Many, many thanks Nathann.

David.

PS: sorry for the multiple commits. I'm still lost with git. For instance, I don't know how to modify a commit without creating a new commit.

dcoudert commented 9 years ago

Reviewer: Nathann Cohen

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 4834573 to 6f5df4a

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

2e9e2b2trac #17227: improve module documentation.
21b84e4trac #17227: clarify the inputs of the method computing far-apart pairs.
9cf7f56trac #17227: simplify tests in __hyperbolicity__
40a7038trac #17227: reshape method hyperbolicity
7ca92d1trac #17227: fix issue with greedy dominating set
e8968catrac #17227: fix upper bound with cuts+
6f5df4atrac #17227: fix another problem with approximations and DOM
6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:17

PS: sorry for the multiple commits. I'm still lost with git. For instance, I don't know how to modify a commit without creating a new commit.

1) If you want to merge the current (staged) modifications to the last commit, do "git commit --amend" (or "git commit -a --amend" to include all changes, as usual)

2) If you want to merge some commits on an existing branch, checkout this branch then "git rebase -i the_older_commit_of_the_branch"

Note that both do what is called "rewriting history". If you then want to push your changes to a Sage branch you will need to add a '-f' flag to 'git push'.

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:18

Hello David !

Well, since we have 'basic+', we don't need 'basic' anymore.

+1

I have corrected the documentation. Soto proved that hyp(a,b,c,d) <= 2*dist(a,b)

It seems that you only fixed the text and not the math formula that follows.

I'm clearly consuming much more memory than necessary. If the extra cost is not too high, I may propose the modification in another patch.

Okay. Perhaps the removal of 'basic' will make things slightly clearer already.

I have updated the documentation of the method to precise that it assumes that the input graph is connected.

Would you be willing to add a check right at the beginning of the function, just in case the graph is not connected ? The complexity of the algorithm will not be hurt by one more BFS ^^;

I have changed the comment to Compute the hyperbolicity using a the algorithm of [CCL12]_

Sorry, my remark was not very clear: what I meant is that the function that follows does not only implement the algorithm from CCL12. It also deals with far-apart pairs, and unless I make a mistake there was no mention of them in that paper. The second line of the function's doc say the same.

Other comments:

By the way, thank you for adressing all those remarks methodically and answering on each point. While the branch is heavy it makes this review less painful than others in which the authors "forget" some questions :-P

Cheers,

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

e44ca66trac #17227: small corrections for comment 18
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 6f5df4a to e44ca66

dcoudert commented 9 years ago
comment:20

Replying to @nathanncohen:

Hello David !

Well, since we have 'basic+', we don't need 'basic' anymore.

+1

I have corrected the documentation. Soto proved that hyp(a,b,c,d) <= 2*dist(a,b)

It seems that you only fixed the text and not the math formula that follows.

Corrected.

I'm clearly consuming much more memory than necessary. If the extra cost is not too high, I may propose the modification in another patch.

Okay. Perhaps the removal of 'basic' will make things slightly clearer already.

I have updated the documentation of the method to precise that it assumes that the input graph is connected.

Would you be willing to add a check right at the beginning of the function, just in case the graph is not connected ? The complexity of the algorithm will not be hurt by one more BFS ^^;

Method __hyperbolicity__ takes as input the distance matrix but not the graph itself. So the only way to check that the graph is connected would be to check that all distances from node 0 in the distance matrix are in the range [0,N-1]. Should I add such a test?

I have changed the comment to Compute the hyperbolicity using a the algorithm of [CCL12]_

Sorry, my remark was not very clear: what I meant is that the function that follows does not only implement the algorithm from CCL12. It also deals with far-apart pairs, and unless I make a mistake there was no mention of them in that paper. The second line of the function's doc say the same.

In fact all details are in a paper under submission. We can wait until the paper get accepted to replace the reference. This can be done in the patch removing the 'basic' method.

Other comments:

  • In this block of code:

    if hh > hyp or (hh==hyp and not certificate):
      hyp = hh
      hyp_UB = hh_UB
      certificate = certif

    Shouldn't we have hyp_UB = max(hyp_UB,hh_UB) instead ? Or is it always correct for some other reason ? Also, if the 'min' is necessary, could there be a situation in which hh < hyp but hh_UB > hyp_UB ? In that case the upper bound would need to be updated, but not hyp.

Right. I have changed to hyp_UB = max(hyp_UB,hh_UB).

  • About return (h, certificate, h_UB if GOTO_RETURN else h): note that in Python you can have a for/else:

    sage: for i in range(40):
    ....:     pass
    ....: else:
    ....:     print "Hey"
    ....:
    Hey
    sage: for i in range(40):
    ....:     break
    ....: else:
    ....:     print "Hey"
    ....:
    sage:

I know that, but I don't know how it can be used to simplify the code. Indeed, the else part will always be executed, even without pass or break.

Thanks Nathann.


New commits:

e44ca66trac #17227: small corrections for comment 18
6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:21

Helloooooo !

Method __hyperbolicity__ takes as input the distance matrix but not the graph itself. So the only way to check that the graph is connected would be to check that all distances from node 0 in the distance matrix are in the range [0,N-1]. Should I add such a test?

Yes, please. It will only be a l-dimension loop, so it will really be cheap.

In fact all details are in a paper under submission. We can wait until the paper get accepted to replace the reference. This can be done in the patch removing the 'basic' method.

Okay, +1.

Shouldn't we have hyp_UB = max(hyp_UB,hh_UB) instead ? Or is it always correct for some other reason ? Also, if the 'min' is necessary, could there be a situation in which hh < hyp but hh_UB > hyp_UB ? In that case the upper bound would need to be updated, but not hyp.

Right. I have changed to hyp_UB = max(hyp_UB,hh_UB).

Okay. My question had a second part, however: is it possible that even though hh < hyp we have hh_UB > hyp_UB ? In this situation the upper bound would need to be updated, even though hyp would stay the same.

I know that, but I don't know how it can be used to simplify the code. Indeed, the else part will always be executed, even without pass or break.

Sorry sorry, I just meant this as a general remark. While it could be used in this code, it is already sufficiently complicated to not use Python-specific instructions unless there is a good reason (that I do not see here).

I set the ticket to needs_work, because of the connectivity test and the question related to the upper bound. We will be done soon.

Nathann

dcoudert commented 9 years ago
comment:22

Replying to @nathanncohen:

Helloooooo !

Method __hyperbolicity__ takes as input the distance matrix but not the graph itself. So the only way to check that the graph is connected would be to check that all distances from node 0 in the distance matrix are in the range [0,N-1]. Should I add such a test?

Yes, please. It will only be a l-dimension loop, so it will really be cheap.

I have added a test at the beginning of method __hyperbolicity__. I have also forced initialization of distances to largest possible value in method distances_and_far_apart_pairs, although it is clearly written that this method ensures a connected graph. Now we should be on the safe side.

Okay. My question had a second part, however: is it possible that even though hh < hyp we have hh_UB > hyp_UB ? In this situation the upper bound would need to be updated, even though hyp would stay the same.

You are right, such situation may happen. Suppose for instance that the graph has 2 biconnected components, a (k+1)(k+1)-grid and a k-hyperbolic graph with large diameter and plenty of far-apart pairs with various distances. The upper bound for the grid will be k, but for the other component the upper bound could be large (ck or k+c).

I have moved the update of the upper bound (hyp_UB = max(hyp_UB, hh_UB)) outside of the if... to address such case.

I set the ticket to needs_work, because of the connectivity test and the question related to the upper bound. We will be done soon.

Nathann

I hope it's better now.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

1b35416trac #17227: small corrections for comment 21
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from e44ca66 to 1b35416

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:25

I hope it's better now.

Yep ! At long last, this ticket is...positively reviewed :-P

Nathann

dcoudert commented 9 years ago
comment:26

Thanks a lot Nathann!

vbraun commented 9 years ago

Changed branch from public/17227 to 1b35416