Closed b4a46f46-8d8a-415c-ad35-64549def2993 closed 9 years ago
Description changed:
---
+++
@@ -1 +1,3 @@
+Implement the efficient 3 connectivity algorithm.
+R. E. Bixby, W. H. Cunningham, Matroids, Graphs, and 3-Connectivity. In Graph theory and related topics (Proc. Conf., Univ. Waterloo, Waterloo, ON, 1977), 91-103
Commit: c4f38eb
New commits:
c4f38eb | basic algorithm framework for further modification |
Branch pushed to git repo; I updated commit sha1. New commits:
5167f3d | remove excess code from copy & paste |
Branch pushed to git repo; I updated commit sha1. New commits:
23cfab1 | complete special cases in stage 0 |
Running the following code shows the new implementation matches the current behavior on named matroids.
def test(M):
return M.is_3connected_beta()==M.is_3connected()
results = []
for M in matroids.named_matroids.__dict__.values():
if callable(M):
results.append(test(M()))
all(results)
I will generate some larger examples to test the running time and start optimizing.
For the base case, I need to access Uniform(r,n). A direct import seems to cause circular dependency. If I use
from sage.matroids.catalog import Uniform
# some code that calls Uniform(r,n)
then the error ImportError: cannot import name Uniform
appears during runtime.
Somehow using from sage.matroids.catalog import *
and write sage.matroids.catalog.Uniform(r,n)
for every call works out. I wonder if there are better ways.
Replying to @chaoxu:
For the base case, I need to access Uniform(r,n). A direct import seems to cause circular dependency. If I use
from sage.matroids.catalog import Uniform # some code that calls Uniform(r,n)
then the error
ImportError: cannot import name Uniform
appears during runtime. Somehow usingfrom sage.matroids.catalog import *
and writesage.matroids.catalog.Uniform(r,n)
for every call works out. I wonder if there are better ways.
A matroid of rank 1 is uniform if and only if it has no loops, and dually a matroid of corank 1 is uniform iff it has no coloops. So you could also replace your test
if (self.size() <= 1 or
self.is_isomorphic(sage.matroids.catalog.Uniform(1,2)) or
self.is_isomorphic(sage.matroids.catalog.Uniform(1,3)) or
self.is_isomorphic(sage.matroids.catalog.Uniform(2,3))):
return True
with this simpler one
if self.size() <= 1:
return True
if self.size() <= 3 and self.full_rank()==1 and not self.loops():
return True
if self.size() <= 3 and self.full_corank()==1 and not self.coloops():
return True
While I agree that Rudi's method is better (isomorphism testing is VERY expensive), let's look at the import issue too. Was your import inside the method instead of global? You might want to look at the way it was done in, for instance, the _minor method in matroid.pyx. I think you can't do it better than that.
Branch pushed to git repo; I updated commit sha1. New commits:
0033334 | Optimization: Clever choice of basis and fundamental cocircuits. |
Thanks Rudi, I have incorporated your suggestions. It's much faster now.
One can use the following code to test the running time of the current implementation and the original one.
def bench(beta=False):
for M in matroids.named_matroids.__dict__.values():
if callable(M):
if beta:
M().is_3connected_beta()
else:
M().is_3connected()
sage: %time bench(False)
CPU times: user 298 ms, sys: 10.5 ms, total: 309 ms
Wall time: 301 ms
sage: %time bench(True)
CPU times: user 36.9 ms, sys: 1.83 ms, total: 38.8 ms
Wall time: 37.5 ms
I also benchmarked on 3-connected binary matroids listed here. The old algorithm takes 925 ms, the new one takes 51 ms.
For correctness, I will try to see if the original and the new implementation agrees for all matroids with at most 9 elements. (data from Database of Matroids).
all matroids on n<= 9 elements with rank <= n/2
Attachment: matroids9_low_rank.sobj.gz
Replying to @chaoxu:
One can use the following code to test the running time of the current implementation and the original one.
sage: %time bench(False) CPU times: user 298 ms, sys: 10.5 ms, total: 309 ms Wall time: 301 ms sage: %time bench(True) CPU times: user 36.9 ms, sys: 1.83 ms, total: 38.8 ms Wall time: 37.5 ms
I also benchmarked on 3-connected binary matroids listed here. The old algorithm takes 925 ms, the new one takes 51 ms.
That really is very nice work. Cheers!
What part of your code is currently using most of the time on these test cases?
For correctness, I will try to see if the original and the new implementation agrees for all matroids with at most 9 elements. (data from Database of Matroids).
I have attached the set of matroids on at most 9 elements, perhaps that will save you some time translating their data. If you want I can also send the Sage code I used to generate these matroids, just let me know.
Branch pushed to git repo; I updated commit sha1. New commits:
e4917ce | fix bug. |
Branch pushed to git repo; I updated commit sha1. New commits:
84a81fd | reduce number of candidate cocircuits |
Replying to @sagetrac-Rudi:
I have attached the set of matroids on at most 9 elements, perhaps that will save you some time translating their data.
Thanks, that's really helpful. I found a bug and now it have been fixed. Now it works for all test cases. I'm pretty convinced of the correctness.
What part of your code is currently using most of the time on these test cases?
For the named cases, _corank and minor operation taking the most of the time. For the small 3 connected binary matroids, most time was spent on is_circuit, is_cosimple, is_simple. For matroids with large rank (I generated a random rank 500 binary matroid), components take the most time. Here is an time breakdown for a rank 500 binary matroid.
136529 function calls in 18.426 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
501 6.947 0.014 11.109 0.022 matroid.pyx:4477(components)
125515 4.162 0.000 4.162 0.000 matroid.pyx:1351(circuit)
1 3.968 3.968 3.968 3.968 matroid.pyx:4441(is_cosimple)
500 3.216 0.006 3.246 0.006 matroid.pyx:3438(minor)
1 0.027 0.027 14.356 14.356 matroid.pyx:4807(_is_3connected_beta)
1000 0.026 0.000 0.044 0.000 matroid.pyx:1772(fundamental_cocircuit)
1 0.026 0.026 0.026 0.026 matroid.pyx:4408(is_simple)
1000 0.014 0.000 0.014 0.000 matroid.pyx:820(_is_basis)
500 0.013 0.000 0.013 0.000 {method '_max_coindependent' of 'sage.matroids.basis_exchange_matroid.BasisExchangeMatroid' objects}
500 0.005 0.000 0.030 0.000 utilities.py:184(sanitize_contractions_deletions)
1 0.005 0.005 0.005 0.005 matroid.pyx:2040(coloops)
500 0.005 0.000 0.005 0.000 {method '_max_independent' of 'sage.matroids.basis_exchange_matroid.BasisExchangeMatroid' objects}
1000 0.003 0.000 0.017 0.000 matroid.pyx:1887(is_basis)
1000 0.003 0.000 0.003 0.000 {method 'issuperset' of 'frozenset' objects}
1000 0.002 0.000 0.002 0.000 {method 'difference' of 'frozenset' objects}
1000 0.002 0.000 0.002 0.000 {method 'union' of 'frozenset' objects}
500 0.001 0.000 3.248 0.006 matroid.pyx:3631(delete)
500 0.000 0.000 11.060 0.022 matroid.pyx:4518(is_connected)
500 0.000 0.000 0.000 0.000 {method 'isdisjoint' of 'frozenset' objects}
1000 0.000 0.000 0.000 0.000 {method 'groundset' of 'sage.matroids.basis_exchange_matroid.BasisExchangeMatroid' objects}
1 0.000 0.000 0.000 0.000 matroid.pyx:1811(loops)
1 0.000 0.000 18.431 18.431 <string>:1(<module>)
1 0.000 0.000 18.431 18.431 {method 'is_3connected_beta' of 'sage.matroids.matroid.Matroid' objects}
1 0.000 0.000 18.431 18.431 matroid.pyx:4740(is_3connected_beta)
3 0.000 0.000 0.000 0.000 matroid.pyx:1216(size)
1 0.000 0.000 0.000 0.000 matroid.pyx:1236(rank)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
I will get it to return a separator and write some docs for the next update.
Replying to @chaoxu:
What part of your code is currently using most of the time on these test cases?
For the named cases, _corank and minor operation taking the most of the time. For the small 3 connected binary matroids, most time was spent on is_circuit, is_cosimple, is_simple. For matroids with large rank (I generated a random rank 500 binary matroid), components take the most time. Here is an time breakdown for a rank 500 binary matroid.
Thanks for sending that data. The many calls to circuits() are probably happening in the Matroids.components() routine. I think I may have written that routine, and looks as if I did not really optimize it for speed.
circuit()
is used to get a fundamental circuit, and I think we later created that optimized method for that: fundamental_circuit
So the following tweaks to components()
may make a difference.
B = self.basis()
components = []
for e in self.groundset() - B:
C = set(self._fundamental_cocircuit(B,e))
components2 = []
for comp in components:
if (C & comp):
C.update(comp)
else:
components2.append(comp)
components2.append(frozenset(C))
components = components2
components.extend([frozenset([e]) for e in self.loops()])
return components
You could also consider rewriting a bit more, use the above when r(M) <= r^*(M)
and the dual version (circuits) in the other case. That way the inner loop always has the least number of iterations. And if this routine really stays a bottleneck, a bitpacked version is always an option.
Since your routine is_3connected_beta also computes fundamental cocircuits, you would also gain a bit by using the unguarded _fundamental_cocircuit. It is perhaps wasteful to compute those fundamental cocircuits again in components
and indirectly, in is_connected
. You could consider rewriting these routines so that they take a set of fundamental circuits and/or a set of fundamental cociruits as input. Note that if Y is a fundamental cocircuit as in step 2, then the fundamental cocircuits of self.delete(Y) are obtained by deleting Y from the other fundamental cocircuits of self. So you could even avoid creating self.delete(Y) in step 2.
It is very, very strange indeed that a single call to is_cosimple
takes that much time. I cannot explain how it can take any more time than is_simple. I'll look into it.
A quicker method to create the parallel class of e is:
parallel_class = M._closure([e])
The minor-taking/deletion is also expensive in your method. Improving the inefficiency of minor
itself is probably more involved, and this would really deserve it's own ticket. It may be possible to reduce the number of times you construct a minor, e.g. by testing the rank of the minor before actually constructing it.
Hi Chao,
somehow the inefficiency of components() became a major earworm for me today, so I coded a more efficient version for BasisExchangeMatroid (ticket #18591) to get it out of my system. The real inefficiency turned out to be in the set union operations, and calculating with bitsets rather than python sets works miracles for that.
I also felt a bit bad for luring you into my bad habits of premature code optimization, but by just using my patch you can stay focussed on testing correctness and enhancing functionality, which is probably better.
I realized why is_cosimple() is that much slower than is_simple() for BinaryMatroids (and Ternary and Quaternary). There is a specialized ultra fast method to get a fundamental cocircuit for such matroids, essentially copying the row of a bitpacked matrix (which is a bitset) into another bitset, which is the internal representation of the cocircuit. For circuits you are back to touching each of the groundset elements one by one, which is way slower. This asymmetry makes closure() faster that coclosure, and in turn is_simple faster than is_cosimple. I still need to see what is a clean way to solve that.
Rudi
Replying to @sagetrac-Rudi:
The real inefficiency turned out to be in the set union operations, and calculating with bitsets rather than python sets works miracles for that.
Great. Thanks I will review that code soon. What is the abstract way to find components from a set of fundamental cocircuits? This code didn't return the same output as the original components.
B = self.basis()
components = []
for e in self.groundset() - B:
C = set(self._fundamental_cocircuit(B,e))
components2 = []
for comp in components:
if (C & comp):
C.update(comp)
else:
components2.append(comp)
components2.append(frozenset(C))
components = components2
components.extend([frozenset([e]) for e in self.loops()])
return components
Replying to @chaoxu:
This code didn't return the same output as the original components.
It should be:
for e in B:
C = set(self._fundamental_cocircuit(B,e))
or
for e in self.groundset() - B:
C = set(self._fundamental_circuit(B,e))
I mixed them up.
What is the abstract way to find components from a set of fundamental cocircuits?
Consider the hypergraph H whose ground set is E(M) and whose edges are the fundamental cocircuits of M relative to a fixed basis B. Then the components of H are the components of M.
Hi Chao,
I have reduced the amount of time spent on is_cosimple() as well, see ticket #18605. Together with the patch for components() this leaves minor() as the main bottleneck. Not so sure if I can make that one much more efficient.
Looking at the effect of my patches on various types of matroids I ran into a bug. Try this:
sage: M=Matroid(graphs.RandomRegular(3,200))
sage: M.is_3connected_beta()
It's random, but nearly always results in a error.
Hi Chao,
there is an import from sage.matroids.catalog import *
in your matroid.pyx
which causes errors and breaks the doctest (my guess is this is probably due to circular imports). You don't seem to need this import anymore since you stopped checking isomorphism with uniform matroids, so perhaps it is best if you remove this line.
Cheers, Rudi
Hi Chao,
I worked a bit on _minor(), a patch is in ticket #18660. Together with #18638 and #18605 that will probably take the time needed to solve your 500x1000 binary matroid to below 1 second.
It should be possible to make your method another ~5x faster, seeing that essentially all the work is happening in Step 2. It is possible to eliminate all the minor-taking from that part of your method. The set of fundamental circuits by itself is enough to detect that disconnecting cocircuit, and so a routine which solves that problem using bitpacked fundamental cocircuits could do all that in about the time now spent on components(). I will perhaps write such a specialised routine once your code is in final form, in a follow-up ticket.
Cheers, Rudi
Hi Rudi,
This is great news, I will take a look.
We could create a function that finds the components of a hypergraph given the edges. Then both components() and the code in Step 2 can just call that function.
Replying to @chaoxu:
We could create a function that finds the components of a hypergraph given the edges. Then both components() and the code in Step 2 can just call that function.
That is true, and it would not be that much work either to make such a more elementary function for SetSystem. But I'm afraid that just filling these hypergraphs with python sets may become the bottleneck. But maybe if I also write a routine for SetSystem to delete a fixed subset from all subsets in the system.
I'll whip something up.
I wrote .is_connected() for SetSystem here #18682, and it seems to work, have look. But looking at the timings just this one command may not be enough for what you need. The connectivity routine is fast enough, but filling the SetSystems may not be. Or maybe you can experiment with it a bit.
If not enough, what exactly would suffice for your purposes? A method component(e) (to compute the component containing e) and a method to restrict all subsets to a given subset?
It's fairly strange that filling in the SetSystem takes that long... I will check if it can be optimized.
If not enough, what exactly would suffice for your purposes? A method component(e) (to compute the component containing e) and a method to restrict all subsets to a given subset?
Yes. But instead of component(e), components() is enough, since we have to iterate over all components if it's not connected.
This is the completed code for bridge based 3-connectivity algorithm.
It doesn't return a separator. (The ticket:18687 will implement an alternative algorithm that returns a separator).
Hi Chao,
I don't think I should be reviewing your code since Stefan is your mentor on the gsoc project, but I can help a little.
So I tested your code to see if it would detect a separation if it exist, using the following:
def binary_matroid_with_2_separation(r,n):
A=MatrixSpace(GF(2),r,n).random_element()
s = ZZ.random_element(2,r-3)
m = ZZ.random_element(2,n-2)
for i in range(s+1, r):
for j in range(m):
A[i,j] =0
for i in range(s):
for j in range(m, n):
A[i,j] =0
return Matroid(A)
and
for i in range(10000):
r=ZZ.random_element(6,20)
n=ZZ.random_element(r+6, 5*r)
M = binary_matroid_with_2_separation(20,50)
if M.is_3connected():
print M
break
Your code seems to withstand such testing all the time.
Cheers, Rudi
Rudi, feel free to review any GSoC tickets.
FWIW, here's a slightly more generic test where the 2-separation is a truly random partition:
def test_3c(n,m):
A = Matrix(GF(2), n, m)
k = ZZ.random_element(n)
R = set([ZZ.random_element(n) for i in range(k)])
k = ZZ.random_element(m)
C = set([ZZ.random_element(m) for i in range(k)])
if len(R) + m - len(C) < 2 or len(C) + n - len(R) < 2:
print "boundary case"
return False
# Rank-1 submatrix indexed by R, C
Rs = {x: ZZ.random_element(2) for x in R}
Cs = {x: ZZ.random_element(2) for x in C}
for i in range(n):
for j in range(m):
if i in R and j in C:
A[i,j] = Rs[i] * Cs[j]
elif i in R or j in C:
A[i,j] = ZZ.random_element(2)
else:
A[i,j] = 0
return Matroid(field=GF(2), reduced_matrix=A).is_3connected()
which I ran a bunch of times:
%time
for n in range(10,20):
for m in range(15,25):
for k in range(1000):
if test_3c(n,m):
print "Whoops"
print "ok"
Took under 36 seconds on my laptop.
Now, for the review of this ticket.
./sage -t src/sage/matroids/*.py*
from the Sage home directory. You can test if your code satisfies the requirements by running ./sage -coverage src/sage/matroids/*.py*
Apart from that, the code looks clean. If you make these changes, it can get a positive review.
Author: Chao Xu
Replying to @sagetrac-Stefan:
- Maybe name the helper function _is_3connected_BC to indicate which algorithm it implements? That way, we can also have an _is_3connected_shifting version.
Is this the best way to implement it? Would an optional argument algorithm='??' (that defaults to whichever algorithm ends up being faster) work better?
Replying to @sagetrac-yomcat:
Would an optional argument algorithm='??' (that defaults to whichever algorithm ends up being faster) work better?
I think such an optional argument would be best. So you would have
is_3connected(certify = False, algorithm = None)
and helper functions _is_3_connected_BC(certify = False), your new algorithm, throwing NotImplemented when certify =True _is_3_connected_CE(certify = False), the algorithm using matroid intersection you removed (Cunningham - Edmonds?)
Replying to @sagetrac-yomcat:
Replying to @sagetrac-Stefan:
- Maybe name the helper function _is_3connected_BC to indicate which algorithm it implements? That way, we can also have an _is_3connected_shifting version.
Is this the best way to implement it? Would an optional argument algorithm='??' (that defaults to whichever algorithm ends up being faster) work better?
The public-facing function would work like you describe, and call one of the private ones either using an internal decision process, or using the algorithm argument. A separate method for each algorithm allows you to do things like recursion (in this case relevant). So I guess we are all on the same page.
The standard name for the certifying argument in the Graph methods is "certificate", which I'd use as an alternate to the "separation" argument. So the signature would be
cpdef is_3connected(self, certificate=False, algorithm=None, separation=False):
where you start with
certificate = certificate or separation
, and put in a deprecation warning when the "separation" argument is set to True.
Branch pushed to git repo; I updated commit sha1. New commits:
b7cfd15 | incorporate review comments, improve test coverage |
I have made the suggested changes.
Rudi changed the naive algorithm in #18429 (which I missed) to one based on matroid intersection. You should update names and docstrings to reflect that.
I also think (but am happy to be overruled) that something like this is more appropriate:
``None`` -- The most appropriate algorithm is chosen automatically.
+ - ``"bridges"`` -- Bixby and Cunningham's algorithm, based on bridges [BC79]_. Note that this cannot return a separator.
+ - ``"intersection"`` An algorithm based on matroid intersection.
Branch pushed to git repo; I updated commit sha1. New commits:
6a3484d | update names and doc string |
Implement the efficient 3 connectivity algorithm.
R. E. Bixby, W. H. Cunningham, Matroids, Graphs, and 3-Connectivity. In Graph theory and related topics (Proc. Conf., Univ. Waterloo, Waterloo, ON, 1977), 91-103
Depends on #18448
CC: @sagetrac-Stefan @sagetrac-yomcat @sagetrac-Rudi
Component: matroid theory
Author: Chao Xu
Branch/Commit:
482ce85
Reviewer: Michael Welsh, Rudi Pendavingh
Issue created by migration from https://trac.sagemath.org/ticket/18539