Closed 6bdad4c1-1e26-4f2f-a442-a01a2292c181 closed 9 years ago
Commit: e780f71
New commits:
e780f71 | trac #18250: G.triangles_count speedup |
Description changed:
---
+++
@@ -33,11 +33,15 @@
Before
- +sage: g = graphs.RandomGNP(500,.5) +sage: %timeit g.triangles_count() +1 loops, best of 3: 9.78 s per loop
After
- +sage: g = graphs.RandomGNP(500,.5) +sage: %timeit g.triangles_count() +10 loops, best of 3: 162 ms per loop
Branch: public/18250
Branch pushed to git repo; I updated commit sha1. New commits:
ed8bb3c | Trac 18250: documentation |
Hello,
Small commit for documentation at public/18250
.
The algorithm 'iter'
returns int
while dense_copy
returns Integer
. This would better be uniform.
You always perform a copy?! Isn't it possible to have a graph whose backend is already a static_sparse_graph
or a static_dense_graph
?
Vincent
Branch pushed to git repo; I updated commit sha1. New commits:
fa6115d | trac #18250: Integer return type and avoid useless copies |
Yooooooooooooooo,
Small commit for documentation at
public/18250
.
Oh, right. Thanks.
The algorithm
'iter'
returnsint
whiledense_copy
returnsInteger
. This would better be uniform.
Done.
You always perform a copy?! Isn't it possible to have a graph whose backend is already a
static_sparse_graph
or astatic_dense_graph
?
Well, in this case copies are eally cheap, so I personally prefer a simpler code to one that avoids that copy.
Still, I implemented it. There is a 'trick' somewhere:
1) I first build a copy of the graph with G.copy(immutable=True)
(which makes no copy if the graph is already immutable), then access its internal data structure
2) If I do g[0] = G.copy(immutable=True)._backend._cg.g[0]
I get a segfault because G.copy() is immediately deallocated. So I first do G = G.copy(immutable=True)._backend
, and then access its internal data.
Fortunately the copy G
is not deleted until the end of the function. If Cython ever notices that (as far as it is concerned) it is never used again in the function, then it will be deallocated and we will get a segfault again.
Nathann
Replying to @nathanncohen:
You always perform a copy?! Isn't it possible to have a graph whose backend is already a
static_sparse_graph
or astatic_dense_graph
?Fortunately the copy
G
is not deleted until the end of the function. If Cython ever notices that (as far as it is concerned) it is never used again in the function, then it will be deallocated and we will get a segfault again.
Cython does not perform deallocation, it is Python job. Though Cython does increment/decrement the reference counter. And precisely, doing G = G.copy(immutable=True)
precisely increment the reference counter!
Looks good! I am running the tests...
Vincent
Cython does not perform deallocation, it is Python job.
Well if you prefer, but this code is translated into C, and gcc may notice that the object is totally useless. They must have done something to keep those Python variables until the end.
Nathann
Replying to @nathanncohen:
Cython does not perform deallocation, it is Python job.
Well if you prefer, but this code is translated into C, and gcc may notice that the object is totally useless. They must have done something to keep those Python variables until the end.
Nope. gcc will not do that. An affectation G = H
is not one instruction if H
is a Python object. Cython translates it into something like
cdef PyObject * G = H;
Py_INCREF(G);
whick prevents deallocation of H until G is garbage collected (typically at the end of the function or until a call of del G
).
Vincent
Reviewer: Vincent Delecroix
I do not like the default choice being sparse copy
. Especially if G
is implemented as a static dense graph! Wouldn't it be better to have a default algorithm=None
and be more clever about choosing the right algorithm?
I don't think that anybody in Sage plays with data structure except me, in Sage's subfunctions. I can do it, but I am rather convinced that it is useless. It may be useful in one or two years if I can rewrite and clean all this stuff.
Will do.
Nathann
Replying to @nathanncohen:
I don't think that anybody in Sage plays with data structure except me, in Sage's subfunctions. I can do it, but I am rather convinced that it is useless. It may be useful in one or two years if I can rewrite and clean all this stuff.
Ok. Let say something based on the density.
Ok. Let say something based on the density.
O_o
Nononono. That's for the user to decide. I don't want ugly density>0.3
which may only make sense on a specific architecture, and on the kind of graphs that were used for the tests.
Branch pushed to git repo; I updated commit sha1. New commits:
fd88c98 | trac #18250: algorithm=None by default and better 'wrong values' check |
Hello,
The method matrix
does work for non simple graphs, right? (after removing the loops)
Vincent
The method
matrix
does work for non simple graphs, right? (after removing the loops)
After removing loops and multiple edges, yes.
Nathann
Replying to @nathanncohen:
The method
matrix
does work for non simple graphs, right? (after removing the loops)After removing loops and multiple edges, yes.
Please, consider the following right answers
sage: G = Graph([(0,1),(0,1),(1,2),(2,0)], multiedges=True)
sage: (G.adjacency_matrix()**3).trace() / 6
2
sage: sage: G = Graph([(0,1),(0,1),(1,2), (1,2), (2,0)], multiedges=True)
sage: (G.adjacency_matrix()**3).trace()//6
4
What do you mean?
Replying to @nathanncohen:
What do you mean?
That the method algorithm=matrix
is valid for non simple graphs (once you removed the loops).
Depends what you call a triangle. For me it is a set of three points adjacent to each other.
If it is a 'closed walk of length 3' then you should keep the loops inside.
Replying to @nathanncohen:
If it is a 'closed walk of length 3' then you should keep the loops inside.
It is not. It is C3: three vertices and three edges! The edges are part of the data.
Ahahah. So if you have three vertices and 4 edges in between, then it is not a triangle? No problem, yet another definition
Nathann
Replying to @nathanncohen:
Ahahah. So if you have three vertices and 4 edges in between, then it is not a triangle? No problem, yet another definition
No. It is not a triangle. But it does contain many triangles. It is very much like on the sphere. For each (non degenerate) triple of points you have 4 triangles whose vertices are exactly these points.
See, I am extremely aware of this kind of problems. There is one thousand different ways of defining stuff, and everybody has his own notion of it, depending on where you come from. I'm pretty sure that guys from word theory/automata would think that a triangle is just "a path of length three that leads you back to where you started" (that corresponds to the 'closed walk' I mentionned above).
I don't want people to believe stuff and end up with something else than what they had in mind. If two different persons can expect different result from the same function, I want either none of them or a way to force them to tell me what exactly they expect.
Oh, and then you will also have the guys who have a weight on their edges and want to count the number of weighted triangles.
Replying to @nathanncohen:
Oh, and then you will also have the guys who have a weight on their edges and want to count the number of weighted triangles.
This is what I am thinking about when I consider a multi graph, it is just a graph with (positive) integer labels. The weight of a triangle is the product of the labels on the three edges. And to count triangles, you just sum the weights. It is what you get from the matrix trace (when you removed the loops).
But I agree that there is one ambiguity: you might want to count induced C3.
Vincent
Replying to @nathanncohen:
I'm pretty sure that guys from word theory/automata would think that a triangle is just "a path of length three that leads you back to where you started" (that corresponds to the 'closed walk' I mentionned above).
Some people have wrong ideas. I know.
Some people have wrong ideas. I know.
And in my infinite Christ-like generosity I still care for them, and they shouldn't get wrong results just because they are sinners.
Nathann
More seriously, I would prefer if the triangles_count
in static_dense_graph
would only accept backend being static_dense_graph
(and move the copy step inside the generic_graph
). Would make more sense. But that's very minor. I just want to know your feelings about how you would handle specificities of the backends in the future.
More seriously, I would prefer if the
triangles_count
instatic_dense_graph
would only accept backend beingstatic_dense_graph
(and move the copy step inside thegeneric_graph
)
Well.. generic_graph.py
cannot handle C
types so it would have required to move the function to generic_graph_pyx.pyx
(what a mess >_<
), but I think that I was also influenced by the function is_strongly_regular
immediately above.
Also, at first this function was building a copy of the graph every time, because I really did not think that it mattered. There are many functions like that in distance_all_pairs
too. It contains every function which "relies heavily on the distance_all_pairs functions", and most of them accept Sage graphs directly (though some of them have pure C couterparts).
I just want to know your feelings about how you would handle specificities of the backends in the future.
Can you tell me what you mean by "specificities of the backends"?
Nathann
(Thanks for the review btw!)
Can you tell me what you mean by "specificities of the backends"?
If you actually mean "specificities of the backens" (and not about the Multi,looped,edge-labelled) graph), then what I want to do is merely to make it simple to switch from one to the other, for the user or for the coders. I am trying to clean the constructor first (#18185 in needs_review
), then I will try to make it a bit faster (creating immutable copies is unnecessarily complicated), to distribute them a bit (into subfunctions).
The main problem I haven't solved is this: what should the default backend for Sage graphs be?
For small graphs I expect that the current one is right. But being able to run "has_edge" in log(n)
time, or being able to remove edges has a huge cost. If we just want to add edges and vertices (i.e. to build graphs, from an algorithm or from a file) then it can be MUCH cheaper. For analysis of large networks it is something that you cannot afford to pay, and then this backend must be different.
So if we want to make no choice at all, it really has to become much simpler and natural to pick a backend for a graph.
Also, we may want to have a 'virtual' backend like in Gap for graphs defined from a group action. No graph would actually be stored, yet all algorithms should be available.
Well, that's what I think of these days.
Nathann
Changed branch from public/18250 to fd88c98
I thought that we had a decent implementation of
G.triangles_count
. Turns out that we did not.Here is what the branch does:
Exception on non-simple graphs. The digraph version can handle loops (it cannot handle multiple edges) and the graph version cannot handle any of the two.
(obvious) speedup in the 'iter' algorithm. Before:
After:
Added
triangles_count
instatic_sparse_graph
and another one instatic_dense_graph
. They first copy the graph in a proper data structure and count the triangles on the copy. They can take require more memory (though they are 100000x cheaper than the default data structure)The default algorithm is 'iter' in the code but 'matrix' in the doc. Fixed:
sparse_copy
is now the default.Overall, comparing default implementations:
Before
After
CC: @sagetrac-azi @videlec @dcoudert
Component: graph theory
Author: Nathann Cohen
Branch/Commit:
fd88c98
Reviewer: Vincent Delecroix
Issue created by migration from https://trac.sagemath.org/ticket/18250