Closed jdemeyer closed 2 years ago
Description changed:
---
+++
@@ -1 +1,3 @@
Why does `Graph.vertices()` need to *sort* the list of vertices?
+
+Sorting is going to break badly in Python 3.
Description changed:
---
+++
@@ -1,3 +1,3 @@
-Why does `Graph.vertices()` need to *sort* the list of vertices?
+Why does `Graph.vertices()` need to *sort* the list of vertices? If the user wants a sorted list, they can always call `sorted()` anyway.
Sorting is going to break badly in Python 3.
Description changed:
---
+++
@@ -1,3 +1,3 @@
-Why does `Graph.vertices()` need to *sort* the list of vertices? If the user wants a sorted list, they can always call `sorted()` anyway.
+Why does `Graph.vertices()` need to *sort* the list of vertices? If the user wants a sorted list, they can always call `sorted()` anyway. The same for `Graph.edges()`.
Sorting is going to break badly in Python 3.
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
1c3a584 | Deprecate default sorting of Graph vertices and edges |
Commit: 1c3a584
What you propose to do is not an easy task. I suspect that many algorithms assume that the vertices are sorted. For sure, many algorithms assume that the ordering is always the same when calling vertices()
and vertex_iterator()
, so I hope that what you propose preserves this strong assumption.
Is your long term plan to remove parameter sort
from the methods or just to set its default value to false ?
I don't know the potential problems of sorting with Python 3, so I'm not sure of the motivation.
Many many tests are broken
sage -t src/sage/graphs/generic_graph.py # 45 doctests failed
sage -t src/sage/graphs/generators/smallgraphs.py # 4 doctests failed
sage -t src/sage/graphs/strongly_regular_db.pyx # 1 doctest failed
sage -t src/sage/graphs/graph.py # 11 doctests failed
sage -t src/sage/graphs/base/static_sparse_graph.pyx # 1 doctest failed
sage -t src/sage/graphs/generators/families.py # 3 doctests failed
sage -t src/sage/graphs/digraph.py # 10 doctests failed
sage -t src/sage/graphs/digraph_generators.py # 3 doctests failed
sage -t src/sage/graphs/base/graph_backends.pyx # 2 doctests failed
sage -t src/sage/graphs/generic_graph_pyx.pyx # 3 doctests failed
sage -t src/sage/graphs/comparability.pyx # 1 doctest failed
sage -t src/sage/graphs/bipartite_graph.py # 3 doctests failed
sage -t src/sage/graphs/base/c_graph.pyx # 1 doctest failed
sage -t src/sage/graphs/graph_decompositions/vertex_separation.pyx # 4 doctests failed
sage -t src/sage/graphs/schnyder.py # 1 doctest failed
sage -t src/sage/graphs/line_graph.py # 3 doctests failed
sage -t src/sage/graphs/isgci.py # 1 doctest failed
sage -t src/sage/graphs/graph_decompositions/graph_products.pyx # 1 doctest failed
sage -t src/sage/quivers/path_semigroup.py # 3 doctests failed
sage -t src/sage/groups/perm_gps/partn_ref/refinement_graphs.pyx # 1 doctest failed
for instance (random copy/paste)
File "src/sage/graphs/line_graph.py", line 466, in sage.graphs.line_graph.root_graph
Failed example:
root_graph(graphs.DiamondGraph())
Expected:
(Graph on 4 vertices, {0: (0, 3), 1: (0, 1), 2: (0, 2), 3: (1, 2)})
Got:
(Graph on 4 vertices, {0: (1, 2), 1: (0, 1), 2: (0, 2), 3: (0, 3)})
**********************************************************************
File "src/sage/graphs/base/graph_backends.pyx", line 802, in sage.graphs.base.graph_backends.NetworkXGraphDeprecated.mutate
Failed example:
G.edges(sort=False)
Exception raised:
Traceback (most recent call last):
File "/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run
self.compile_and_execute(example, compiler, test.globs)
File "/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute
exec(compiled, globs)
File "<doctest sage.graphs.base.graph_backends.NetworkXGraphDeprecated.mutate[5]>", line 1, in <module>
G.edges(sort=False)
TypeError: edges() got an unexpected keyword argument 'sort'
**********************************************************************
File "src/sage/graphs/digraph.py", line 3424, in sage.graphs.digraph.DiGraph.flow_polytope
Failed example:
fl.vertices(sort=False)
Exception raised:
Traceback (most recent call last):
File "/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run
self.compile_and_execute(example, compiler, test.globs)
File "/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute
exec(compiled, globs)
File "<doctest sage.graphs.digraph.DiGraph.flow_polytope[19]>", line 1, in <module>
fl.vertices(sort=False)
TypeError: __call__() got an unexpected keyword argument 'sort'
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 20558, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
d.automorphism_group(algorithm='sage')
Expected:
Permutation Group with generators [('01','02')('10','20')('11','22')('12','21')]
Got:
Permutation Group with generators [()]
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 20197, in sage.graphs.generic_graph.GenericGraph.degree_to_cell
Failed example:
G.degree_to_cell('111', cell)
Expected:
0
Got:
1
Replying to @dcoudert:
What you propose to do is not an easy task.
I never claimed that it was easy. But it has to be done in order to support Python 3.
For sure, many algorithms assume that the ordering is always the same when calling
vertices()
andvertex_iterator()
, so I hope that what you propose preserves this strong assumption.
It should remain the same, yes.
Is your long term plan to remove parameter
sort
from the methods or just to set its default value to false ?
I don't care much. I guess once the default is False
, there will be little reason left to keep the sort
keyword.
I don't know the potential problems of sorting with Python 3, so I'm not sure of the motivation.
The problem is that you cannot sort arbitrary things in Python 3. For example:
>>> sorted([1, "hello"])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() < int()
Now I understand the need for this big change.
Is there a way to sort arbitrary things in Python 3 ? some trick ?
Replying to @dcoudert:
Is there a way to sort arbitrary things in Python 3 ?
Of course there is. The question is: what would you expect from such "arbitrary" sorting?
And if it's arbitrary anyway: why would you want to "sort" in the first place?
As far as I understand, there are at least two different issues here:
Additionally, it may well be the case that some algorithms also rely on vertices of a certain type (say tuples of integers) automatically coming out sorted in a particular order, I'm not sure.
Dependencies: #22383
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
aa0691f | Deprecate default sorting of Graph vertices and edges |
Replying to @dcoudert:
Now I understand the need for this big change.
Is there a way to sort arbitrary things in Python 3 ? some trick ?
one way is
sorted( ..., key=str )
You could of course use another key function that uses some sorting heuristic (something like the key function used by https://pypi.python.org/pypi/natsort)
Display functions in IPython have hooks for nice sorting (used for dicts and sets), so if graphs need to display their keys and edges in a nice way, that's probably the hook to go for.
Algorithms that need their edges sorted should either insist on sortable labels or sort by index tuples or something like that (something that is guaranteed to be sortable regardless of what the user decides to use as labels).
Replying to @mezzarobba:
- in interactive use, when there is a natural order on the vertices (in practice, mostly when they are integers, strings, or possibly nested tuples of such things), people want to see the vertices, edges, etc. listed in an order that makes the output easy to read, and it would be a significant regression to take that away from them.
In that case, the best solution is probably to use a custom class for this. It would behave exactly like a Python list
(or tuple
) in all possible ways, except that it would be sorted on display.
Replying to @nbruin:
Display functions in IPython have hooks for nice sorting (used for dicts and sets), so if graphs need to display their keys and edges in a nice way, that's probably the hook to go for.
Nils, I only just saw your comment now. But yes, this was exactly my point too.
Replying to @nbruin:
Algorithms that need their edges sorted should either insist on sortable labels or sort by index tuples or something like that (something that is guaranteed to be sortable regardless of what the user decides to use as labels).
They could simply sort by id()
, provided that the same label is not represented by several physically distinct objects that compare equal. Unfortunately, this probably does happen (see #22374).
Replying to @mezzarobba:
They could simply sort by
id()
If you want to sort by id()
, what is the purpose of sorting at all? I mean, it's not sorting, it's putting in a deterministic (but otherwise arbitrary) order. Assuming that methods like vertices()
return the vertices in a deterministic order, I don't see the point.
Replying to @jdemeyer:
Replying to @mezzarobba:
They could simply sort by
id()
If you want to sort by
id()
, what is the purpose of sorting at all? I mean, it's not sorting, it's putting in a deterministic (but otherwise arbitrary) order. Assuming that methods likevertices()
return the vertices in a deterministic order, I don't see the point.
All I'm saying is that some algorithms do want to access the vertices in some arbitrary deterministic order, typically in order to iterate over the unordered pairs {v,v'}, and, as far as I remember, currently rely on the output of vertices()
etc. being sorted to do that. If all the methods these algorithms use to access the vertices return them in the same deterministic order, then indeed, there is no need to do anything, otherwise, sorting by id()
may be a better option than requiring the labels to be ordered.
Changed keywords from none to richcmp
Work Issues: merge conflicts
Changed work issues from merge conflicts to none
Changed dependencies from #22383 to none
Apart from the need for ensuring that the internal order of vertices/edges during iterations is always the same (many algorithms assume that we have such a fixed order, whatever it is) and doing our best to preserve a nice display for interactive use, I think we have another issue:
(u, v, label)
in an undirected graph, we usually have u <= v
. So somehow we have a sorting here and many algorithms assume that vertices are ordered (not all). If I understand well, it is a problem with Python3 if vertex labels are of different types. We must find a solution for that.if u < v
. Again a big problem.
So what can we do ?The only good news here is that vertices must be immutable.
Do we already have a plan for this ticket? I know that there have many graph-related Python 3 tickets, but I'm a bit lost on the overall plan.
I am asking because there are now only a handful of doctest failures remaining in #22029 and they are almost all due to sorting in graphs.
There are 3 different points here: 1) sorting the list of vertices, 2) sorting the list of edges, and 3) sorting the vertices of an edge.
For 1), we made huge progresses in reducing the dependency of methods to .vertices()
. I'm not sure how many methods still rely on it. Most of them now use the ordering given by list(G)
without making specific assumption on that order. We create a mapping when needed, and pass it to some methods. We can certainly completely avoid that dependency, but can we do that without deprecation ?
For 2), similarly, we have reduced the number of methods relying on .edges()
, plus we can use sort=False
parameter. We can try and see...
3) is more tricky and I don't know how to do that yet.
For 1) there is still an issue with relabel()
: #27027
And one more with graph_isom_equivalent_non_edge_labeled_graph()
: #27029
Description changed:
---
+++
@@ -1,3 +1,3 @@
Why does `Graph.vertices()` need to *sort* the list of vertices? If the user wants a sorted list, they can always call `sorted()` anyway. The same for `Graph.edges()`.
-Sorting is going to break badly in Python 3.
+Sorting is broken badly now that #22029 is merged.
We have now drastically reduced the dependency to .vertices()
and .edges()
in the graph module (except in doctests, but we can specify sort=True
or just change the expected output).
We must now identify the remaining tasks to do and schedule the work. I propose to do that in #26640 to get a full picture.
Here is a concrete proposal for this ticket:
If no sort
option is given (the default), give a deprecation and try to sort but fail gracefully if the input is not sortable.
If a sort
option is given, do as before.
I also like the idea that somebody posted on sage-devel (I don't remember who) that .vertices()
and .edges()
should return a "view" which could have various operations on it.
It's Thierry (this message). I also like this idea. Should we do that here, before in another ticket, or later ?
Networkx now also use views networkx.Graph.edges.
A proposal for an EdgeView
in #27408.
Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)
As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage-8.9).
Now that we have EdgesView #27408 and Python3, we have few remaining methods methods effectively relying on the order of the vertices and/or edges. So we should restart working on this ticket for 9.2.
With #30145, we propose to deprecate edge_iterator
, thus generalizing the use of .edges
method and so of EdgesView
. After that, it should be easier to deprecate sorting edges by default.
Setting new milestone based on a cursory review of ticket status, priority, and last modification date.
Actually, #27408 has silently (not mentioned in the ticket) introduced a deprecation warning for sorting edges by default, but it is used in method .edges(...)
only when sort is None
and the default value of sort
is True
. So I think no-one is aware of it.
I'm not sure how to make this deprecation warning visible without breaking numerous doctests.
Replying to @dcoudert:
Actually, #27408 has silently (not mentioned in the ticket) introduced a deprecation warning for sorting edges by default, but it is used in method
.edges(...)
only whensort is None
and the default value ofsort
isTrue
. So I think no-one is aware of it.I'm not sure how to make this deprecation warning visible without breaking numerous doctests.
I'd be interested in seeing this ticket resolved, perhaps you could ask about how to go about this deprecation on sage-devel? Or I am happy to follow this up if you'd prefer.
The difficulty is that we still have some methods relying on .vertices()
, and I don't know how to avoid that.
For instance in graph.py
: effective_resistance_matrix
, common_neighbors_matrix
, orientations
(but this one can be avoided easily), chromatic_quasisymmetric_function
, twograph
, write_to_eps
. And there might be other modules using graphs and relying on the ordering of the vertices given by .vertices()
(e.g., combinat
, geometry
, groups
, matroids
, schemes
, topology
, etc..
So the first step might be to identify methods/modules relying on .vertices()
and try to modify them. Once done, we will be able to safely deprecate sorting by default.
The .vertices()
method is probably used lots of times, but one fix could be:
sort=False
or sort=True
as necessary,sort=False
.Maybe it's easier with edges: add the explicit argument everywhere and then change the default to None
.
Or is that too naïve? I tried this change:
diff --git a/src/sage/graphs/graph.py b/src/sage/graphs/graph.py
index 1579f23ad8..5dbb937b6b 100644
--- a/src/sage/graphs/graph.py
+++ b/src/sage/graphs/graph.py
@@ -3480,7 +3480,7 @@ class Graph(GenericGraph):
name = 'An orientation of ' + name
if not self.size():
- D = DiGraph(data=[self.vertices(), []],
+ D = DiGraph(data=[self.vertices(sort=False), []],
format='vertices_and_edges',
name=name,
pos=self._pos,
@@ -3494,7 +3494,7 @@ class Graph(GenericGraph):
E = [[(u,v,label), (v,u,label)] if u != v else [(u,v,label)]
for u,v,label in self.edge_iterator()]
- verts = self.vertices()
+ verts = self.vertices(sort=False)
for edges in itertools.product(*E):
D = DiGraph(data=[verts, edges],
format='vertices_and_edges',
@@ -4009,7 +4009,7 @@ class Graph(GenericGraph):
R = t.parent()
M = QuasiSymmetricFunctions(R).M()
ret = M.zero()
- V = self.vertices()
+ V = self.vertices(sort=False)
def asc(sigma):
stat = 0
@@ -6185,7 +6185,7 @@ class Graph(GenericGraph):
T.append([x, y, z])
T = TwoGraph(T)
- T.relabel({i: v for i,v in enumerate(self.vertices())})
+ T.relabel({i: v for i,v in enumerate(self.vertices(sort=False))})
return T
@@ -6221,7 +6221,7 @@ class Graph(GenericGraph):
if filename[-4:] != '.eps':
filename += '.eps'
f = open(filename, 'w')
- f.write( print_graph_eps(self.vertices(), self.edge_iterator(), pos) )
+ f.write( print_graph_eps(self.vertices(sort=False), self.edge_iterator(), pos) )
f.close()
@doc_index("Algorithmically hard stuff")
@@ -7940,7 +7940,7 @@ class Graph(GenericGraph):
D = None
elif self.order() == 1:
D = create_prime_node()
- D.children.append(create_normal_node(self.vertices()[0]))
+ D.children.append(create_normal_node(self.vertices(sort=False)[0]))
else:
D = habib_maurer_algorithm(self)
@@ -9225,7 +9225,7 @@ class Graph(GenericGraph):
if not n:
raise ValueError('unable to compute effective resistance for an empty Graph object')
if vertices is None:
- vertices = self.vertices()
+ vertices = self.vertices(sort=False)
self._scream_if_not_simple()
if not self.is_connected():
raise ValueError('the Graph is not a connected graph')
@@ -9458,7 +9458,7 @@ class Graph(GenericGraph):
"""
self._scream_if_not_simple()
if vertices is None:
- vertices = self.vertices()
+ vertices = self.vertices(sort=False)
set_immutable = kwds.pop('immutable', False)
A = self.adjacency_matrix(vertices=vertices, base_ring=base_ring, immutable=True, **kwds)
M = A**2
and all tests in the graphs directory still passed.
We cannot blindly use .vertices(sort=False)
. It will clearly break badly some methods, for instance methods building adjacency or laplacian matrices, etc.
We already pave the way by reducing the number of situations in which we call .vertices()
, but we still have work to do to reduce the dependency. We did the same for .edges()
.
The safe way is to use only .vertices(sort=True)
, and whenever possible, avoid calls to .vertices()
. For instance, I see in generic_graph.py
a call to self.vertices(sort=False)[0]
while we have self.order() == 1
. So we can avoid this call.
Then, we can set the default to None
with a deprecation warning that the default will be changed to False
.
Replying to @dcoudert:
We cannot blindly use
.vertices(sort=False)
.
That's not what I suggested. Please reread the first bullet point.
Right, I missed some parts. I agree with your proposal, but we need an appropriate deprecation warning to inform users of the change. It's a big change.
The best I can think of right now is:
Make the default sort=None
and raise a deprecation warning if this default gets used, thereby requiring an explicit argument everywhere in the Sage library.
So use sort=False
or sort=True
as appropriate in the library.
Then later we can change the default to sort=False
.
Why does
Graph.vertices()
need to sort the list of vertices? If the user wants a sorted list, they can always callsorted()
anyway. The same forGraph.edges()
.Sorting is broken badly now that #22029 is merged.
CC: @sagetrac-Stefan @jm58660 @jhpalmieri
Component: graph theory
Keywords: richcmp
Author: John Palmieri, David Coudert
Branch/Commit:
fe93716
Reviewer: David Coudert, John Palmieri
Issue created by migration from https://trac.sagemath.org/ticket/22349