sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 481 forks source link

Deprecate sorting of Graph vertices and edges by default #22349

Closed jdemeyer closed 2 years ago

jdemeyer commented 7 years ago

Why does Graph.vertices() need to sort the list of vertices? If the user wants a sorted list, they can always call sorted() anyway. The same for Graph.edges().

Sorting is broken badly now that #22029 is merged.

CC: @sagetrac-Stefan @jm58660 @jhpalmieri

Component: graph theory

Keywords: richcmp

Author: John Palmieri, David Coudert

Branch/Commit: fe93716

Reviewer: David Coudert, John Palmieri

Issue created by migration from https://trac.sagemath.org/ticket/22349

jdemeyer commented 7 years ago

Description changed:

--- 
+++ 
@@ -1 +1,3 @@
 Why does `Graph.vertices()` need to *sort* the list of vertices?
+
+Sorting is going to break badly in Python 3.
jdemeyer commented 7 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,3 @@
-Why does `Graph.vertices()` need to *sort* the list of vertices?
+Why does `Graph.vertices()` need to *sort* the list of vertices? If the user wants a sorted list, they can always call `sorted()` anyway.

 Sorting is going to break badly in Python 3.
jdemeyer commented 7 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,3 @@
-Why does `Graph.vertices()` need to *sort* the list of vertices? If the user wants a sorted list, they can always call `sorted()` anyway.
+Why does `Graph.vertices()` need to *sort* the list of vertices? If the user wants a sorted list, they can always call `sorted()` anyway. The same for `Graph.edges()`.

 Sorting is going to break badly in Python 3.
jdemeyer commented 7 years ago

Branch: u/jdemeyer/graph_vertices___should_not_be_sorted

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

1c3a584Deprecate default sorting of Graph vertices and edges
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Commit: 1c3a584

dcoudert commented 7 years ago
comment:9

What you propose to do is not an easy task. I suspect that many algorithms assume that the vertices are sorted. For sure, many algorithms assume that the ordering is always the same when calling vertices() and vertex_iterator(), so I hope that what you propose preserves this strong assumption.

Is your long term plan to remove parameter sort from the methods or just to set its default value to false ? I don't know the potential problems of sorting with Python 3, so I'm not sure of the motivation.

Many many tests are broken

sage -t src/sage/graphs/generic_graph.py  # 45 doctests failed
sage -t src/sage/graphs/generators/smallgraphs.py  # 4 doctests failed
sage -t src/sage/graphs/strongly_regular_db.pyx  # 1 doctest failed
sage -t src/sage/graphs/graph.py  # 11 doctests failed
sage -t src/sage/graphs/base/static_sparse_graph.pyx  # 1 doctest failed
sage -t src/sage/graphs/generators/families.py  # 3 doctests failed
sage -t src/sage/graphs/digraph.py  # 10 doctests failed
sage -t src/sage/graphs/digraph_generators.py  # 3 doctests failed
sage -t src/sage/graphs/base/graph_backends.pyx  # 2 doctests failed
sage -t src/sage/graphs/generic_graph_pyx.pyx  # 3 doctests failed
sage -t src/sage/graphs/comparability.pyx  # 1 doctest failed
sage -t src/sage/graphs/bipartite_graph.py  # 3 doctests failed
sage -t src/sage/graphs/base/c_graph.pyx  # 1 doctest failed
sage -t src/sage/graphs/graph_decompositions/vertex_separation.pyx  # 4 doctests failed
sage -t src/sage/graphs/schnyder.py  # 1 doctest failed
sage -t src/sage/graphs/line_graph.py  # 3 doctests failed
sage -t src/sage/graphs/isgci.py  # 1 doctest failed
sage -t src/sage/graphs/graph_decompositions/graph_products.pyx  # 1 doctest failed
sage -t src/sage/quivers/path_semigroup.py  # 3 doctests failed
sage -t src/sage/groups/perm_gps/partn_ref/refinement_graphs.pyx  # 1 doctest failed

for instance (random copy/paste)

File "src/sage/graphs/line_graph.py", line 466, in sage.graphs.line_graph.root_graph
Failed example:
    root_graph(graphs.DiamondGraph())
Expected:
    (Graph on 4 vertices, {0: (0, 3), 1: (0, 1), 2: (0, 2), 3: (1, 2)})
Got:
    (Graph on 4 vertices, {0: (1, 2), 1: (0, 1), 2: (0, 2), 3: (0, 3)})
**********************************************************************
File "src/sage/graphs/base/graph_backends.pyx", line 802, in sage.graphs.base.graph_backends.NetworkXGraphDeprecated.mutate
Failed example:
    G.edges(sort=False)
Exception raised:
    Traceback (most recent call last):
      File "/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.graphs.base.graph_backends.NetworkXGraphDeprecated.mutate[5]>", line 1, in <module>
        G.edges(sort=False)
    TypeError: edges() got an unexpected keyword argument 'sort'
**********************************************************************
File "src/sage/graphs/digraph.py", line 3424, in sage.graphs.digraph.DiGraph.flow_polytope
Failed example:
    fl.vertices(sort=False)
Exception raised:
    Traceback (most recent call last):
      File "/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.graphs.digraph.DiGraph.flow_polytope[19]>", line 1, in <module>
        fl.vertices(sort=False)
    TypeError: __call__() got an unexpected keyword argument 'sort'
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 20558, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
    d.automorphism_group(algorithm='sage')
Expected:
    Permutation Group with generators [('01','02')('10','20')('11','22')('12','21')]
Got:
    Permutation Group with generators [()]
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 20197, in sage.graphs.generic_graph.GenericGraph.degree_to_cell
Failed example:
    G.degree_to_cell('111', cell)
Expected:
    0
Got:
    1
jdemeyer commented 7 years ago
comment:10

Replying to @dcoudert:

What you propose to do is not an easy task.

I never claimed that it was easy. But it has to be done in order to support Python 3.

For sure, many algorithms assume that the ordering is always the same when calling vertices() and vertex_iterator(), so I hope that what you propose preserves this strong assumption.

It should remain the same, yes.

Is your long term plan to remove parameter sort from the methods or just to set its default value to false ?

I don't care much. I guess once the default is False, there will be little reason left to keep the sort keyword.

I don't know the potential problems of sorting with Python 3, so I'm not sure of the motivation.

The problem is that you cannot sort arbitrary things in Python 3. For example:

>>> sorted([1, "hello"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() < int()
dcoudert commented 7 years ago
comment:11

Now I understand the need for this big change.

Is there a way to sort arbitrary things in Python 3 ? some trick ?

jdemeyer commented 7 years ago
comment:12

Replying to @dcoudert:

Is there a way to sort arbitrary things in Python 3 ?

Of course there is. The question is: what would you expect from such "arbitrary" sorting?

And if it's arbitrary anyway: why would you want to "sort" in the first place?

mezzarobba commented 7 years ago
comment:13

As far as I understand, there are at least two different issues here:

Additionally, it may well be the case that some algorithms also rely on vertices of a certain type (say tuples of integers) automatically coming out sorted in a particular order, I'm not sure.

jdemeyer commented 7 years ago

Dependencies: #22383

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Changed commit from 1c3a584 to aa0691f

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

aa0691fDeprecate default sorting of Graph vertices and edges
nbruin commented 7 years ago
comment:17

Replying to @dcoudert:

Now I understand the need for this big change.

Is there a way to sort arbitrary things in Python 3 ? some trick ?

one way is

sorted( ..., key=str )

You could of course use another key function that uses some sorting heuristic (something like the key function used by https://pypi.python.org/pypi/natsort)

Display functions in IPython have hooks for nice sorting (used for dicts and sets), so if graphs need to display their keys and edges in a nice way, that's probably the hook to go for.

Algorithms that need their edges sorted should either insist on sortable labels or sort by index tuples or something like that (something that is guaranteed to be sortable regardless of what the user decides to use as labels).

jdemeyer commented 7 years ago
comment:18

Replying to @mezzarobba:

  • in interactive use, when there is a natural order on the vertices (in practice, mostly when they are integers, strings, or possibly nested tuples of such things), people want to see the vertices, edges, etc. listed in an order that makes the output easy to read, and it would be a significant regression to take that away from them.

In that case, the best solution is probably to use a custom class for this. It would behave exactly like a Python list (or tuple) in all possible ways, except that it would be sorted on display.

jdemeyer commented 7 years ago
comment:19

Replying to @nbruin:

Display functions in IPython have hooks for nice sorting (used for dicts and sets), so if graphs need to display their keys and edges in a nice way, that's probably the hook to go for.

Nils, I only just saw your comment now. But yes, this was exactly my point too.

mezzarobba commented 7 years ago
comment:20

Replying to @nbruin:

Algorithms that need their edges sorted should either insist on sortable labels or sort by index tuples or something like that (something that is guaranteed to be sortable regardless of what the user decides to use as labels).

They could simply sort by id(), provided that the same label is not represented by several physically distinct objects that compare equal. Unfortunately, this probably does happen (see #22374).

jdemeyer commented 7 years ago
comment:21

Replying to @mezzarobba:

They could simply sort by id()

If you want to sort by id(), what is the purpose of sorting at all? I mean, it's not sorting, it's putting in a deterministic (but otherwise arbitrary) order. Assuming that methods like vertices() return the vertices in a deterministic order, I don't see the point.

mezzarobba commented 7 years ago
comment:22

Replying to @jdemeyer:

Replying to @mezzarobba:

They could simply sort by id()

If you want to sort by id(), what is the purpose of sorting at all? I mean, it's not sorting, it's putting in a deterministic (but otherwise arbitrary) order. Assuming that methods like vertices() return the vertices in a deterministic order, I don't see the point.

All I'm saying is that some algorithms do want to access the vertices in some arbitrary deterministic order, typically in order to iterate over the unordered pairs {v,v'}, and, as far as I remember, currently rely on the output of vertices() etc. being sorted to do that. If all the methods these algorithms use to access the vertices return them in the same deterministic order, then indeed, there is no need to do anything, otherwise, sorting by id() may be a better option than requiring the labels to be ordered.

jdemeyer commented 6 years ago

Changed keywords from none to richcmp

saraedum commented 6 years ago

Work Issues: merge conflicts

jdemeyer commented 6 years ago

Changed work issues from merge conflicts to none

fchapoton commented 6 years ago

Changed dependencies from #22383 to none

dcoudert commented 6 years ago
comment:31

Apart from the need for ensuring that the internal order of vertices/edges during iterations is always the same (many algorithms assume that we have such a fixed order, whatever it is) and doing our best to preserve a nice display for interactive use, I think we have another issue:

The only good news here is that vertices must be immutable.

dcoudert commented 6 years ago
comment:32

26169 Trial for not sorting multiple edges

jdemeyer commented 5 years ago
comment:33

Do we already have a plan for this ticket? I know that there have many graph-related Python 3 tickets, but I'm a bit lost on the overall plan.

jdemeyer commented 5 years ago
comment:34

I am asking because there are now only a handful of doctest failures remaining in #22029 and they are almost all due to sorting in graphs.

dcoudert commented 5 years ago
comment:35

There are 3 different points here: 1) sorting the list of vertices, 2) sorting the list of edges, and 3) sorting the vertices of an edge.

For 1), we made huge progresses in reducing the dependency of methods to .vertices(). I'm not sure how many methods still rely on it. Most of them now use the ordering given by list(G) without making specific assumption on that order. We create a mapping when needed, and pass it to some methods. We can certainly completely avoid that dependency, but can we do that without deprecation ?

For 2), similarly, we have reduced the number of methods relying on .edges(), plus we can use sort=False parameter. We can try and see...

3) is more tricky and I don't know how to do that yet.

jdemeyer commented 5 years ago
comment:36

For 1) there is still an issue with relabel(): #27027

jdemeyer commented 5 years ago
comment:37

And one more with graph_isom_equivalent_non_edge_labeled_graph(): #27029

jdemeyer commented 5 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,3 @@
 Why does `Graph.vertices()` need to *sort* the list of vertices? If the user wants a sorted list, they can always call `sorted()` anyway. The same for `Graph.edges()`.

-Sorting is going to break badly in Python 3.
+Sorting is broken badly now that #22029 is merged.
dcoudert commented 5 years ago
comment:39

We have now drastically reduced the dependency to .vertices() and .edges() in the graph module (except in doctests, but we can specify sort=True or just change the expected output). We must now identify the remaining tasks to do and schedule the work. I propose to do that in #26640 to get a full picture.

jdemeyer commented 5 years ago
comment:40

Here is a concrete proposal for this ticket:

  1. If no sort option is given (the default), give a deprecation and try to sort but fail gracefully if the input is not sortable.

  2. If a sort option is given, do as before.

jdemeyer commented 5 years ago
comment:41

I also like the idea that somebody posted on sage-devel (I don't remember who) that .vertices() and .edges() should return a "view" which could have various operations on it.

dcoudert commented 5 years ago
comment:42

It's Thierry (this message). I also like this idea. Should we do that here, before in another ticket, or later ?

Networkx now also use views networkx.Graph.edges.

dcoudert commented 5 years ago
comment:43

A proposal for an EdgeView in #27408.

embray commented 5 years ago
comment:44

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)

embray commented 5 years ago
comment:45

As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage-8.9).

dcoudert commented 4 years ago
comment:46

Now that we have EdgesView #27408 and Python3, we have few remaining methods methods effectively relying on the order of the vertices and/or edges. So we should restart working on this ticket for 9.2.

dcoudert commented 4 years ago
comment:47

With #30145, we propose to deprecate edge_iterator, thus generalizing the use of .edges method and so of EdgesView. After that, it should be easier to deprecate sorting edges by default.

mkoeppe commented 3 years ago
comment:49

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

dcoudert commented 2 years ago
comment:52

Actually, #27408 has silently (not mentioned in the ticket) introduced a deprecation warning for sorting edges by default, but it is used in method .edges(...) only when sort is None and the default value of sort is True. So I think no-one is aware of it.

I'm not sure how to make this deprecation warning visible without breaking numerous doctests.

alexjbest commented 2 years ago
comment:54

Replying to @dcoudert:

Actually, #27408 has silently (not mentioned in the ticket) introduced a deprecation warning for sorting edges by default, but it is used in method .edges(...) only when sort is None and the default value of sort is True. So I think no-one is aware of it.

I'm not sure how to make this deprecation warning visible without breaking numerous doctests.

I'd be interested in seeing this ticket resolved, perhaps you could ask about how to go about this deprecation on sage-devel? Or I am happy to follow this up if you'd prefer.

dcoudert commented 2 years ago
comment:55

The difficulty is that we still have some methods relying on .vertices(), and I don't know how to avoid that.

For instance in graph.py: effective_resistance_matrix, common_neighbors_matrix, orientations (but this one can be avoided easily), chromatic_quasisymmetric_function, twograph, write_to_eps. And there might be other modules using graphs and relying on the ordering of the vertices given by .vertices() (e.g., combinat, geometry, groups, matroids, schemes, topology, etc..

So the first step might be to identify methods/modules relying on .vertices() and try to modify them. Once done, we will be able to safely deprecate sorting by default.

jhpalmieri commented 2 years ago
comment:56

The .vertices() method is probably used lots of times, but one fix could be:

Maybe it's easier with edges: add the explicit argument everywhere and then change the default to None.

Or is that too naïve? I tried this change:

diff --git a/src/sage/graphs/graph.py b/src/sage/graphs/graph.py
index 1579f23ad8..5dbb937b6b 100644
--- a/src/sage/graphs/graph.py
+++ b/src/sage/graphs/graph.py
@@ -3480,7 +3480,7 @@ class Graph(GenericGraph):
             name = 'An orientation of ' + name

         if not self.size():
-            D = DiGraph(data=[self.vertices(), []],
+            D = DiGraph(data=[self.vertices(sort=False), []],
                         format='vertices_and_edges',
                         name=name,
                         pos=self._pos,
@@ -3494,7 +3494,7 @@ class Graph(GenericGraph):

         E = [[(u,v,label), (v,u,label)] if u != v else [(u,v,label)]
              for u,v,label in self.edge_iterator()]
-        verts = self.vertices()
+        verts = self.vertices(sort=False)
         for edges in itertools.product(*E):
             D = DiGraph(data=[verts, edges],
                         format='vertices_and_edges',
@@ -4009,7 +4009,7 @@ class Graph(GenericGraph):
             R = t.parent()
         M = QuasiSymmetricFunctions(R).M()
         ret = M.zero()
-        V = self.vertices()
+        V = self.vertices(sort=False)

         def asc(sigma):
             stat = 0
@@ -6185,7 +6185,7 @@ class Graph(GenericGraph):
                 T.append([x, y, z])

         T = TwoGraph(T)
-        T.relabel({i: v for i,v in enumerate(self.vertices())})
+        T.relabel({i: v for i,v in enumerate(self.vertices(sort=False))})

         return T

@@ -6221,7 +6221,7 @@ class Graph(GenericGraph):
         if filename[-4:] != '.eps':
             filename += '.eps'
         f = open(filename, 'w')
-        f.write( print_graph_eps(self.vertices(), self.edge_iterator(), pos) )
+        f.write( print_graph_eps(self.vertices(sort=False), self.edge_iterator(), pos) )
         f.close()

     @doc_index("Algorithmically hard stuff")
@@ -7940,7 +7940,7 @@ class Graph(GenericGraph):
             D = None
         elif self.order() == 1:
             D = create_prime_node()
-            D.children.append(create_normal_node(self.vertices()[0]))
+            D.children.append(create_normal_node(self.vertices(sort=False)[0]))
         else:
             D = habib_maurer_algorithm(self)

@@ -9225,7 +9225,7 @@ class Graph(GenericGraph):
         if not n:
             raise ValueError('unable to compute effective resistance for an empty Graph object')
         if vertices is None:
-            vertices = self.vertices()
+            vertices = self.vertices(sort=False)
         self._scream_if_not_simple()
         if not self.is_connected():
             raise ValueError('the Graph is not a connected graph')
@@ -9458,7 +9458,7 @@ class Graph(GenericGraph):
         """
         self._scream_if_not_simple()
         if vertices is None:
-            vertices = self.vertices()
+            vertices = self.vertices(sort=False)
         set_immutable = kwds.pop('immutable', False)
         A = self.adjacency_matrix(vertices=vertices, base_ring=base_ring, immutable=True, **kwds)
         M = A**2

and all tests in the graphs directory still passed.

dcoudert commented 2 years ago
comment:57

We cannot blindly use .vertices(sort=False). It will clearly break badly some methods, for instance methods building adjacency or laplacian matrices, etc. We already pave the way by reducing the number of situations in which we call .vertices(), but we still have work to do to reduce the dependency. We did the same for .edges().

The safe way is to use only .vertices(sort=True), and whenever possible, avoid calls to .vertices(). For instance, I see in generic_graph.py a call to self.vertices(sort=False)[0] while we have self.order() == 1. So we can avoid this call.

Then, we can set the default to None with a deprecation warning that the default will be changed to False.

jhpalmieri commented 2 years ago
comment:58

Replying to @dcoudert:

We cannot blindly use .vertices(sort=False).

That's not what I suggested. Please reread the first bullet point.

dcoudert commented 2 years ago
comment:59

Right, I missed some parts. I agree with your proposal, but we need an appropriate deprecation warning to inform users of the change. It's a big change.

jhpalmieri commented 2 years ago
comment:60

The best I can think of right now is:

Then later we can change the default to sort=False.