sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.32k stars 453 forks source link

py3: new doctest failures in homology #26966

Closed jhpalmieri closed 5 years ago

jhpalmieri commented 5 years ago

With Python 3, before #26931:

sage -t src/sage/homology/simplicial_complex.py  # 22 doctests failed
sage -t src/sage/homology/examples.py  # 7 doctests failed

After #26931:

sage -t src/sage/homology/examples.py  # 14 doctests failed
sage -t src/sage/homology/simplicial_complex.py  # 35 doctests failed
sage -t src/sage/homology/simplicial_set.py  # 6 doctests failed
sage -t src/sage/homology/delta_complex.py  # 3 doctests failed
sage -t src/sage/homology/simplicial_complexes_catalog.py  # 3 doctests failed

After this branch:

sage -t src/sage/homology/simplicial_complex.py  # 10 doctests failed
sage -t src/sage/homology/examples.py  # 1 doctest failed

Component: python3

Author: John Palmieri

Branch/Commit: 3da1d0f

Reviewer: Jeroen Demeyer, Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/26966

jhpalmieri commented 5 years ago
comment:1

The failures come from trying to sort vertices. Before #26931, there was a fallback to use str as a key, but that was removed.

File "src/sage/homology/simplicial_complexes_catalog.py", line 57, in sage.homology.simplicial_complexes_catalog
Failed example:
    simplicial_complexes.SurfaceOfGenus(3)
Exception raised:
    Traceback (most recent call last):
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/PYTHON3/sage-8.5/local/lib/python3.6/site-packages/sage/doctest/forker.py", line 671, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/PYTHON3/sage-8.5/local/lib/python3.6/site-packages/sage/doctest/forker.py", line 1086, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.homology.simplicial_complexes_catalog[2]>", line 1, in <module>
        simplicial_complexes.SurfaceOfGenus(Integer(3))
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/PYTHON3/sage-8.5/local/lib/python3.6/site-packages/sage/homology/examples.py", line 451, in SurfaceOfGenus
        S = S.connected_sum(T)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/PYTHON3/sage-8.5/local/lib/python3.6/site-packages/sage/homology/simplicial_complex.py", line 2770, in connected_sum
        return SimplicialComplex(facet_set, is_mutable=is_mutable)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/PYTHON3/sage-8.5/local/lib/python3.6/site-packages/sage/homology/simplicial_complex.py", line 1040, in __init__
        vertex_set = sorted(vertex_set)
    TypeError: '<' not supported between instances of 'str' and 'int'
jhpalmieri commented 5 years ago
comment:2

I don't know if this branch is the correct approach, but it helps. If it is the correct approach, it is incomplete: at #26966, I included a list of some of the methods which need attention paid to how vertices are sorted:

This list may not be complete, and the branch here only deals with a few of these.

jhpalmieri commented 5 years ago

Branch: u/jhpalmieri/sorting-simplicial-complexes

jdemeyer commented 5 years ago

Commit: b41b33c

jdemeyer commented 5 years ago
comment:4

I don't agree with

diff --git a/src/sage/homology/simplicial_complex.py b/src/sage/homology/simplicial_complex.py
index 946eda0..fd88c23 100644
--- a/src/sage/homology/simplicial_complex.py
+++ b/src/sage/homology/simplicial_complex.py
@@ -1037,7 +1037,10 @@ class SimplicialComplex(Parent, GenericCellComplex):
             vertex_set = range(n + 1)

         if sort_facets is True:
-            vertex_set = sorted(vertex_set)
+            try:
+                vertex_set = sorted(vertex_set)
+            except TypeError:
+                vertex_set = sorted(vertex_set, key=str)
         elif callable(sort_facets):
             vertex_set = sorted(vertex_set, key=sort_facets)
         elif not sort_facets:

because it's rather arbitrary and ill-defined.


New commits:

b41b33ctrac 26966: clean up sorting for some simplicial complex methods.
jdemeyer commented 5 years ago
comment:5

Do you know why unsortable lists of vertices are so common in simplicial complexes? Is there a single place where these mixed int/str vertices come from?

jdemeyer commented 5 years ago
comment:6

For example, in MooreSpace, the obvious solution to me is to use only strings as vertices, i.e. replace 1 by "1".

jhpalmieri commented 5 years ago
comment:7

Replying to @jdemeyer:

Do you know why unsortable lists of vertices are so common in simplicial complexes? Is there a single place where these mixed int/str vertices come from?

I don't think we should impose restrictions on what users might choose to do. We can choose good defaults for the specific examples (like MooreSpace), but if someone wants to form a disjoint union from a complex whose vertices are integers with another whose vertices are strings, we should allow that.

jhpalmieri commented 5 years ago
comment:8

Replying to @jdemeyer:

I don't agree with

diff --git a/src/sage/homology/simplicial_complex.py b/src/sage/homology/simplicial_complex.py
index 946eda0..fd88c23 100644
--- a/src/sage/homology/simplicial_complex.py
+++ b/src/sage/homology/simplicial_complex.py
@@ -1037,7 +1037,10 @@ class SimplicialComplex(Parent, GenericCellComplex):
             vertex_set = range(n + 1)

         if sort_facets is True:
-            vertex_set = sorted(vertex_set)
+            try:
+                vertex_set = sorted(vertex_set)
+            except TypeError:
+                vertex_set = sorted(vertex_set, key=str)
         elif callable(sort_facets):
             vertex_set = sorted(vertex_set, key=sort_facets)
         elif not sort_facets:

because it's rather arbitrary and ill-defined.

On the other hand, it's the old behavior, so it's safe. We could throw in a warning if the except clause ever kicks in, to try to discourage it.

jdemeyer commented 5 years ago
comment:9

Replying to @jhpalmieri:

Replying to @jdemeyer:

Do you know why unsortable lists of vertices are so common in simplicial complexes? Is there a single place where these mixed int/str vertices come from?

I don't think we should impose restrictions on what users might choose to do.

I didn't say that we should. I'm just saying that, if the user does that, they have to explicitly specify the sorting key. Otherwise, it's too fragile (for example, code will behave differently on Python 2 and Python 3).

jhpalmieri commented 5 years ago
comment:10

I feel like the sort key should mainly be internal, and users should not have to worry about it. Sorting the vertices needs to be done consistently for each simplicial complex, but maybe it's not important if, as the simplicial complex changes, the sorting changes. The ticket description at #26931 sounds compelling, but in retrospect, I'm not convinced. I could easily imagine a user doing this:

sage: T = SimplicialComplex([range(1,5)])
sage: T.add_face([0,1,'*']) # add a new distinguished vertex
sage: T.homology()
...
TypeError: '<' not supported between instances of 'str' and 'int'
jdemeyer commented 5 years ago
comment:11

Replying to @jhpalmieri:

I feel like the sort key should mainly be internal, and users should not have to worry about it.

Let me ask an even more basic question: why do we need to sort anything at all? If it's not to hard to drop that, we should go for it. Especially if it's meant to be internal as you say, there shouldn't be a fundamental reason why vertices need to be sorted.

jhpalmieri commented 5 years ago
comment:12

In order to compute homology, each simplex needs to be sorted consistently with the other simplices, and the easiest way to achieve that is a total ordering on the vertices. I don't know a minimal example, but if you turn off sorting, then the homology of simplicial_complexes.ComplexProjectivePlane() is wrong. That is:

S = SimplicialComplex([[1, 2, 4, 5, 6], [2, 3, 5, 6, 4], [3, 1, 6, 4, 5],
         [1, 2, 4, 5, 9], [2, 3, 5, 6, 7], [3, 1, 6, 4, 8],
         [2, 3, 6, 4, 9], [3, 1, 4, 5, 7], [1, 2, 5, 6, 8],
         [3, 1, 5, 6, 9], [1, 2, 6, 4, 7], [2, 3, 4, 5, 8],
         [4, 5, 7, 8, 9], [5, 6, 8, 9, 7], [6, 4, 9, 7, 8],
         [4, 5, 7, 8, 3], [5, 6, 8, 9, 1], [6, 4, 9, 7, 2],
         [5, 6, 9, 7, 3], [6, 4, 7, 8, 1], [4, 5, 8, 9, 2],
         [6, 4, 8, 9, 3], [4, 5, 9, 7, 1], [5, 6, 7, 8, 2],
         [7, 8, 1, 2, 3], [8, 9, 2, 3, 1], [9, 7, 3, 1, 2],
         [7, 8, 1, 2, 6], [8, 9, 2, 3, 4], [9, 7, 3, 1, 5],
         [8, 9, 3, 1, 6], [9, 7, 1, 2, 4], [7, 8, 2, 3, 5],
         [9, 7, 2, 3, 6], [7, 8, 3, 1, 4], [8, 9, 1, 2, 5]],
        sort_facets=False)
S.homology()

produces the wrong answer: {0: 0, 1: 0, 2: C2, 3: 0, 4: 0} instead of {0: 0, 1: 0, 2: Z, 3: 0, 4: Z}.

jdemeyer commented 5 years ago
comment:13

If the only requirement is a predictable ordering, you could have a dict mapping arbitrary vertices to integers and use that to sort.

jdemeyer commented 5 years ago
comment:14

For example, the already-existing attribute _vertex_set could be used for that: we could implement a set-like class which internally uses a dict to store a vertex: index mapping.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

99eec7btrac 26966: simplicial complexes: do not publicly sort vertices any more.
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from b41b33c to 99eec7b

jhpalmieri commented 5 years ago
comment:16

Okay, here is an attempt at not sorting vertices (publicly) any more. A few Python 3 doctests have random outputs, since if vertices cannot be sorted, they aren't, and the doctests give different outputs depending on the order of the vertices. I've marked one as # py3 # random and I've made another Python 2 only.

jhpalmieri commented 5 years ago

Description changed:

--- 
+++ 
@@ -13,3 +13,9 @@
 sage -t src/sage/homology/delta_complex.py  # 3 doctests failed
 sage -t src/sage/homology/simplicial_complexes_catalog.py  # 3 doctests failed

+After this branch: + + +sage -t src/sage/homology/simplicial_complex.py # 10 doctests failed +sage -t src/sage/homology/examples.py # 1 doctest failed +

jhpalmieri commented 5 years ago

Author: John Palmieri

jdemeyer commented 5 years ago
comment:17
  1. What's the use case for trying to sort anyway? That way, you make doctests different on Python 2 and Python 3 for no good reason.

  2. Use dict comprehension instead of dict((x,y) ....) for example here:

        vertex_to_index = dict((vertex, i) for i, vertex
                               in enumerate(vertices))
  1. self._vertex_to_index seems redundant with self._vertex_set. It seems that self._vertex_to_index could replace self._vertex_set.
jhpalmieri commented 5 years ago
comment:18

Replying to @jdemeyer:

  1. What's the use case for trying to sort anyway? That way, you make doctests different on Python 2 and Python 3 for no good reason.

I tried without any sorting, but it didn't work well. Various methods rely on sorting: for example, when you take the product of simplicial complexes, if you want to triangulate the product of two edges, there are two choices, and which choice depends on how the vertices are ordered. Another example: when you compute the fundamental group, the presentation of the group depends on how the vertices and edges are sorted. So if nothing is sorted, lots of doctests break and/or the code needs special cases. Why not order if it's easy? Looking to the future when we switch to Python 3, it makes a lot of sense to sort the vertices if the vertices are just integers, and that's what I have in mind in the sorting code.

  1. Use dict comprehension instead of dict((x,y) ....) for example here:
        vertex_to_index = dict((vertex, i) for i, vertex
                               in enumerate(vertices))

I can do that. Is this just a stylistic choice, or is it better to use dict comprehension for other reasons?

  1. self._vertex_to_index seems redundant with self._vertex_set. It seems that self._vertex_to_index could replace self._vertex_set.

That shouldn't be hard to do.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

6bb3e28trac 26966: Remove vertex_set. Use dict comprehension.
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from 99eec7b to 6bb3e28

jdemeyer commented 5 years ago
comment:21

Replying to @jhpalmieri:

Various methods rely on sorting

Well, that's a problem. If things rely on sorting, then allowing the sort to fail is bad.

I much prefer either always sorting or never sorting to this "compromise" of trying to sort.

jhpalmieri commented 5 years ago
comment:22

Replying to @jdemeyer:

Replying to @jhpalmieri:

Various methods rely on sorting

Well, that's a problem. If things rely on sorting, then allowing the sort to fail is bad.

"Bad" is a bit strong. The answers can vary if the order varies (as will happen if the vertices are unsorted), but the answers will still be mathematically correct.

I much prefer either always sorting or never sorting to this "compromise" of trying to sort.

How much work should be put into this? You don't like it when I add a fallback to sort using str, so what do you suggest? And note that the compromise works pretty well. In practice, most simplicial complexes will have sortable vertices (either integers or tuples of integers). With the current branch, there are only two doctests with unpredictable results in Python 3: the one in simplicial_set.py now marked "random", and this:

            sage: G = (S1.wedge(S1)).flip_graph()
            sage: G.vertices(); G.edges(labels=False) # py2
            [(0, 'L1'), (0, 'L2'), (0, 'R1'), (0, 'R2'), ('L1', 'L2'), ('R1', 'R2')]
            [((0, 'L1'), (0, 'L2')),
             ((0, 'L1'), (0, 'R1')),
             ((0, 'L1'), (0, 'R2')),
             ((0, 'L1'), ('L1', 'L2')),
             ((0, 'L2'), (0, 'R1')),
             ((0, 'L2'), (0, 'R2')),
             ((0, 'L2'), ('L1', 'L2')),
             ((0, 'R1'), (0, 'R2')),
             ((0, 'R1'), ('R1', 'R2')),
             ((0, 'R2'), ('R1', 'R2'))]

With Python 3, the vertices may be ordered randomly, and the same with the edges. Not a big deal, I think.

jdemeyer commented 5 years ago
comment:23

Replying to @jhpalmieri:

the answers will still be mathematically correct.

OK, in that case I misunderstood what you said in [comment:18].

I would suggest then to not sort and just replace the doctest outputs with the different-but-equally-correct outputs.

jhpalmieri commented 5 years ago
comment:24

I was wrong about one thing: when you are dealing with the product of a simplicial complex K with itself, you have to sort the vertices in the product consistently with the sorting of the vertices in K, or else the diagonal map may not be defined properly. So we need some sort of sorting. I can see two easy options:

jdemeyer commented 5 years ago
comment:25

Replying to @jhpalmieri:

  • always sort using key=str. Consistent, but '10' comes before '9', which is a little annoying to me.

Not guaranteed to be consistent either, since str() does not have to be an injective function:

sage: L1 = [1.0 + 2^-52, 1.0]; L2 = reversed(L1)
sage: sorted(L1, key=str) == sorted(L2, key=str)
False
jhpalmieri commented 5 years ago
comment:26

Any suggestions, then? I suppose we could delete all of the simplicial complex code.

jhpalmieri commented 5 years ago
comment:27

I could do the sort of test you gave: compare sorted(vertices, key=str) with sorted(reversed(vertices), key=str) (or using whatever key is appropriate). If this is False, print a warning when constructing the product of a complex with itself.

I don't know of any other case in which the sorting makes a difference. For homology, for example, the facets are sorted once in the __init__ method, and that sorting is all that is necessary.

jhpalmieri commented 5 years ago
comment:28

And to confirm, if I create a simplicial complex with vertices with nonunique string representations and sort always using key=str, the diagonal map can break. With my own branch which sorts exclusively by str:

sage: K = SimplicialComplex([[1.0 + 2^-52, 1.0]])
sage: L = K.product(K)
sage: d = Hom(K,L).diagonal_morphism()
sage: d.associated_chain_complex_morphism()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
ValueError: matrices must define a chain complex morphism

Edit: this case is actually worse because in the product L, the vertices are named using string representations, so there is only one vertex, not four, as there should be. (Each vertex is named 'L1.00000000000000R1.00000000000000', and since they all have the same name, they are viewed as the same vertex.) You can avoid this as follows:

sage: K = SimplicialComplex([[1.0 + 2^-52, 1.0]])
sage: L = K.product(K, rename_vertices=False)
sage: d = Hom(K,L).diagonal_morphism(rename_vertices=False)
sage: d.associated_chain_complex_morphism()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
ValueError: matrices must define a chain complex morphism

Now the names of the vertices cause bad sorting, and so the purported map d does not induce a chain map as it should. This is the same problem that arises with other complexes if we don't sort at all.

jdemeyer commented 5 years ago
comment:29

Replying to @jhpalmieri:

Any suggestions, then? I suppose we could delete all of the simplicial complex code.

As I suggested several times: just don't sort at all. That's how we have been fixing other sorting-related bugs (for example in incidence structures, graphs, ...).

dimpase commented 5 years ago
comment:30

Jeroen - this might need a novel definition for homology to work smoothly...

I'd say: always sort. This "unsortable" insanity of py3 has already taken its toll on the progress of porting to py3.

jdemeyer commented 5 years ago
comment:31

Replying to @dimpase:

Jeroen - this might need a novel definition for homology to work smoothly...

I think you are confusing two kinds of sorting introduced by this ticket.

One is the sorting using vertex_to_index which is perfectly fine. It allows consistent ordering of vertices which is indeed required for homology computations.

The sorting that I object to is this one:

        try:
            # If vertices can be sorted, sort them.
            vertices = tuple(sorted(vertices))
        except TypeError:
            pass

This shouldn't be needed for anything. It also adds (rather than solves) problems with porting to Python 3 since this code behaves differently on Python 2 and Python 3.

jhpalmieri commented 5 years ago
comment:32

The vertex sorting is needed but just for one thing: for the diagonal map X -> X x X to be defined properly. If you sort the vertices in the product incompatibly with the sorting in the original simplicial complex, you can end up with a square triangulated the wrong way, so the diagonal map is not a map of simplicial complexes. So you really do need to sort the vertices.

As Jeroen says in his last comment, the vertex_to_index dictionary is sufficient sorting for homology.

jhpalmieri commented 5 years ago
comment:33

Or maybe there is a way to use the vertex_to_index sorting when defining the product, although that take some work to implement. I'll have to think about that.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from 6bb3e28 to f4a672c

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

29ade7etrac 26966: simplicial complexes: do not publicly sort vertices any more.
b325350trac 26966: Remove vertex_set. Use dict comprehension.
e79fd39trac 26966: always sort vertices using key=str
f4a672ctrac 26966: do not sort vertices. Allow the user to specify sort_facets,
jhpalmieri commented 5 years ago
comment:35

The last commit undoes a large part of the previous one, but oh well. This now does not sort vertices at all. It uses sort_facets as an optional argument to allow users to specify the dictionary which converts vertices to integers. This optional argument is only used when taking the product of a simplicial complex with itself. A few doctests had to be changed. In some cases, I knew what they were testing and could provide suitable replacements. In one (for flip_graph), it wasn't clear what the point was, so I replaced with something that I think captures the right spirit.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

41eeaf5typo
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from f4a672c to 41eeaf5

jhpalmieri commented 5 years ago
comment:37

I made a few other changes in here. For example, the change

@@ -1692,10 +1708,7 @@ class SimplicialComplex(Parent, GenericCellComplex):
         # construct a graph with one vertex for each facet, one edge
         # when two facets intersect in a (d-1)-simplex, and see
         # whether that graph is connected.
-        V = [f.set() for f in self.facets()]
-        E = (lambda a, b: len(a.intersection(b)) == d)
-        g = Graph([V, E])
-        return g.is_connected()
+        return self.flip_graph().is_connected()

     def product(self, right, rename_vertices=True, is_mutable=True):
         """

is an attempt to make the documentation for flip_graph correct: it says The flip graph is used to detect if ``self`` is a pseudomanifold.

jhpalmieri commented 5 years ago
comment:38

By the way, a few doctests were changed from a known output to a random output or to ellipses. In these cases, the answer will depend on, for example, how the vertices are sorted in

vertex_to_index = {v:i for i,v in enumerate(vertices)}

which is more or less random. It can certainly depend on whether you are using Python 2 vs. Python 3, and at least with Python 3, it will be random.

tscrim commented 5 years ago
comment:39

Some minor comments:

Instead of lambda x: vertex_to_index[x], you can use vertex_to_index.__getitem__ (which I believe is faster).

Also, this try block is if the max is empty, right:

        try:
            idx = max(vertex_to_index.values()) + 1
        except ValueError:
            idx = 0

so you should be able to do this:

        if vertex_to_index:
            idx = max(vertex_to_index.values()) + 1
        else:
            idx = 0

I would pull out the self._translation_to_numeric() call here so it is not called on every key call:

simplex = Simplex(sorted(face, key=lambda x: self._translation_to_numeric()[x]))

I think the latter syntax is easier to read (IIRC, it is also a little faster too):

-dict((g, i) for i, g in enumerate(gens))
+{g: i for i, g in enumerate(gens)}

You might also be able to simply do dict(enumerate(gens)) too.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

95f6026trac 26966: minor code cleanup
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from 41eeaf5 to 95f6026

jhpalmieri commented 5 years ago
comment:41

Replying to @tscrim:

Some minor comments:

[snip]

I think the latter syntax is easier to read (IIRC, it is also a little faster too):

-dict((g, i) for i, g in enumerate(gens))
+{g: i for i, g in enumerate(gens)}

You might also be able to simply do dict(enumerate(gens)) too.

I agree with everything except the very last sentence: dict(enumerate(gens)) would be equivalent to {i:g for i,g in enumerate(gens)}, but I want g:i, not i:g.

tscrim commented 5 years ago
comment:42

Replying to @jhpalmieri:

Replying to @tscrim:

I think the latter syntax is easier to read (IIRC, it is also a little faster too):

-dict((g, i) for i, g in enumerate(gens))
+{g: i for i, g in enumerate(gens)}

You might also be able to simply do dict(enumerate(gens)) too.

I agree with everything except the very last sentence: dict(enumerate(gens)) would be equivalent to {i:g for i,g in enumerate(gens)}, but I want g:i, not i:g.

Ah, right. I will try to finish the review in a day or two.

embray commented 5 years ago
comment:43

Retarging tickets optimistically to the next milestone. If you are responsible for this ticket (either its reporter or owner) and don't believe you are likely to complete this ticket before the next release (8.7) please retarget this ticket's milestone to sage-pending or sage-wishlist.