sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 480 forks source link

Element richcmp: never use id() #22029

Closed mezzarobba closed 5 years ago

mezzarobba commented 7 years ago

As discussed in the sage-devel thread starting with <o21nte$6jp$1@blaine.gmane.org> (https://groups.google.com/d/msg/sage-devel/YVFdxPH6avk/4OZUmzLHBgAJ), coercion_model.richcmp() should not fall back on comparing by type/id.

This branch implements comparisons for Element the same way as in Python 3: a TypeError is raised for uncomparable objects (instead of comparing by id).

Dependencies: #22297, #22344, #22346, #22369, #22370, #22371, #22372, #22373, #22376, #22382, #24968, #26917, #26931, #26933, #26934, #26935, #26936, #26937, #26938, #26947, #27003, #27009, #27010, #27026, #27027, #27029, #27123, #27241

Depends on #27241

CC: @fchapoton

Component: coercion

Keywords: richcmp

Author: Marc Mezzarobba, Jeroen Demeyer

Branch/Commit: e311ab1

Reviewer: Julian Rüth, Marc Mezzarobba

Issue created by migration from https://trac.sagemath.org/ticket/22029

jdemeyer commented 7 years ago
comment:1

Do you plan to work on this? If yes, fill in your name as reviewer. If no, let me know.

jdemeyer commented 7 years ago

Description changed:

--- 
+++ 
@@ -1 +1 @@
-As discussed in the sage-devel thread starting with `<o21nte$6jp$1@blaine.gmane.org>`, `Element.__richcmp__()` should not fall back on comparing by type/id.
+As discussed in the sage-devel thread starting with `<o21nte$6jp$1@blaine.gmane.org>`, `coercion_model.richcmp()` should not fall back on comparing by type/id.
jdemeyer commented 7 years ago

Description changed:

--- 
+++ 
@@ -1 +1 @@
-As discussed in the sage-devel thread starting with `<o21nte$6jp$1@blaine.gmane.org>`, `coercion_model.richcmp()` should not fall back on comparing by type/id.
+`coercion_model.richcmp()` should not fall back on comparing by type/id.
mezzarobba commented 7 years ago
comment:4

Replying to @jdemeyer:

Do you plan to work on this? If yes, fill in your name as reviewer.

as author? owner?

If no, let me know.

I've started doing experiments, but I'll most likely need help from people who understand the parts of Sage that rely on the current behavior. At the very least I'll try to push a first draft in the next few days.

mezzarobba commented 7 years ago
comment:5

My main concern right now is that sage.graphs heavily relies on sorting lists of vertices, while various other parts of Sage create graphs with vertices that mix Elements with other Python objects.

An option might be to try again with key=id when sort() fails, but (with no global understanding of the code, at least) there's a major risk of getting inconsistent orders with different subsets of vertices.

Another idea would be to do introduce utilities like

# universal_cmp.pyx
cpdef lt(x, y):
    try:
        return x < y
    except TypeError:
        if type(x) is type(y):
            return id(x) < id(y)
        else:
            return type(x) < type(y)
...

cdef class key(object):
    def __init__(self):
        self.value = value
    def __lt__(self, other):
        return lt(self.value, other.value)
    ...

and then use universal_cmp.lt() and sort(key=universal_cmp.key), but this won't work either when one of the types involved only has a partial order, since this order will not be consistent with id.

Yet another variant would be to pass a comparison function (or, equivalently but probably better in view of the transition to python3, a key as above) that wraps cmp() and catches TypeErrors. Not really ideal either, but at least it wouldn't introduce any regression(?), so perhaps that would be enough for this ticket...

Any advice?

jdemeyer commented 7 years ago
comment:6

Replying to @mezzarobba:

Replying to @jdemeyer:

Do you plan to work on this? If yes, fill in your name as reviewer.

as author? owner?

Sorry, I meant as author.

mezzarobba commented 7 years ago
comment:7

By the way, why did you remove the reference to the sage-devel thread?

mezzarobba commented 7 years ago

Author: Marc Mezzarobba

jdemeyer commented 7 years ago
comment:8

Replying to @mezzarobba:

By the way, why did you remove the reference to the sage-devel thread?

Because I didn't understand how to interpret it. A hyperlink would be better.

jdemeyer commented 7 years ago
comment:9

Replying to @mezzarobba:

My main concern right now is that sage.graphs heavily relies on sorting lists of vertices

Why does it rely on sorting vertices?

mezzarobba commented 7 years ago
comment:10

Replying to @jdemeyer:

Replying to @mezzarobba:

By the way, why did you remove the reference to the sage-devel thread?

Because I didn't understand how to interpret it. A hyperlink would be better.

I personally prefer Message-Ids, as they don't depend on a particular archive...

mezzarobba commented 7 years ago
comment:11

Replying to @jdemeyer:

Replying to @mezzarobba:

My main concern right now is that sage.graphs heavily relies on sorting lists of vertices

Why does it rely on sorting vertices?

In sparse_graph and generic_graph, mostly. Some of the occurrences look like they are there for algorithmic purposes, others to make the output more readable, some both at the same time... I didn't look closely enough to say more.

mezzarobba commented 7 years ago

Description changed:

--- 
+++ 
@@ -1 +1,2 @@
+As discussed in the sage-devel thread starting with `<o21nte$6jp$1@blaine.gmane.org>` (https://groups.google.com/d/msg/sage-devel/YVFdxPH6avk/4OZUmzLHBgAJ),
 `coercion_model.richcmp()` should not fall back on comparing by type/id.
jdemeyer commented 7 years ago
comment:13

Replying to @mezzarobba:

In sparse_graph and generic_graph, mostly. Some of the occurrences look like they are there for algorithmic purposes, others to make the output more readable, some both at the same time... I didn't look closely enough to say more.

It's also important to note the difference between old-style cmp() and new-style rich comparisons. Ideally, this branch would only affect the latter.

mezzarobba commented 7 years ago
comment:14

Replying to @jdemeyer:

It's also important to note the difference between old-style cmp() and new-style rich comparisons. Ideally, this branch would only affect the latter.

I'm not sure I follow you: cmp() can end up calling Element.__richcmp__(), and that's perhaps the main reason why getting rid of the comparison by id there is not trivial.

nbruin commented 7 years ago
comment:15

Replying to @mezzarobba:

My main concern right now is that sage.graphs heavily relies on sorting lists of vertices, while various other parts of Sage create graphs with vertices that mix Elements with other Python objects.

An option might be to try again with key=id when sort() fails, but (with no global understanding of the code, at least) there's a major risk of getting inconsistent orders with different subsets of vertices.

If you have no information on the objects you're sorting, I don't think there is anything that can save you. If your objects are guaranteed to be hashable you could try sorting on the hash, but that will leave your results undefined when hashes clash. In any case, if you rely on "first try what the object implements and then use a fallback if that fails", you're doomed.

In fact, if you allow your vertices to be labelled with arbitrary objects, you're already losing transitivity of equality in sage, via the standard

sage: a=GF(3)(1); b =31; c=GF(5)(1)
sage: a==b,b==c,a==c
(True, True, False)

It seems that labels get mangled anyway, though, so I think them not being sorted properly is only a minor concern by comparison (sorry):

sage: a=GF(3)(1); b =31; c=GF(5)(1)
sage: L=[[a,b,c],[a,c,b],[b,a,c],[b,c,a],[c,a,b],[c,b,a]]
sage: for l in L:
....:     G=Graph()
....:     for i in l:
....:         G.add_vertex(i)
....:     print [parent(j) for j in G.vertices()]
....:     
[<type 'int'>, Integer Ring]
[<type 'int'>, Integer Ring]
[<type 'int'>, Integer Ring]
[<type 'int'>, Integer Ring]
[<type 'int'>, Integer Ring]
[<type 'int'>, Integer Ring]

It seems to me the only sane approaches are to not rely on sorting (in python it seems being hashable is more ubiquitous than being sortable, so fast lookup is likely better accomplished via hash functions), or to declare that the labels are sortable (making failure a user error).

This stuff causes real problems: since the old python interface for __cmp__ methods required one to provide non-errors for "<" and ">" testing if one wants to implement "==", sage is riddled with apparent implementations of orderings that are in actuality inconsistent. This has cause problems in at least one case, where labels in published tables of elliptic curves were based on non-sensical but assumed-consistent ordering of number field elements:

https://groups.google.com/forum/#!topic/sage-nt/aP5LP09sADY

This has convinced me that the only sane approach is to deprecate "cmp" style interfaces as quickly as possible and instead rely on "richcmp", with errors for ordering if not obviously correct.

mezzarobba commented 7 years ago

Branch: public/ticket/22029-richcmp_fallback

mezzarobba commented 7 years ago
comment:16

Here is a first attempt. All (standard, long) tests should pass unless I made a last-minute mistake. See the commit messages for some comments. I'm not 100% happy with the changes, but I don't think I can do much better in a reasonable amount of time, feel free to commit improvements to the existing branch!


Last 10 new commits:

d45cd53CGraphBackend: fix a segfault when given a non-existing vertex
6e0e1b2Fix use of comparison in simplicial_complex
2925d0fBug in Poset.is_slender(?)
2b0bcc9Some benign doctest failures related to #22029
5f36421Ad-hoc comparison functions for graph vertices
6d728fa{c,sparse}_graph: systematically turn integer-like vertices into ints
fcda2c2graph_plot: sort by id
127a11dremove or fix some meaningless doctests for cmp()
29fe1f8misc fixes after changes to coercion_model.richcmp
749dfbemisc fixes after changes to coercion_model.richcmp
mezzarobba commented 7 years ago

Commit: 749dfbe

tscrim commented 7 years ago
comment:17

I am a strong -1 on the removal of the != tests and the changes of foo != bar to (the fugly) not (foo == bar).

The failures you have in rigged_configurations.py is because you moved the definition of case_S. You have also made the cases in the code more difficult to follow. Please revert those changes.

jdemeyer commented 7 years ago
comment:18

This is bad:

        # (Note that, when y is not a Sage Element, we may have ended up here
        # from a call to cmp(). In this case, and in this case only, it may be
        # better to emulate what Python does and compare by type/id. But we
        # have no way(?) to distinguish this situation from the "normal" case
        # of a comparison operator.)
mezzarobba commented 7 years ago
comment:19

Replying to @tscrim:

I am a strong -1 on the removal of the != tests and the changes of foo != bar to (the fugly) not (foo == bar).

Ugly, I agree, but safer in generic code (because of cases where equality is not decidable, like expressions, or where not enough information is available, like with intervals)... And code that might end up comparing objects of different type is generic code.

The failures you have in rigged_configurations.py is because you moved the definition of case_S. You have also made the cases in the code more difficult to follow. Please revert those changes.

I might do it, but I'd like first to understand in detail the alternative you are suggesting, and to see some evidence of a consensus that it is a better option.

In any case, please feel free to commit improvements yourself, and I'll do my best to review them.

mezzarobba commented 7 years ago
comment:20

Replying to @jdemeyer:

This is bad:

        # (Note that, when y is not a Sage Element, we may have ended up here
        # from a call to cmp(). In this case, and in this case only, it may be
        # better to emulate what Python does and compare by type/id. But we
        # have no way(?) to distinguish this situation from the "normal" case
        # of a comparison operator.)

What do you mean? If I understand right, this is an issue that will solve itself if sage ever switches to Python3...

jdemeyer commented 7 years ago
comment:21

Replying to @mezzarobba:

If I understand right, this is an issue that will solve itself if sage ever switches to Python3...

Given that we currently still have Python 2, that's irrelevant.

mezzarobba commented 7 years ago
comment:22

Replying to @jdemeyer:

Replying to @mezzarobba:

If I understand right, this is an issue that will solve itself if sage ever switches to Python3...

Given that we currently still have Python 2, that's irrelevant.

Sorry, I fear I still don't understand your comment. The problem with cmp() was the main thing I asked about in the sage-devel post that led to this ticket, and you basically replied that it was okay from your point of view to remove the fallback code from richcmp. Now it looks to me like you are saying the contrary.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

fb9beb5misc fixes after changes to coercion_model.richcmp
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Changed commit from 749dfbe to fb9beb5

mezzarobba commented 7 years ago
comment:24

Replying to @tscrim:

The failures you have in rigged_configurations.py is because you moved the definition of case_S. You have also made the cases in the code more difficult to follow.

Fixed, sorry for the stupid mistake. That's all I did for now. It wouldn't be hard to revert the other changes related to !=, but I'm waiting for answers to the follow-up question I just asked on sage-devel.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

6e69645Elt richcmp: return NotImplemented for non-Elements...
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Changed commit from fb9beb5 to 6e69645

mezzarobba commented 7 years ago
comment:26

Another take, now returning NotImplemented more often. Due to Jeroen's doubts, I left out the idea of treating SageObjects in a special way. I do however handle types such as int and float like Element, since this is what the coercion framework usually does.

This should make it possible to revert some of my other changes. I haven't done it yet.

jdemeyer commented 7 years ago
comment:27

Replying to @mezzarobba:

I do however handle types such as int and float like Element since this is what the coercion framework usually does.

Yes, the coercion framework tries to coerce those types to a Sage Element. However, it doesn't do anything special in the case that such a coercion fails. So I don't see why you should treat the failed-coercion-with-int-or-float case differently. Just return NotImplemented in that case.

mezzarobba commented 7 years ago
comment:28

Replying to @jdemeyer:

Yes, the coercion framework tries to coerce those types to a Sage Element. However, it doesn't do anything special in the case that such a coercion fails.

But the net effect is that arithmetic that mixes Elements and objects of these types will succeed or fail (with an error coming from the coercion framework) depending whether there is a common parent or not.

So I don't see why you should treat the failed-coercion-with-int-or-float case differently. Just return NotImplemented in that case.

Do you really want things like

sage: GF(2)(1) > float(0.)
True
sage: RBF(0) < float(-1.)
True
sage: SymmetricGroup(2)(()) != int(1)
True

even though it takes exactly one LOC to prevent them and analogous code with Sage Elements raises an error (so that, in particular, the behavior will be different for preparsed and pure Python code)?

jdemeyer commented 7 years ago
comment:29

Replying to @mezzarobba:

Do you really want things like

sage: GF(2)(1) > float(0.)
True
sage: RBF(0) < float(-1.)
True
sage: SymmetricGroup(2)(()) != int(1)
True

In Python 2? Yes, I want that.

In Python 3? No, I do not want that.

jdemeyer commented 7 years ago
comment:30

In other words: just follow the common Python convention which specifies that a rich comparison (not __cmp__) should return NotImplemented when it cannot decide a comparison.

The fact that we directly raise a TypeError for Sage Elements is merely a shortcut to get the correct behaviour and to get a nicer error message referring to the parents.

mezzarobba commented 7 years ago
comment:31

Replying to @jdemeyer:

In other words: just follow the common Python convention which specifies that a rich comparison (not __cmp__) should return NotImplemented when it cannot decide a comparison.

I tend to see the situation we are talking about not as one where it can't decide, but as one where it can decide that the comparison doesn't make sense. And not to find “just follow the Python[2] convention” such a strong argument when Sage already breaks it in much worse ways...

jdemeyer commented 7 years ago
comment:32

Replying to @mezzarobba:

And not to find “just follow the Python[2] convention” such a strong argument when Sage already breaks it in much worse ways...

How is it relevant whether or not Sage breaks certain Python conventions in much worse ways?

If Python specifies how to do arithmetic/comparisons, why not follow that?

mezzarobba commented 7 years ago
comment:33

Replying to @jdemeyer:

How is it relevant whether or not Sage breaks certain Python conventions in much worse ways?

When you start deviating from the general conventions, “your” objects already don't interact with the rest of the system as other developers will expect, and your job IMHO is to minimize the damage.

Now, there is no doubt that you have orders of magnitude more experience of Sage development than I'll likely ever do, so I guess I'll follow your advice even without understanding it if that's really the only way. But I'd really prefer (in general: I'm bringing this up because it is not the first time I have trouble understanding your comments) if you could explain why you think a certain technical choice is better instead of just stating that it is.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Changed commit from 6e69645 to a5a1e2b

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:

af1a94aFix use of comparison in simplicial_complex
744942cBug in Poset.is_slender(?)
ea1910eSome benign doctest failures related to #22029
c87d7e3Ad-hoc comparison functions for graph vertices
107ad68{c,sparse}_graph: systematically turn integer-like vertices into ints
fd03cd8graph_plot: sort by id
69cc074remove or fix some meaningless doctests for cmp()
b33f2ebElt richcmp: return NotImplemented for non-Elements...
87ca195misc fixes after recent changes to coercion_model.richcmp()
a5a1e2bJeroen doesn't like this
mezzarobba commented 7 years ago
comment:35

New version of the complete patch series. Tested without the last commit, not fully with it.

I probably won't have much time to work on this in the coming 6-8 weeks, please feel free to take over if you have improvements to make.

mezzarobba commented 7 years ago
comment:36

The “known bug” is now #22071.

mezzarobba commented 7 years ago
comment:37

Replying to @tscrim:

I am a strong -1 on the removal of the != tests and the changes of foo != bar to (the fugly) not (foo == bar).

Does the new version (that returns NotImplemented in more cases, so ill-typed comparisons by != will typically succeed when the other member is not an Element) address your concerns?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 7 years ago

Changed commit from a5a1e2b to 87ca195

mezzarobba commented 7 years ago
comment:40

Replying to @mezzarobba:

New version of the complete patch series. Tested without the last commit, not fully with it.

Turns out there where test failures with that commit. Since it is still not clear to me that it is really what we want, I undid it for now.

fchapoton commented 7 years ago

Changed keywords from none to cmp

saraedum commented 7 years ago

Reviewer: Julian Rüth

saraedum commented 7 years ago
comment:43

I also have trouble seeing why what jdemeyer proposes would be better. For me the proposed changes look fine.

mezzarobba commented 7 years ago
comment:44

It is plausible that the ad-hoc comparison functions for graph vertices can be simplified (or even removed) thanks to the fact that the general comparison function now returns NotImplemented for non-Elements. I haven't had time to check yet.

Also, there are apparently test failures with sage-2.6...

I'll try to work on these issues in 1-2 weeks unless someone beats me to it.