python / cpython

The Python programming language
https://www.python.org
Other
63.19k stars 30.26k forks source link

collections.abc.Mapping should include a __reversed__ that raises TypeError #70051

Closed 4037176b-12fa-49a8-88d4-2c9d85093e95 closed 2 years ago

4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago
BPO 25864
Nosy @gvanrossum, @rhettinger, @terryjreedy, @ncoghlan, @bitdancer, @vadmium, @serhiy-storchaka, @Vgr255, @AlexWaygood
Files
  • cpython-iter-patch.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['type-bug', 'library'] title = 'collections.abc.Mapping should include a __reversed__ that raises TypeError' updated_at = user = 'https://bugs.python.org/abarnert' ``` bugs.python.org fields: ```python activity = actor = 'rhettinger' assignee = 'none' closed = True closed_date = closer = 'rhettinger' components = ['Library (Lib)'] creation = creator = 'abarnert' dependencies = [] files = ['41440'] hgrepos = [] issue_num = 25864 keywords = ['patch'] message_count = 34.0 messages = ['256434', '256435', '256436', '256437', '256438', '256439', '256442', '256572', '256576', '256578', '256579', '256643', '256645', '256690', '256712', '256910', '257045', '257046', '257049', '257053', '257057', '257062', '257063', '257064', '257128', '257276', '257308', '257314', '257483', '257484', '257485', '257494', '257500', '408571'] nosy_count = 11.0 nosy_names = ['gvanrossum', 'rhettinger', 'terry.reedy', 'ncoghlan', 'r.david.murray', 'martin.panter', 'serhiy.storchaka', 'abarnert', 'abarry', 'curioswati', 'AlexWaygood'] pr_nums = [] priority = 'low' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue25864' versions = ['Python 3.6'] ```

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    Example:

        class MyDict(collections.abc.Mapping):
            def __init__(self, d): self.d = d
            def __len__(self): return len(self.d)
            def __getitem__(self, key): return self.d[key]
            def __iter__(self): return iter(self.d)
    
        d = {1:2, 3:4}
        m = MyDict({1: 2, 3: 4})

    If you do reversed(d), you get a nice TypeError: argument to reversed() must be a sequence. But if you do reversed(m), you get a reversed iterator. And when you iterate it, presumably expecting to get 0 and 1 in some arbitrary order, you instead get 3, and then a KeyError: 0.

    Of course it's obvious why this happens once you think about it: in order to handle types that implement the old-style sequence protocol (just respond to __getitem__ for all integers from 0 to len(self)), reversed has to fall back to trying __getitem__ for all integers from len(d)-1 to 0.

    If you provide a __reversed__ method, it just calls that. Or, if you're a C-API mapping like dict, PySequence_Check will return false and it'll raise a TypeError. But for a Python mapping, there's no way PySequence_Check or anything else can know that you're not actually a sequence (after all, you implement __getitem__ and __len__), so it tries to use you as one, and confusion results.

    I think trying to fix this for _all_ possible mappings is a non-starter.

    But fixing it for mappings that use collections.abc.Mapping is easy: just provide a default implementation of collections.abc.Mapping.__reversed__ that just raises a TypeError.

    I can't imagine this would break any working code. If it did, the workaround would be simple: just implement def __reversed__(self): return (self[k] for k in reversed(range(len(self)))).

    d76e1216-2604-4ac9-b5b9-f7bdc745761c commented 8 years ago

    Can it be reproduced in default branch? I tried but got: AttributeError: module 'collections' has no attribute 'abc'

    cd293b3e-6d38-412b-8370-a46a9aaee518 commented 8 years ago

    You need to do 'import collections.abc' as abc is a submodule of collections, and is not imported from a bare 'import collections'.

    d76e1216-2604-4ac9-b5b9-f7bdc745761c commented 8 years ago

    If you do reversed(d), you get a nice TypeError: argument to reversed() must be a sequence. But if you do reversed(m), you get a reversediterator. And when you iterate it, presumably expecting to get 0 and 1 in some arbitrary order, you instead get 3, and then a KeyError:0.

    I got 2 instead of 3.

    What are we exactly expecting here? How can a dictionary be reversed?

    I can't imagine this would break any working code. If it did, the workaround would be simple: just implement def __reversed__(self): return (self[k] for k in reversed(range(len(self)))).

    This seems to make no difference. I still got the KeyError.

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    What are we exactly expecting here?

    Well, naively, I was expecting a TypeError, just as you get for dict, or a subclass of dict, or a custom extension type that implements the C-API mapping protocol.

    Once you understand how reversed works, you can understand why it gives you a nonsensical and useless iterator instead. But nobody would actually _want_ that.

    So, as I proposed in the initial description, and the title, what we should be doing is raising a TypeError.

    How can a dictionary be reversed?

    Assuming this is a pointless rhetorical question: That's exactly the point. There's no sensible meaning to reversing a dictionary, so it should raise a TypeError. Exactly as it already does for dict, subclasses of dict, and C-API mappings.

    If this wasn't rhetorical: I guess you could argue that any arbitrary order in reverse is any arbitrary order, so even returning iter(m) would be acceptable. Or maybe reversed(list(m)) would be even better, if it didn't require O(N) space. But in practice, nobody will ever expect that--again, they don't get it from dict, subclasses, C-API mappings--so why go out of our way to implement it? So, again, it should raise a TypeError.

    This seems to make no difference. I still got the KeyError.

    Of course. Again, the current behavior is nonsensical, will almost always raise a KeyError at some point, and will never be anything a reasonable person wants. So a workaround that restores the current behavior will also be nonsensical, almost always raise a KeyError at some point, and never be anything a reasonable person wants.

    But, if you happen to be unreasonably unreasonable--say, you created a mapping with {2:40, 0:10, 1:20} and actually wanted reversed(m) to confusingly give you 40, 20, 10--and this change did break your code, the workaround would restore it.

    rhettinger commented 8 years ago

    I've seen no evidence that this a problem in practice. It seems no more interesting or problematic than sequence argument unpacking working with dictionaries, "a, b = {'one': 1, 'two': 2}".

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    It seems no more interesting or problematic than sequence argument unpacking working with dictionaries, "a, b = {'one': 1, 'two': 2}".

    Dictionaries (including dicts, dict subclasses, and custom Mappings) are iterables. People use that fact every time they write for key in d. So it's not at all problematic that they work with iterable unpacking. Especially since here, custom Mappings work exactly the same way as dicts, dict subclasses, custom Sets, iterators, and every other kind of iterable.

    Dictionaries are not sequences. People never write code expecting them to be. So it is problematic that they work with sequence reversing. Especially since here, custom Mappings do _not_ work the same way as dicts, dict subclasses, custom Sets, iterators, and other non-sequences, all of which raise a TypeError.

    d76e1216-2604-4ac9-b5b9-f7bdc745761c commented 8 years ago

    But the work around suggested here as:

    def __reversed__(self):
        return (self[k] for k in reversed(range(len(self))))

    is also not a general solution, i.e. it is applicable for the following case: m = MyDict({2:40, 0:10, 1:20})

    but for any other mapping which does not have 0 as a key, it results in KeyError. So another solution, which would be more general could be:

    def __reversed__(self):
        keys = [k for k in self.keys()]
        return (self[k] for k in reversed(keys))
    bitdancer commented 8 years ago

    No, the workaround was for duplicating the existing behavior if the fix (raising an error like a normal dict does) broken someone's code. The *only* possibility here is to have a __reversed__ that raises a TypeError.

    What is it that makes reversed raise a typeerror on dict here? Not that we can change it at this point, but reversed blindly using len and __getitem for user classes but not on dict is rather inconsistent. I suppose the dict TypeError special case catches common mistakes? In which case adding a __reversed that raises a TypeError to Mapping seems to make sense for the same reason.

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    @R. David Murray:

    What is it that makes reversed raise a typeerror on dict here?

    There are separate slots for tp_as_sequence and tp_as_mapping, so a C type can be (and generally is) one or the other, not both.

    But for Python types, anything that has __getitem is both a sequence and a mapping at the C level. (It's one of the few minor inconsistencies between C and Python types left, like having separate slots for nb_add and sqconcat in C but only \_add for both in Python.)

    Not that we can change it at this point, but reversed blindly using len and __getitem__ for user classes but not on dict is rather inconsistent.

    But it's consistent with iter blindly using __len and __getitem if __iter__ is not present on Python classes but not doing that on C classes. That's how "old-style sequences" work, and I don't think we want to get rid of those (and, even if we did, I'm pretty sure that would require at least a thread on -dev or -ideas...).

    I suppose the dict TypeError special case catches common mistakes?

    Yes. That's probably not why it was implemented (it's easier for a C type to _not_ fake being a broken sequence than to do so), but it has that positive effect.

    In which case adding a __reversed__ that raises a TypeError to Mapping seems to make sense for the same reason.

    Exactly.

    I think Raymond's point is that, while it does make sense, it may still not be important enough to be worth even two lines of code. Hopefully we can get more than two opinions (his and mine) on that question; otherwise, at least as far as I'm concerned, he trumps me.

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    @Swati Jaiswal:

    But the work around suggested here ... is also not a general solution, i.e. ... for any other mapping which does not have 0 as a key, it results in KeyError.

    You're missing the point. The workaround isn't intended to be a general solution to making mappings reversible. It's intended to produce the exact same behavior as the current design, for any code that somehow depends on that. So, any mapping that happens to be reversible by luck is reversible with the workaround; any mapping that successfully produces odd nonsense produces the same odd nonsense; any mapping that raises a KeyError(0) will raise the same KeyError(0).

    In the incredibly vast majority of cases (probably 100%) you will not want that workaround; you will want the new behavior that raises a TypeError instead. I don't think the workaround needs to be mentioned in the documentation or anything; I just produced it to prove that, on the incredibly unlikely chance that the change is a problem for someone, the workaround to restore the old behavior is trivial.

    Meanwhile, your general solution takes linear space, and linear up-front work, which makes it unacceptable for a general __reversed__ implementation. When you actually want, you can do it manually and explicitly in a one-liner, as already explained in the docs.

    If you're still not getting this, pretend I never mentioned the workaround. It really doesn't matter.

    d76e1216-2604-4ac9-b5b9-f7bdc745761c commented 8 years ago

    Okay, so should I go for a patch for it? And sorry if it sounds naive, but do we provide the work around or the user would implement if they purposely want it. If we provide it, then where should it be written?

    d76e1216-2604-4ac9-b5b9-f7bdc745761c commented 8 years ago

    @Andrew Barnert, sorry, I didn't get your previous messages so please ignore the last message i sent. I got your point i.e. We just need to provide the TypeError in the Mapping. And the work around is never implemented. Should I go for the patch with it?

    bitdancer commented 8 years ago

    You can write a patch if you like, but we haven't come to a consensus on actually doing anything. I'm leaning toward Andrew's position, but not strongly, so we need some more core dev opinions.

    terryjreedy commented 8 years ago

    Unless this somehow breaks the philosophy of ABCs, I would be inclined to add the negative methods.

    serhiy-storchaka commented 8 years ago

    I agree that by default calling reversed() on mapping should raise a TypeError. But for now issubclass(collections.abc.Mapping, typing.Reversible) returns False. If add default __reversed__ implementation this test will return True. We have to find other way to make Mapping true non-reversible in all meanings.

    Perhaps there is a bug in typing.Reversible. It doesn't accept all types supported by reversed().

    >>> class Counter(int):
    ...   def __getitem__(s, i): return i
    ...   def __len__(s): return s
    ... 
    >>> list(reversed(Counter(10)))
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
    >>> issubclass(Counter, typing.Reversible)
    False

    And accepts types that don't work with reversed().

    >>> class L(list):
    ...    __reversed__ = None
    ... 
    >>> reversed(L())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'NoneType' object is not callable
    >>> issubclass(L, typing.Reversible)
    True
    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    Perhaps there is a bug in typing.Reversible. It doesn't accept all types supported by reversed().

    ... And accepts types that don't work with reversed().

    The problem is the way the two are defined:

    That explains why it doesn't work on tuple, bytearray, etc. Iterable actually has the exact same problem, but, because it's a supertype of Sequence, and we have explicit Sequence.register(tuple) and MutableSequence.register(bytearray) in collections.abc, and typing.Iterable specifies collections.abc.Iterable as its "extra", it all works out.

    We could do the same for Reversible: add a collections.abc.Reversible, make it a subtype of Iterable and make Sequence a subtype of Reversible instead of Iterable, and make that the extra for typing.Reversible. Then it would work for all of those builtin types (and many third-party types that explicitly register with Sequence), just as Iterable does.

    But that only solves the problem in one direction. To solve it in the other direction, we'd need some way to either explicitly mark a method as not implemented (maybe setting it to None, or to any non-callable, or any data descriptor?) that ABC subclass hooks and/or typing checks are expected to understand, or unregister a class with an ABC so that it isn't a subtype even if it passes the implicit hooks.

    Or... could we just drop Reversible as an implicit protocol? The lack of an explicit "deny" mechanism for implicit protocols and ABCs is a more general problem, but if this is the only actual instance of that problem in real life, do we need to solve the general problem? If not, there's no obvious way to define typing.Reversible that isn't wrong, it doesn't have a corresponding ABC, it doesn't seem like it will be useful often enough to be worth the problems it causes, and I doubt there's much real-life code out there already depending on it, so that seems a lot easier.

    bitdancer commented 8 years ago

    So this issue now has two problems being discussed in it. Someone should start a new issue for the typing.Reversible problem.

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    Serhiy already filed the typing.Reversible bug on the separate typehinting tracker (https://github.com/ambv/typehinting/issues/170). So, unless fixing that bug requires some changes back to collections.abc or something else in the stdlib, I think the only issue here is the original one, on Mapping.

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    Also, I filed bpo-25958 as an ABC equivalent to Serhiy's typehinting problem. I don't know if that actually needs to be solved, but that definitely takes it out of the way for this issue.

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    As mentioned in bpo-25958, Guido pointed out on -ideas that __hash__ = None is already the standard way to declare a class unhashable, and it's recognized by collections.abc.Hashable.

    Doing __reversed__ = None does make reversed(m) raise a TypeError (although with a description saying "'NoneType' is not callable", which isn't quite as nice a description, but probably good enough).

    So, I think Mapping should set __reversed__ = None, rather than setting it to a method that raises TypeError. (If we need something more general, that's for bpo-25958 and/or Serhiy's typechecking bug.)

    gvanrossum commented 8 years ago

    This sounds good. Also, reversed() could then be modified to produce a better error. (The "unhashable" error comes from the hash() builtin, so that's also a precedent.)

    On Sat, Dec 26, 2015 at 4:32 PM, Andrew Barnert \report@bugs.python.org\ wrote:

    Andrew Barnert added the comment:

    As mentioned in bpo-25958, Guido pointed out on -ideas that __hash__ = None is already the standard way to declare a class unhashable, and it's recognized by collections.abc.Hashable.

    Doing __reversed__ = None does make reversed(m) raise a TypeError (although with a description saying "'NoneType' is not callable", which isn't quite as nice a description, but probably good enough).

    So, I think Mapping should set __reversed__ = None, rather than setting it to a method that raises TypeError. (If we need something more general, that's for bpo-25958 and/or Serhiy's typechecking bug.)

    ----------


    Python tracker \report@bugs.python.org\ \http://bugs.python.org/issue25864\


    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    This sounds good. Also, reversed() could then be modified to produce a better error.

    Should iter also be modified to produce a better error if __iter__ is None?

    Also, should this be documented? Maybe a sentence in the "Special method names" section of the "Data model" chapter, like this:

    To indicate that some syntax is not supported, set the corresponding special method name to None. For example, if __iter is None, the class is not iterable, so iter() will not look for __getitem.

    gvanrossum commented 8 years ago

    All sounds fine.

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    The attached patch does the following:

    vadmium commented 8 years ago

    If this patch goes ahead, I think the ABC documentation should clarify which methods are checked for None and which aren’t. The datamodel.rst file will suggest None for any method, but ABC will only support it for Iterable and Hashable (I think).

    Also, what is the point of the odd __getitem__() method in test_enumerate.py? Maybe you should use assertRaisesRegex() to check that the intended TypeError is actually raised.

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    If this patch goes ahead, I think the ABC documentation should clarify which methods are checked for None and which aren’t.

    That seems fair.

    Also, as you pointed out on bpo-25958, at least one other ABC has the same problem as Iterable: you can block the "in" operator by setting __contains__=None, but you'll still be a Container. So, do we want to go through all of the existing ABCs and make sure they all do this negative check, instead of just Iterable?

    Also, what is the point of the odd __getitem__() method in test_enumerate.py? Maybe you should use assertRaisesRegex() to check that the intended TypeError is actually raised.

    If an implementation doesn't raise a TypeError there, that's a failure. If it raises one with a different (possibly less helpful) message, I think that's just a quality-of-implementation issue, isn't it?

    vadmium commented 8 years ago

    IMO allowing any special method to be set to None seems to make more trouble than it is worth. Are there practical problems to address, or are they all theoretical?

    Ideally I think it would be better to require __reversed__() for reverse() to work, but such a change would break compatibility.

    Regarding test_enumerate.py, your class looks like this:

    class Blocked(object):
        def __getitem__(self): return 1
        def __len__(self): return 2
        __reversed__ = None

    The signature of __getitem__() is wrong, and causes a TypeError during iteration, although your particular test does not go that far. When I see someone using assertRaises() with a common exception like TypeError, I instinctively suggest checking the message to avoid these kind of test case bugs.

    I suggest either remove __getitem__() if it serves no purpose, or change it to something like this if you really want an unreversible sequence:

    def __getitem__(self, index):
        return (1, 1)[index]
    gvanrossum commented 8 years ago

    I think I tried to address all questions in bpo-25958.

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    IMO allowing any special method to be set to None seems to make more trouble than it is worth.

    That's exactly why allowing _any special method to be None is a separate issue (bpo-25958). Most special methods don't have any corresponding problem to the one with \_reversed__.

    Ideally I think it would be better to require __reversed__() for reverse() to work, but such a change would break compatibility.

    See the -ideas thread "Deprecating the old-style sequence protocol" (http://article.gmane.org/gmane.comp.python.ideas/37588).

    Regarding test_enumerate.py, your class looks like this:

    Please look at the two classes directly above it in the same function. The new Blocked exactly parallels the existing NoLen.

    I suggest either remove __getitem__() if it serves no purpose

    It very definitely serves a purpose. The whole point of the new test is that reversed will not fall back to using __getitem and __len if __reversed is None. So __getitem has to be there; otherwise, we already know (from the NoGetItem test) that it wouldn't get called anyway.

    This is exactly the same as the NoLen test, which verifies that __reversed will not fall back to __getitem and __len__ if one is present but not both.

    , or change it to something like this if you really want an unreversible sequence:

    Sure, if I wanted a real class that could be used as a sequence but could not be reversed. But all I want here is a toy class for testing the specific method lookup behavior. Again, exactly like the existing classes in the same test.

    Finally, from your previous comment:

    I think the ABC documentation should clarify which methods are checked for None and which aren’t.

    Looking at this a bit more: The ABC documentation doesn't even tell you that, e.g., Container and Hashable have subclass hooks that automatically make any class with __contains and __hash act like registered subclasses while, say, Sequence and Set don't. So, you're suggesting that we should explain where the hooks in some of those types differ, when we haven't even mentioned where the hooks exist at all. Maybe collections.abc _should_ have more detail in the docs, but I don't think that should be part of this bug. (Practically, I've always found the link to the source at the top sufficient--trying to work out exactly why tuple meets some ABC and some custom third-party sequence doesn't, which is a pretty rare case to begin with, is also pretty easy to deal with: you scan the source, quickly find that Sequence.register(tuple), read up on what it does, and realize that collections.abc.Sequence.register(joeschmoe.JoeSchmoeSequence) is what you want, and you're done.)

    gvanrossum commented 8 years ago

    Agreed that improving the docs doesn't belong in this bug, but in general if the docs aren't clear enough and only a visit to the source helps you understand, something's wrong. Because the source may do things one way today and be changed to do things differently tomorrow, all within the (intended) promises of the API. But without docs we don't know what those promises are.

    vadmium commented 8 years ago

    I’m sorry I only read your patch and did not see the NoLen class above Blocked. (I blame the lack of Reitveld review link.) I still think __getitem__() should have a valid signature, but I acknowledge that it’s not really your fault. :)

    My main concern about the documentation was that in your patch you say _all special methods are now allowed to be None, but in your code you only check \_iter(). This is adding new undocumented inconsistencies e.g. with Iterable vs Container. Maybe it would be better to only say that setting __iter to None is supported, instead of setting any special method.

    4037176b-12fa-49a8-88d4-2c9d85093e95 commented 8 years ago

    My main concern about the documentation was that in your patch you say _all special methods are now allowed to be None, but in your code you only check \_iter__(). This is adding new undocumented inconsistencies e.g. with Iterable vs Container.

    No it isn't. We already have undocumented inconsistencies between, e.g., Hashable vs. Container; we're just moving Iterable from one set to the other. (Notice that there's an even larger inconsistency between, e.g., Hashable and Container vs. Sequence and Set. As Guido suggests, that probably _does_ need to be fixed, but not as part of this bug, or bpo-25958.)

    And, as for the docs, it's already true that you can block fallback and inheritance of special methods by assigning them to None, but that isn't documented anywhere.

    So, this bug doesn't add fix any of those inconsistencies, but it doesn't add any new ones, either. If you think we actually _need to fix them, see bpo-25958 as at least a starting point. (Notice that Guido seems to want that one fixed, so, assuming I can write a decent patch for it, this bug would then become a one-liner: "\_reversed__ = None" inside Mapping.)

    AlexWaygood commented 2 years ago

    The proposed patch appears to have been implemented in https://github.com/python/cpython/commit/97c1adf3935234da716d3289b85f72dcd67e90c2, and there has been no discussion for five years. I think this can now be closed.