python / cpython

The Python programming language

https://www.python.org

Other

62.61k stars 30.04k forks source link

`dict` and `dict.update` treat the first argument as a mapping when it has attribute `keys` without attribute `getitem` #116938

Open Prometheus3375 opened 6 months ago

Prometheus3375 commented 6 months ago

Bug report

How to reproduce

Make such a class:

class Object:
   def __iter__(self, /):
       return iter(())

   def keys(self, /):
       return ['1', '2']

Call dict(Object()).
Call d = {} and then d.update(Object()).

Expected result

At the step 2 an empty dictionary is returned. At the step 3 d stays empty.

Actual result

Both steps 2 and 3 raise a TypeError 'Object' object is not subscriptable.

CPython versions tested on:

3.10 3.11

Operating systems tested on:

Windows 21H2 (19044.1645) Ubuntu 20.04.6 LTS

Docs of dict state:

If a positional argument is given and it is a mapping object, a dictionary is created with the same key-value pairs as the mapping object. Otherwise, the positional argument must be an iterable object.

Unfortunately, there is no link to what is considered as a mapping object. In typeshed both dict and dict.update accept SupportsKeysAndGetItem, i.e., any object with attributes keys and __getitem__.

But the experiment above shows that only keys is enough. While typeshed is a bit too restrictive in the case for iterable (only iterables of 2-sized tuples are allowed, but dict accepts any iterable of 2-sized iterables), I think just checking for keys is not enough.

In the actual C code there is such comment:

https://github.com/python/cpython/blob/2982bdb936f76518b29cf7de356eb5fafd22d112/Include/dictobject.h#L39-L43

Thus, it is intended to check the presence of two attributes.

The error is here:

https://github.com/python/cpython/blob/2982bdb936f76518b29cf7de356eb5fafd22d112/Objects/dictobject.c#L3426-L3441

This code evaluates whether attribute keys is present. If the answer is true, calls PyDict_Merge, and calls PyDict_MergeFromSeq2 otherwise.

terryjreedy commented 6 months ago

Object is not a mapping, so the exception looks correct, not a bug.

But the experiment above shows that only keys is enough. It shows that keys is not enough. For each key in keys, mapping[key] is accessed to get the value, and that failed.

there is no link to what is considered as a mapping object This glossary entry points one here

I guess one is expected to know to look and find if one does not know what a mapping is. The first occurrence of mapping in the text could/should be linked to the glossary entry just as the iterable entry is. I'll make a PR tomorrow if no one beats me or persuades me to not do so.

Prometheus3375 commented 6 months ago

Object is not a mapping, so the exception looks correct, not a bug.

Object is not a mapping, but it is still an iterable, so I expect dict and dict.update to use other method of initialization/update.

It shows that keys is not enough. For each key in keys, mapping[key] is accessed to get the value, and that failed.

It shows that the presence of keys is enough to consider an object being mapping and then use __getitem__ to get key-value pairs even if this attribute is not present. But the comment in C states that both attributes must be present. The same typeshed does with SupportsKeysAndGetItem protocol. The glossary states that mapping is an object which implements collections.abc.Mapping. It is clear, that class Object in my report is not a mapping, yet dict and dict.update consider otherwise.

terryjreedy commented 6 months ago

Since a mapping is iterable, dict_update_arg(arg), whose code you quoted above, must first check whether arg seems to be a mapping. If arg has .keys, it assumes that it is meant to be a mapping and unconditionally calls and returns the merge function. So keys is enough for arg to be tried out as a mapping, but not enough for it to be successfully used as a mapping.

I believe you are claiming that either dict_update_arg should also check for __getitem__ before calling the merge function or the doc should be changed somehow. I am not going to decide which.

Class MyListWithKeys, which you edited out from the post above while I was writing this, is both a list and a mapping, and would pass a __getitem__ check. I presume you deleted after realizing this. However, it is a buggy mapping in that its keys and __getitem__ are dis-coordinated. Since it also raises TypeError, it illustrates why merely checking for a TypeError is insufficient for deciding to fallback to a sequence merge.

Prometheus3375 commented 6 months ago

Yes, I realized that class MyListWithKeys defines both keys and __getitem__ and in such case dict works as intended.

It can be fixed replacing list with set.

class MySetWithKeys(set):
    __slots__ = ()

    def keys(self, /):
        return list(map(repr, self))

my_set = MySetWithKeys([(1, 2), (3, 4)])

Raises TypeError upon calling dict(my_set): 'MySetWithKeys' object is not subscriptable.

A real world example of an iterable with keys and without __getitem__ is in Neo4j driver for Python. There is a class neo4j.Result which is an iterator with method keys. The method returns names of variables used in the result of the query.

For example, query returns an iterable of records of size 2 where the first index is occupied by ID and the second by the list of connected IDs:

query = (
    'unwind $source_ids as node_id '
    'match (:Node {nodeId: node_id})--(other:Node:Entity) '
    'return node_id, collect(other.nodeId) as ids'
    )

result: neo4j.Result = self.session.run(query, source_ids=list(source_ids))
return dict(list(result))

Initially I used dict(result) as neo4j.Result is not a mapping, but then I got the error and added list to avoid it.

pochmann3 commented 6 months ago

added list to avoid it

Just in case you don't like using a list: dict(iter(result)) perhaps also works, and dict(r for r in result) and dict(itertools.chain(result)) certainly do.

rhettinger commented 6 months ago

I think this can be closed. The OP's description is the actual intended and tested behavior.

Here's the spec from collections.abc.MutableMapping.update():

def update(self, other=(), /, **kwds):
    ''' D.update([E, ]**F) -> None.  Update D from mapping/iterable E and F.
        If E present and has a .keys() method, does:     for k in E: D[k] = E[k]
        If E present and lacks .keys() method, does:     for (k, v) in E: D[k] = v
        In either case, this is followed by: for k, v in F.items(): D[k] = v
    '''
    if isinstance(other, Mapping):
        for key in other:
            self[key] = other[key]
    elif hasattr(other, "keys"):
        for key in other.keys():
            self[key] = other[key]
    else:
        for key, value in other:
            self[key] = value
    for key, value in kwds.items():
        self[key] = value

Note that the docstring for dict.update has the same specification:

>>> help(dict.update)
Help on method_descriptor:

update(...)
    D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
    If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
    If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
    In either case, this is followed by: for k in F:  D[k] = F[k]

The produces exactly the behavior the OP observed. If non-mapping has keys(), the update method with loop over those and attempt to use __getitem__ to fetch a value. If __getitem__ is not found, a TypeError is raised for a non-subscriptable argument.

So, it looks to me like the behavior the OP observed is exactly what was promised. This spec have been around a very long time and almost certainly there is code that relies on the behavior.

Since neo4j.Result isn't working with the protocol as designed, it should probably just document the iter(result) or list(result) workaround.

Prometheus3375 commented 6 months ago

Alright, that evidence is persuasive. At this point I think the docs should be synchronized and fixed.

If E present and has a .keys() method, does: for k in E: D[k] = E[k]

There, for k in E: D[k] = E[k] should be replaced with for k in E.keys(): D[k] = E[k] as this is what actually happens. Phrase has a .keys() method should be changed to has a .keys attribute.

A similar note should be provided for dict constructor too, either in web documentation or in help message. Current docs say mapping which by glossary terms should implement collections.abc.Mapping but actually having keys is enough to proceed with for k in E.keys(): D[k] = E[k].

Nethertheless, I find a bit awkward the fact that the behavior depends on a single non-dunder method. keys may not be always a method after all. From now on I will be careful with any custom iterable/iterator to avoid usage of any member named keys unless the object is indeed a mapping.

For the code below mypy does not provide any warnings to dict(Object()). The revealed type is dict[str, int] - the same as I expected before opening this issue.

from collections.abc import Iterator, Iterable
from typing import reveal_type

class Object:
    def __iter__(self, /) -> Iterator[tuple[str, int]]: ...
    def keys(self, /) -> Iterable[str]: ...

d = dict(Object())
reveal_type(d)

Maybe adding next overload to dict and its update method would emit a warning. But I am afraid that this would cause any valid input to be also marked.

    @overload
    def __init__(self, iterable: SupportKeys, /) -> Never: ...

python / cpython

`dict` and `dict.update` treat the first argument as a mapping when it has attribute `keys` without attribute `__getitem__` #116938

Bug report

How to reproduce

Expected result

Actual result

CPython versions tested on:

Operating systems tested on:

`dict` and `dict.update` treat the first argument as a mapping when it has attribute `keys` without attribute `getitem` #116938