python / cpython

The Python programming language
https://www.python.org

Improve error message when using `in` for object that is not container #95144

Closed Mariatta closed 3 months ago

Mariatta commented 2 years ago

Feature or enhancement

When using in to test containment, for example if "a" in b:, Python raises an error if b does not support the in operator: TypeError: Argument of type b is not iterable.

To the reader/debugger of this code, the error message seems to suggest that you need to pass in an iterable object (an object that implements __iter__), but in reality, you can also pass in a container object (an object that implements __contains__).

It would be great if the error message could be improved to be more accurate and helpful.

Pitch

The in keyword can be used in different ways:

for loop

for item in [1,2,3]:

A for loop works with any iterable, such as a list, dict, string, or any object that implements __iter__.

testing for containment

if "a" in "abcd":

To make a class into a container, you just need to implement the __contains__ method.

class Blabla:
    def __contains__(self, item):
        return False

b = Blabla()
>>> "a" in b
False

If a bad refactoring removed or renamed the __contains__ method of this container class, the existing code "a" in b would raise an error saying that b is not iterable.

If the object was not an iterable to begin with, this error message is confusing to the person debugging this, and they would not realize that this is due to the missing __contains__ method.
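The refactoring scenario above can be reproduced with a minimal sketch. The class name Shelf and its has method are hypothetical and only stand in for a container whose __contains__ was renamed. The exact wording of the message depends on the Python version, so the example captures it instead of assuming the text.

```python
class Shelf:
    """Hypothetical container whose __contains__ was renamed by mistake."""

    def __init__(self, items):
        self._items = set(items)

    # Before the (hypothetical) refactoring this method was named __contains__.
    def has(self, item):
        return item in self._items


shelf = Shelf(["a", "b"])

try:
    "a" in shelf
except TypeError as exc:
    # Shelf defines none of __contains__, __iter__, or __getitem__,
    # so the membership test fails with a TypeError.
    message = str(exc)
    print(message)
else:
    message = None
```

The printed message names the type, but nothing in it points at the missing __contains__ method, which is the confusion this issue describes.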

I think it would be great if the error message raised when testing containment (if a in b) were different from the one raised by a for loop (for a in b). A more accurate error message would be helpful to the user.

Example message: TypeError: Argument of type '%.200s' is not a container

(and that this is only raised when doing if a in b)

I tried to look into the CPython code, and it seems like the error message is coming from this line: https://github.com/python/cpython/blob/8a808952a61b4bd572d95f5efbff1680c59aa507/Objects/abstract.c#L2187, which was introduced in https://github.com/python/cpython/pull/20537

Regarding the term container, it is used in this doc: https://docs.python.org/3/library/collections.abc.html#collections.abc.Container
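As a small illustration of how that documentation uses the term, the sketch below checks the Blabla class from earlier against the Container and Iterable ABCs. It relies only on collections.abc from the standard library.

```python
from collections.abc import Container, Iterable


class Blabla:
    def __contains__(self, item):
        return False


b = Blabla()

# Container recognizes any class that defines __contains__.
is_container = isinstance(b, Container)

# Iterable looks for __iter__, which Blabla does not define.
is_iterable = isinstance(b, Iterable)

print(is_container, is_iterable)  # True False
```

So an object can be a container in the ABC sense without being iterable, which is why "is not iterable" reads oddly for a class like this.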

Previous discussion

I don't know if you'd count a Twitter thread as previous discussion, but here are some links: Start of thread

Supporting message that the error message can be improved: here and here

Comment about Python's terminology of container and iterable

Comment about Python oddity


rhettinger commented 2 years ago

> the error message seems to suggest that you need to pass in an iterable object

This is in fact true:

def f():
    yield 10
    yield 20
    yield 30

>>> 20 in f()
True

To be most helpful, the error message could mirror the actual logic. Perhaps something like this:

TypeError: Argument of type b does not support the "in" operator. It must define either __contains__ or __iter__.

brandtbucher commented 2 years ago

...or __getitem__.

🙃

rhettinger commented 2 years ago

Right. __getitem__ is also used:

class A:
    def __getitem__(self, i):
        if i >= 10:
            raise IndexError
        return i ** 2

>>> 64 in A()
True
>>> 55 in A()
False

Interestingly, it is not detected by the Container ABC.

>>> from collections.abc import Container
>>> isinstance(A(), Container)
False
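The lookup order mentioned in this thread (__contains__ first, then __iter__, then __getitem__) can be observed with a short sketch. The classes Both and IterAndGetitem are hypothetical and exist only to record which method the in operator ends up calling.

```python
class Both:
    """Hypothetical class defining __contains__ and __iter__."""

    def __init__(self):
        self.calls = []

    def __contains__(self, item):
        self.calls.append("__contains__")
        return item == 1

    def __iter__(self):
        self.calls.append("__iter__")
        return iter([1, 2, 3])


class IterAndGetitem:
    """Hypothetical class defining __iter__ and __getitem__."""

    def __init__(self):
        self.calls = []

    def __iter__(self):
        self.calls.append("__iter__")
        return iter([1, 2, 3])

    def __getitem__(self, i):
        self.calls.append("__getitem__")
        return [1, 2, 3][i]


both = Both()
result_both = 2 in both       # decided by __contains__ alone
print(result_both, both.calls)

iag = IterAndGetitem()
result_iag = 2 in iag         # falls back to iteration via __iter__
print(result_iag, iag.calls)
```

In the first case 2 would be found by iterating, but __contains__ takes precedence and returns False. In the second case __getitem__ is never consulted because __iter__ is available.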

Note, the error message isn't produced directly by PySequence_Contains() in Objects/abstract.c. Instead, this is a downstream message returned by the step that attempts to iterate.

Changing the message would entail catching the exception, verifying the reason for it (we can get TypeError for other reasons and don't want to eat those exceptions), and then raising a new exception with the new message. Here's the current code:

/* Return -1 if error; 1 if ob in seq; 0 if ob not in seq.
 * Use sq_contains if possible, else defer to _PySequence_IterSearch().
 */
int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    PySequenceMethods *sqm = Py_TYPE(seq)->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL) {
        int res = (*sqm->sq_contains)(seq, ob);
        assert(_Py_CheckSlotResult(seq, "__contains__", res >= 0));
        return res;
    }
    Py_ssize_t result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

In other words, this is more involved than a simple string edit.
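The catch, verify, and re-raise approach described above can be sketched in pure Python. This is only an illustration of the idea, not the change made in CPython: the helper name contains and the wording of the new message are assumptions, and the check for a None-valued slot is a simplification of how the interpreter treats such attributes.

```python
def contains(container, item):
    """Hypothetical helper: `item in container` with a more specific error."""
    try:
        return item in container
    except TypeError:
        tp = type(container)
        # Verify the reason: does the type support `in` at all?
        supported = any(
            getattr(tp, name, None) is not None
            for name in ("__contains__", "__iter__", "__getitem__")
        )
        if supported:
            # The type does support `in`, so this TypeError came from
            # somewhere else (for example user code). Do not swallow it.
            raise
        raise TypeError(
            f"argument of type {tp.__name__!r} does not support the 'in' "
            "operator; it must define __contains__, __iter__, or __getitem__"
        ) from None
```

A TypeError raised inside a user-defined __contains__ passes through unchanged, while an object with none of the three methods gets the more specific message.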

zware commented 4 months ago

I found the patch in GH-119888 lying around in one of my worktrees, and it seemed simple enough to me to go ahead and post it.

zware commented 3 months ago

Done in 3.14 (GH-119888); probably not one to backport (but I'm happy to be overruled on that :))