python / cpython

The Python programming language
https://www.python.org
Other
63.19k stars 30.26k forks source link

docs about containers membership testing wrong for broken objects #68175

Closed ethanfurman closed 5 years ago

ethanfurman commented 9 years ago
BPO 23987
Nosy @birkenfeld, @rhettinger, @ezio-melotti, @merwok, @bitdancer, @peternowee

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at = created_at = labels = ['docs'] title = 'docs about containers membership testing wrong for broken objects' updated_at = user = 'https://github.com/ethanfurman' ``` bugs.python.org fields: ```python activity = actor = 'peternowee' assignee = 'rhettinger' closed = True closed_date = closer = 'rhettinger' components = ['Documentation'] creation = creator = 'ethan.furman' dependencies = [] files = [] hgrepos = [] issue_num = 23987 keywords = [] message_count = 9.0 messages = ['241320', '241326', '241337', '241339', '241341', '241661', '241667', '241694', '350211'] nosy_count = 8.0 nosy_names = ['georg.brandl', 'rhettinger', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'docs@python', 'jonrsharpe', 'peternowee'] pr_nums = [] priority = 'normal' resolution = 'out of date' stage = 'resolved' status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue23987' versions = ['Python 3.4', 'Python 3.5'] ```

ethanfurman commented 9 years ago

https://docs.python.org/3/reference/expressions.html#comparisons: ---------------------------------------------------------------- The operators 'in' and 'not in' test for membership. 'x in s' evaluates to true if x is a member of s, and false otherwise. 'x not in s' returns the negation of 'x in s'. All built-in sequences and set types support this as well as dictionary, for which 'in' tests whether the dictionary has a given key. For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'.

StackOverflow question for context: http://stackoverflow.com/q/29692140/208880

Summary: if a user creates a broken object such that __hash__ returns a random number with every invocation, then that object will get lost in a dict or set; but the above statement about 'equivalent to' claims that such an object will still be found.

On the other hand, https://docs.python.org/3/glossary.html#term-hashable says that a constant return value is required for an object to be hashable (of course, Python can't tell if future calls to __hash__ will return the same value).

Perhaps a link to the #term-hashable would be appropriate?

540487fa-fdcb-4a5f-8b1a-6ebae085e1c7 commented 9 years ago

I don't think it's as simple as linking to the hashable definition. The "equivalent expression" is simply wrong for dict/set/frozenset, as those types check hash equality, not identity.

ethanfurman commented 9 years ago

It's not quite that simple -- those containers use the hash to find the objects that will be first checked by identity, and then equality* -- otherwise they would have to do a complete scan from first key to last, and that would kill performance.

...

Okay, I see your point -- even for sane objects, a systematic check of every key is not undertaken for the hash-type containers.

Perhaps something like:

For container types such as list, tuple, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'. For container types such as set, frozenset, and dict, 'x in y' is equivalent to 'any(x is e or x == e for e in z)' where 'z' is a collection of objects in 'y' that have the same hash.

bitdancer commented 9 years ago

Could we add "For dictionaries and sets this equivalence expression is modified by the addition of 'if hash(x) == hash(e)'?

ethanfurman commented 9 years ago

I think I like that better than my suggestion. :)

ethanfurman commented 9 years ago

So something like:

For container types such as list, tuple, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'. For container types such as set, frozenset, and dict, this equivalence expression is modified by the addition of 'if hash(x) == hash(e)'.

?

bitdancer commented 9 years ago

Yes, that's what I had in mind.

rhettinger commented 9 years ago

There is a separate report for taking care of the identity check for contains: https://bugs.python.org/issue23986

I think notes about crazy hashes shouldn't spill all over our docs. At best, it warrants a FAQ entry about how hash tables work.

The risk here is that in an effort to be more precise, it is easy impair the usability of the docs. The wording in question has been around for a very long time and has overall done a good job of explaining the intent of the in-operator to all but the most pedantic reader, "The operators 'in' and 'not in' test for membership. 'x in s' evaluates to true if x is a member of s, and false otherwise."

If you really want to be precise, the *only* thing that can be broadly stated about the in-operator is that it calls __contains__ on an object that that object can implement whatever logic it wants (hash table lookup, linear search, google search, random result, etc). But then, this is no different than most magic methods in that regard.

rhettinger commented 5 years ago

It looks like the identity-implies-equality part of this has already been taken care of. The __hash invariant also appears to now have extensive coverage, some in the glossary and much more in the data model section of the reference. Cute puzzles like a __hash returning a random value can be left for stackoverflow.