Closed ethanfurman closed 5 years ago
https://docs.python.org/3/reference/expressions.html#comparisons: ---------------------------------------------------------------- The operators 'in' and 'not in' test for membership. 'x in s' evaluates to true if x is a member of s, and false otherwise. 'x not in s' returns the negation of 'x in s'. All built-in sequences and set types support this as well as dictionary, for which 'in' tests whether the dictionary has a given key. For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'.
StackOverflow question for context: http://stackoverflow.com/q/29692140/208880
Summary: if a user creates a broken object such that __hash__ returns a random number with every invocation, then that object will get lost in a dict or set; but the above statement about 'equivalent to' claims that such an object will still be found.
On the other hand, https://docs.python.org/3/glossary.html#term-hashable says that a constant return value is required for an object to be hashable (of course, Python can't tell if future calls to __hash__ will return the same value).
Perhaps a link to the #term-hashable would be appropriate?
I don't think it's as simple as linking to the hashable definition. The "equivalent expression" is simply wrong for dict/set/frozenset, as those types check hash equality, not identity.
It's not quite that simple -- those containers use the hash to find the objects that will be first checked by identity, and then equality* -- otherwise they would have to do a complete scan from first key to last, and that would kill performance.
...
Okay, I see your point -- even for sane objects, a systematic check of every key is not undertaken for the hash-type containers.
Perhaps something like:
For container types such as list, tuple, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'. For container types such as set, frozenset, and dict, 'x in y' is equivalent to 'any(x is e or x == e for e in z)' where 'z' is a collection of objects in 'y' that have the same hash.
Could we add "For dictionaries and sets this equivalence expression is modified by the addition of 'if hash(x) == hash(e)'?
I think I like that better than my suggestion. :)
So something like:
For container types such as list, tuple, or collections.deque, the expression 'x in y' is equivalent to 'any(x is e or x == e for e in y)'. For container types such as set, frozenset, and dict, this equivalence expression is modified by the addition of 'if hash(x) == hash(e)'.
?
Yes, that's what I had in mind.
There is a separate report for taking care of the identity check for contains: https://bugs.python.org/issue23986
I think notes about crazy hashes shouldn't spill all over our docs. At best, it warrants a FAQ entry about how hash tables work.
The risk here is that in an effort to be more precise, it is easy impair the usability of the docs. The wording in question has been around for a very long time and has overall done a good job of explaining the intent of the in-operator to all but the most pedantic reader, "The operators 'in' and 'not in' test for membership. 'x in s' evaluates to true if x is a member of s, and false otherwise."
If you really want to be precise, the *only* thing that can be broadly stated about the in-operator is that it calls __contains__ on an object that that object can implement whatever logic it wants (hash table lookup, linear search, google search, random result, etc). But then, this is no different than most magic methods in that regard.
It looks like the identity-implies-equality part of this has already been taken care of. The __hash invariant also appears to now have extensive coverage, some in the glossary and much more in the data model section of the reference. Cute puzzles like a __hash returning a random value can be left for stackoverflow.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at =
created_at =
labels = ['docs']
title = 'docs about containers membership testing wrong for broken objects'
updated_at =
user = 'https://github.com/ethanfurman'
```
bugs.python.org fields:
```python
activity =
actor = 'peternowee'
assignee = 'rhettinger'
closed = True
closed_date =
closer = 'rhettinger'
components = ['Documentation']
creation =
creator = 'ethan.furman'
dependencies = []
files = []
hgrepos = []
issue_num = 23987
keywords = []
message_count = 9.0
messages = ['241320', '241326', '241337', '241339', '241341', '241661', '241667', '241694', '350211']
nosy_count = 8.0
nosy_names = ['georg.brandl', 'rhettinger', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'docs@python', 'jonrsharpe', 'peternowee']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue23987'
versions = ['Python 3.4', 'Python 3.5']
```