Open david-andrew opened 1 month ago
When you override `__setitem__`, after `d1[item]['a'] = 10` there is no key `preprocess_key(item)` in your `d1`, but the key `preprocess_key(preprocess_key(item))` exists. Therefore, `__missing__` is still called during `d1[item]['a'] = 20`.
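The effect described above can be reproduced with a small sketch. The class name `PreprocessedDict` and the body of `preprocess_key` are assumptions made for illustration (the preprocessor is deliberately not idempotent so that a second application shows up in the stored key). The expected output is for CPython, where `defaultdict.__missing__` stores the new value through item assignment and therefore reaches the overridden `__setitem__`.

```python
from collections import defaultdict


def preprocess_key(key):
    # Hypothetical preprocessor, deliberately NOT idempotent so that a
    # second application is visible in the stored key.
    return f"key:{key}"


class PreprocessedDict(defaultdict):
    missing_calls = 0  # counts how often __missing__ runs

    def __getitem__(self, key):
        return super().__getitem__(preprocess_key(key))

    def __setitem__(self, key, value):
        super().__setitem__(preprocess_key(key), value)

    def __missing__(self, key):
        # `key` has already been preprocessed once by __getitem__.
        self.missing_calls += 1
        # defaultdict.__missing__ stores the new value via item assignment,
        # which dispatches to the overridden __setitem__ above.
        return super().__missing__(key)


d1 = PreprocessedDict(dict)
d1["x"]["a"] = 10
d1["x"]["a"] = 20

print(list(d1.keys()))   # ['key:key:x']
print("key:x" in d1)     # False
print(d1.missing_calls)  # 2
```

The lookup always uses the once-preprocessed key while the stored key has been preprocessed twice, so every `d1[item]` access misses and replaces the inner dictionary with a fresh one.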
Documentation
`defaultdict` seems to call `__getitem__` whenever `__setitem__` is called (regardless of whether the item was already present), whereas a regular `dict` does not call `__getitem__` when `__setitem__` is called. The documentation for `defaultdict` says that `defaultdict` and `dict` are basically identical, except in a few narrow cases. But nothing is mentioned in the docs about this difference in behavior of calling `__getitem__`/`__setitem__`.
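One way to check which methods a given statement actually invokes is to record the calls in a subclass. The following is a minimal sketch; the `traced` helper and the `calls` attribute are hypothetical names introduced only for this illustration, and the expected output is what CPython produces.

```python
from collections import defaultdict


def traced(base):
    """Return a subclass of `base` that records __getitem__/__setitem__ calls."""

    class Traced(base):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.calls = []

        def __getitem__(self, key):
            self.calls.append("__getitem__")
            return super().__getitem__(key)

        def __setitem__(self, key, value):
            self.calls.append("__setitem__")
            super().__setitem__(key, value)

    return Traced


plain = traced(dict)()
plain["k"] = {}        # direct assignment: __setitem__ only
plain["k"]["a"] = 1    # nested assignment: __getitem__ on the outer dict

dd = traced(defaultdict)(dict)
dd["k"]["a"] = 1       # missing key: __getitem__, then __setitem__ via __missing__
dd["k"]["a"] = 2       # existing key: __getitem__ only
dd["j"] = {}           # direct assignment: __setitem__ only

print(plain.calls)  # ['__setitem__', '__getitem__']
print(dd.calls)     # ['__getitem__', '__setitem__', '__getitem__', '__setitem__']
```

In this trace, the nested form `d[k]['a'] = v` goes through `__getitem__` on the outer mapping for both classes, and the only extra `__setitem__` call on the `defaultdict` happens when a missing key is filled in.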
This comes up when making a child class of either of them if you want to have a preprocessing step that operates on keys before they are used to index into the dictionary, e.g.
Which prints out something like:
In this example, I have a preprocessor function I'd like to run on all keys to convert them from objects into strings which can be used in the dictionary. It is not clear from the docs that you need to not override `__setitem__` like I have commented out, because `defaultdict` will always call `__getitem__`, thus always running the preprocessor. If you override `__setitem__` like I have commented out, you will preprocess the item twice, and end up with results like this:
or this:
(I believe the extra element happens because the string from `preprocess_item` may or may not allocate new memory given an identical input)
I'm not exactly sure what the underlying cause of this difference is. It doesn't seem to be related to the `__missing__` method mentioned in the docs, because the behavior I mentioned happens for keys that are not present in the `defaultdict` as well as for those that are already present (and presumably wouldn't be calling `__missing__`).
python version
I ran my example in Python 3.6 through 3.12, and observed the same behavior in all of them.
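For the key-preprocessing use case described in this issue, one possible workaround is to keep both overrides and also override `__missing__` so that the default value is stored under the already-preprocessed key. This is only a sketch under assumptions: `KeyedDefaultDict` and the body of `preprocess_key` are hypothetical, and other access paths such as `get`, `in`, `pop`, and `update` are not handled here.

```python
from collections import defaultdict


def preprocess_key(key):
    # Hypothetical, non-idempotent preprocessor used only for illustration.
    return f"key:{key}"


class KeyedDefaultDict(defaultdict):
    def __getitem__(self, key):
        return super().__getitem__(preprocess_key(key))

    def __setitem__(self, key, value):
        super().__setitem__(preprocess_key(key), value)

    def __missing__(self, key):
        # Reached via __getitem__, so `key` is already preprocessed.
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        # Bypass the overridden __setitem__ to avoid a second preprocessing pass.
        dict.__setitem__(self, key, value)
        return value


d = KeyedDefaultDict(dict)
d["x"]["a"] = 10
d["x"]["a"] = 20
d["y"] = {"b": 1}

print(list(d.keys()))  # ['key:x', 'key:y']
print(d["x"], d["y"])  # {'a': 20} {'b': 1}
```

With this arrangement each key is preprocessed exactly once, whether it enters through direct assignment or through a lookup that creates the default value.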