python / cpython

The Python programming language
https://www.python.org

Dictionary union. (PEP 584) #80325

Closed brandtbucher closed 4 years ago

brandtbucher commented 5 years ago
BPO 36144
Nosy @gvanrossum, @rhettinger, @mdickinson, @scoder, @serhiy-storchaka, @zooba, @MojoVampire, @aaronchall, @3lnc, @tirkarthi, @brandtbucher, @curtisbucher, @chaburkland, @justjais
PRs
  • python/cpython#12088
  • python/cpython#18659
  • python/cpython#18729
  • python/cpython#18814
  • python/cpython#18832
  • python/cpython#18911
  • python/cpython#18931
  • python/cpython#18967
  • python/cpython#19106
  • python/cpython#19127
  • python/cpython#19221
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['interpreter-core', 'type-feature', '3.9']
    title = 'Dictionary union. (PEP 584)'
    updated_at =
    user = 'https://github.com/brandtbucher'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'Raps Uk'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'brandtbucher'
    components = ['Interpreter Core']
    creation =
    creator = 'brandtbucher'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36144
    keywords = ['patch']
    message_count = 66.0
    messages = ['336798', '336803', '336808', '336810', '336811', '336812', '336816', '336820', '336847', '336848', '336849', '336854', '337094', '337107', '337266', '337267', '358305', '362619', '362620', '362621', '362626', '362645', '362660', '362663', '362666', '362726', '362729', '362740', '362757', '362759', '362761', '362774', '362779', '362810', '362813', '362816', '362817', '362819', '362821', '362823', '362827', '362828', '362831', '362835', '362852', '362855', '363526', '363527', '363528', '363613', '363625', '363889', '363953', '364106', '364107', '364195', '364196', '364887', '364900', '364969', '364992', '365245', '365333', '371520', '371549', '373205']
    nosy_count = 15.0
    nosy_names = ['gvanrossum', 'rhettinger', 'mark.dickinson', 'scoder', 'serhiy.storchaka', 'steve.dower', 'josh.r', 'Aaron Hall', 'slam', 'xtreak', 'brandtbucher', 'curtisbucher', 'chaburkland', 'justjais', 'Raps Uk']
    pr_nums = ['12088', '18659', '18729', '18814', '18832', '18911', '18931', '18967', '19106', '19127', '19221']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue36144'
    versions = ['Python 3.9']
    ```

    brandtbucher commented 5 years ago

    ...as discussed in python-ideas. Semantically:

    d1 + d2 <-> d3 = d1.copy(); d3.update(d2); d3
    d1 += d2 <-> d1.update(d2)

    Attached is a working implementation with new/fixed tests for consideration. I've also updated collections.UserDict with the new __add/radd/iadd__ methods.
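The semantics stated above can be sketched with the existing dict methods. This is only an illustration, not the attached patch; `merged` is a hypothetical helper name standing in for the proposed `d1 + d2`.

```python
def merged(d1, d2):
    """Hypothetical helper: the value the proposed `d1 + d2` would produce."""
    d3 = d1.copy()   # start from a shallow copy of the left operand
    d3.update(d2)    # values from the right operand win on shared keys
    return d3


defaults = {"color": "red", "size": 1}
overrides = {"size": 2, "shape": "round"}

combined = merged(defaults, overrides)
untouched = dict(defaults)   # snapshot: merged() must not mutate its inputs

# The proposed `d1 += d2` is the in-place form, i.e. a plain update().
defaults.update(overrides)
```

After this runs, `combined` and the updated `defaults` hold the same items, while `untouched` still shows the original left operand.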

    rhettinger commented 5 years ago

    I believe that Guido rejected this when it was proposed a few years ago.

    tirkarthi commented 5 years ago

    Python ideas discussion in 2015 : https://mail.python.org/pipermail/python-ideas/2015-February/031748.html LWN summary : https://lwn.net/Articles/635397/

    serhiy-storchaka commented 5 years ago

    I believe it was proposed and rejected multiple times.

    rhettinger commented 5 years ago

    For the record, I'm opposed to the idea.

         options.update(user_selections)

    That reads more like self explanatory English than:

     options += user_selections

    The latter takes more effort to correctly parse and makes it less clear that you're working with dicts.

    serhiy-storchaka commented 5 years ago
    • It is natural to expect the plus operator to be commutative, but this operation would necessarily be non-commutative.

    In Python, the plus operator for sequences (strings, lists, tuples) is non-commutative.

    But I have other arguments against it:

    rhettinger commented 5 years ago

    In Python, the plus operator for sequences (strings, lists, tuples) is non-commutative.

    For sequences, that is obvious and expected, but not so much with mappings where the order of overlapping keys is determined by the left operand and the value associated with those keys is determined by the right operand.
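A small sketch of the point about overlapping keys, using copy-and-update as a stand-in for the proposed operator. The variable names are illustrative only.

```python
left = {"a": 1, "b": 2}
right = {"b": 20, "a": 10, "c": 30}

result = left.copy()
result.update(right)

# Shared keys keep the position they had in the left operand,
# but take the value from the right operand.
key_order = list(result)   # ['a', 'b', 'c']
```

Keys that appear only on the right (`'c'` here) are appended after the left operand's keys.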

    Also with sequences the + operator actually means "add to", but with dictionaries it means "add/or replace" which is contrary to the normal meaning of plus. I think that was one of Guido's reasons for favoring "|" instead of "+" for set-to-set operations.

    We already have a syntax for dict merging: {**d1, **d2}. It works with arbitrary mappings,

    This is a good point.

    scoder commented 5 years ago

    We already have a syntax for dict merging: {**d1, **d2}.

    Which doesn't mean that "d1 + d2" isn't much more intuitive than this special-character heavy version. It takes me a while to see the dict merge under that heap of stars. And that's already the shortest example.

    It works with arbitrary mappings,

    The RHS of "d += M" doesn't have to be a dict IMHO, it could be any mapping. And even "dict(X) + M" doesn't look all too bad to me, even though there's "dict(X, **M)".

    Use of the + operator is a temptation to produce new dictionaries rather than update an existing dict in-place which is usually what you want.

    That's why there would be support for "+=". The exact same argument already fails for lists, where concatenation is usually much more performance critical than for the average little dict. (And remember that most code isn't performance critical at all.)

    We already have ChainMap() which presents a single view of multiple mappings without any copying.

    Which is a different use case that is unlikely to go away with this proposal.

    makes it less clear that you're working with dicts.

    This is a valid argument, although it always depends on the concrete code what the most readable way to express its intentions is. Again, this doesn't really differ for lists.

    Let's wait for the PEP, I'd say.

    99ffcaa5-b43b-4e8e-a35e-9c890007b9cd commented 5 years ago

    scoder: dict(X, **M) is broken unless M is known to be string keyed (it used to work, but in Python 3, it will raise a TypeError). It's part of the argument for the additional unpacking generalizations from PEP-448; {**X, **M} does what dict(X, **M) is trying to do, but without abusing the keyword argument passing convention.
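The difference between the two spellings can be seen with a non-string key. A minimal sketch, with `x` and `m` as illustrative names:

```python
x = {"a": 1}
m = {1: "one"}   # non-string key

# dict(x, **m) passes m's keys as keyword arguments,
# so Python 3 rejects any key that is not a string.
try:
    dict(x, **m)
except TypeError as exc:
    error = exc
else:
    error = None

# The PEP 448 display form accepts any hashable key.
unpacked = {**x, **m}
```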

    You also claim "It takes me a while to see the dict merge under that heap of stars", but that's at least as much about the newness of PEP-448 (and for many Python coders, a complete lack of familiarity with the pre-existing varargs unpacking rules for functions) as it is about the punctuation; after all, you clearly recognize dict(X, **M) even though it's been wrong in most contexts for years.

    In any event, I'm a strong -1 on this, for largely the same reasons as Raymond and others:

    1. It doesn't provide any new functionality, just one more way to do it; += is satisfied by .update, + is satisfied (more generally and efficiently) by the unpacking generalizations

    2. It's needlessly confusing; addition is, for all existing types in the standard library I can think of, lossless; the information from both sides of the + is preserved in some form, either by addition or concatenation (and in the concatenation case, addition is happening, just to the length of the resulting sequence, and order is preserved). Addition for dictionaries would introduce new rules specific to dicts that do not exist for any other type regarding loss of values, non-additive resulting length, etc. Those rules would likely be similar to those of dict literals and the update method, but they'd need to be made explicit. By contrast, the PEP-448 unpacking generalization rules followed the existing rules for dict literals; no special rules occur, it just behaves intuitively (if you already knew the rules for dict literals without unpacking being involved).

    3. Almost any generic, duck-typing based code for which addition makes sense will not make sense for dicts simply because it loosens the definition of addition too much to be useful, so best case, it still raises TypeError (when dicts added to non-dict things), worst case, it silently operates in a way that violates the rules of both addition and concatenation rather than raising a TypeError that the generic code could use to determine the correct thing to do.

    4. The already mentioned conflict with Counter (which already has an addition operator, with lossless semantics)

    5. (Minor) It means PyDict_Type needs a non-NULL tp_as_number, so now it's slightly slower to reject dicts as being non-numeric at the C layer

    Problem #2 could be used to argue for allowing | instead of + (which would also resolve #4, and parts of #3), since | is already used for unioning with sets, and this operation is much closer to a union operation than addition or concatenation. Even so, it would still be misleading; at least with sets, there is no associated value, so it's still mostly lossless (you lose the input lengths, but the unique input data is kept); with dicts, you'd be losing values too.

    Basically, I think the PEP-448 unpacking syntax should remain as the "one-- and preferably only one --obvious way to" combine dictionaries as a one-liner. It's more composable, since it allows adding arbitrary additional key/value pairs, and more efficient, since it allows combining more than two dicts at once with no additional temporaries: dicta + dictb + dictc requires "dictab" to be made first, then thrown away after dictab + dictc produces dictabc, while {**dicta, **dictb, **dictc} builds dictabc directly.

    The only real argument I can see for not sticking to unpacking is that it doesn't allow for arbitrary dict-like things to produce new dict-like things directly; you'd have to rewrap as myspecialdict({**speciala, **specialb}). But I don't think that's a flaw worth fixing if it means major changes to the behavior of what I'm guessing is one of the three most commonly used types in Python (along with int and tuple, thanks to the integration of dicts into so many facets of the implementation).

    gvanrossum commented 5 years ago

    I changed my mind and am now in favor. Most of the arguments against could also be used against list+list. Counter addition is actually a nice special case of this -- it produces the same keys but has a more sophisticated way of merging values for common keys. Please read the python-ideas thread!
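For comparison, here is how Counter addition differs from a plain last-value-wins merge. This is only a sketch; the merge is written with unpacking rather than the proposed operator.

```python
from collections import Counter

c1 = Counter(a=1, b=2)
c2 = Counter(b=3, c=4)

# Counter addition combines the counts of shared keys.
summed = c1 + c2          # 'b' becomes 2 + 3

# A plain dict merge keeps only the right-hand value for shared keys.
plain = {**c1, **c2}      # 'b' becomes 3
```

One caveat: Counter addition drops keys whose resulting count is zero or negative, so the key sets only match when all results are positive.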

    99ffcaa5-b43b-4e8e-a35e-9c890007b9cd commented 5 years ago

    Also note: That Python ideas thread that xtreak linked ( https://mail.python.org/pipermail/python-ideas/2015-February/031748.html ) largely rejected the proposal a couple weeks before PEP-448 was approved. At the time, the proposal wasn't just about +/+=; that was the initial proposal, but operator overloading was heavily criticized for the failure to adhere to either addition or concatenation semantics, so alternate constructors and top-level functions similar to sorted were proposed as alternatives (e.g. merged(dicta, dictb)). The whole thread ended up being about creating an approved, built-in way of one-lining: d3 = d1.copy(); d3.update(d2)

    A key quote though is that this was needed because there was no other option without rolling your own merged function. Andrew Barnert summarized it best:

    "I'm +1 on constructor, +0.5 on a function (whether it's called updated or merged, whether it's in builtins or collections), +0.5 on both constructor and function, -0.5 on a method, and -1 on an operator.

    "Unless someone is seriously championing PEP-448 for 3.5, in which case I'm -0.5 on anything, because it looks like PEP-448 would already give us one obvious way to do it, and none of the alternatives are sufficiently nicer than that way to be worth having another."

    As it happens, PEP-448 was put in 3.5, and we got the one obvious way to do it.

    Side-note: It occurs to me there will be one more "way to do it" in 3.8 already, thanks to PEP-572:

    (d3 := d1.copy()).update(d2)

    I think I'll stick with d3 = {**d1, **d2} though. :-)
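Both one-liners mentioned here give the same result. A short sketch, assuming Python 3.8 or later for the assignment expression:

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

# PEP 572 spelling: bind the copy to d3, then update it in place.
(d3 := d1.copy()).update(d2)

# PEP 448 spelling: build the merged dict in one display.
d3_unpacked = {**d1, **d2}
```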

    tirkarthi commented 5 years ago

    Current python-ideas thread for the issue : https://mail.python.org/pipermail/python-ideas/2019-February/055509.html

    5f88a49a-d38b-49f1-960b-c32b9d4564ce commented 5 years ago

    If we're going to forget about commutativity of +, should we also implement +/+= for sets?

    scoder commented 5 years ago

    should we also implement +/+= for sets?

    The question is: what would that do? The same as '|=' ? That would be rather confusing, I think. "|" (meaning: "or") seems a very natural operation for sets, in the same way that "|" operates on bits in integers. That suggests that "|" is the right operator for sets.

    In any case, this is an unrelated proposal that is better not discussed in this ticket. The only link is whether "|" is the more appropriate operator also for dicts, which is to be discussed in the PEP and thus also not in this ticket.

    vstinner commented 5 years ago

    Is this issue directly or indirectly related to the PEP-584 "Add + and - operators to the built-in dict class"? https://www.python.org/dev/peps/pep-0584/

    vstinner commented 5 years ago

    Is this issue directly or indirectly related to the PEP-584 "Add + and - operators to the built-in dict class"? https://www.python.org/dev/peps/pep-0584/

    Ah yes, it's written in the title of the PR. I add it to the bug title as well.

    affb16c2-6fdb-403f-b328-86b7e719c99e commented 4 years ago

    Another obvious way to do it, but I'm +1 on it.

    A small side point however - PEP-584 reads:

    To create a new dict containing the merged items of two (or more) dicts, one can currently write:

    {**d1, **d2}

    but this is neither obvious nor easily discoverable. It is only guaranteed to work if the keys are all strings. If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].

    ...

    [2] Non-string keys: https://bugs.python.org/issue35105 and https://mail.python.org/pipermail/python-dev/2018-October/155435.html

    The references cited do not back this assertion up. Perhaps the intent is to reference the "cool/weird hack" dict(d1, **d2) (see https://mail.python.org/pipermail/python-dev/2010-April/099485.html and https://mail.python.org/pipermail/python-dev/2010-April/099459.html), which allowed any hashable keys in Python 2 but only strings in Python 3.

    If I see {**d1, **d2}, my expectations are that this is the new generalized unpacking and I currently expect any keys to be allowed, and the PEP should be updated to accurately reflect this to prevent future misunderstandings.

    gvanrossum commented 4 years ago

    PEP-584 has been approved by the Steering Council (at my recommendation). We will shortly begin landing PRs related to this.

    gvanrossum commented 4 years ago

    New changeset eb8ac57af26c4eb96a8230eba7492ce5ceef7886 by Brandt Bucher in branch 'master': bpo-36144: Dictionary Union (PEP-584) (GH-12088) https://github.com/python/cpython/commit/eb8ac57af26c4eb96a8230eba7492ce5ceef7886

    gvanrossum commented 4 years ago

    While the main code has been merged now, I propose to keep this issue open until some other things have happened:

    brandtbucher commented 4 years ago

    My current PR plans are:

    I'll also create a BPO issue to discuss whether the dict subclasses in http.cookies should be updated.

    That should do it for CPython; I'm planning on updating typeshed and adding a handful of tests to mypy for TypedDict, etc. after these are landed.

    Guido, okay to tag you on these for review?

    gvanrossum commented 4 years ago

    Yup, great plan.

    On Mon, Feb 24, 2020 at 22:29 Brandt Bucher <report@bugs.python.org> wrote:

    Brandt Bucher <brandtbucher@gmail.com> added the comment:

    My current PR plans are:

    • Docs. This will include the dict docs and the whatsnew 3.9. I assume we have no plans to cover this in the tutorials, etc. Let me know if I'm missing anything here.
    • collections.defaultdict, with tests. I don't think this needs docs beyond a short "changed in version 3.9" note.
    • collections.OrderedDict, with tests. Ditto defaultdict on docs.
    • collections.ChainMap, ditto.
    • types.MappingProxy, ditto.

    I'll also create a BPO issue to discuss whether the dict subclasses in http.cookies should be updated.

    That should do it for CPython; I'm planning on updating typeshed and adding a handful of tests to mypy for TypedDict, etc. after these are landed.

    Guido, okay to tag you on these for review?

    -- --Guido (mobile)

    zooba commented 4 years ago

    Not sure if this is a big deal or not, and it seems likely that the preexisting behaviour of .update() and ** unpacking have already decided it, but is it intentional that you end up with the first-seen key and the last-seen value in the case of collisions?

    class C:
        def __init__(self, *a): self.a = a
        def __hash__(self): return hash(self.a[0])
        def __eq__(self, o): return self.a[0] == o.a[0]
        def __repr__(self): return f"C{self.a}"
    >>> c1 = C(1, 1); c1
    C(1, 1)
    >>> c2 = C(1, 2); c2
    C(1, 2)
    
    For set union we get the first seen value:
    >>> {c1} | {c2}
    {C(1, 1)}
    
    For dict union we get the first seen key and the last seen value:
    >>> {c1: 'a'} | {c2: 'b'}
    {C(1, 1): 'b'}
    
    But similarly for dict unpack (and .update(); code left as an exercise to the reader):
    >>> {**{c1: 'a'}, **{c2: 'b'}}
    {C(1, 1): 'b'}

    So the union of two dicts may contain .items() elements that were not in either of the inputs.

    Honestly, I've never noticed this before, as the only time I create equivalent objects with meaningfully-distinct identities is to use with sets. I just figured I'd try it out after seeing suggestions that the dict union operands were transposed from set union.

    brandtbucher commented 4 years ago

    As a somewhat simpler example:

    >>> f = {False: False}
    >>> z = {0: 0}
    >>> f | z
    {False: 0}
    >>> {**f, **z}
    {False: 0}
    >>> f.update(z); f
    {False: 0}

    Though these hairier cases aren't explicitly addressed, the conflict behavior is covered in the Rationale and Reference Implementation sections of the PEP. All of the above examples share code (dict_update_arg), and that's definitely intentional. I for one think it would be confusing (and probably a bug) if one of the examples above gave a different key-value pair!

    I find it makes more sense if you see a set as valueless keys (rather than keyless values).

    zooba commented 4 years ago

    That's a much simpler example. And of course:

    >>> z[False] = False
    >>> z
    {0: False}

    So the precedent is well established that the key doesn't get updated with the value.

    No further questions, yer honour ;)

    gvanrossum commented 4 years ago

    New changeset d0ca9bd93bb9d8d4aa9bbe939ca7fd54ac870c8f by Brandt Bucher in branch 'master': bpo-36144: Document PEP-584 (GH-18659) https://github.com/python/cpython/commit/d0ca9bd93bb9d8d4aa9bbe939ca7fd54ac870c8f

    gvanrossum commented 4 years ago

    @Brandt: you have some more followup PRs planned right? Let's keep this issue open until you've done all of those.

    brandtbucher commented 4 years ago

    Yep. I'm currently working on OrderedDict, defaultdict, and MappingProxyType.

    My brother is looking to make his first contribution, so he'll be taking care of ChainMap.

    99ffcaa5-b43b-4e8e-a35e-9c890007b9cd commented 4 years ago

    What is ChainMap going to do? Normally, the left-most argument to ChainMap is the "top level" dict, but in a regular union scenario, last value wins.

    Seems like layering the right hand side's dict on top of the left hand side's would match dict union semantics best, but it feels... wrong, given ChainMap's normal left-to-right precedence. And top-mostness affects which dict receives all writes, so if chain1 |= chain2 operates with dict-like precedence (chain2 layers over chain1), then that also means the target of writes/deletions/etc. changes to what was on top in chain2.

    brandtbucher commented 4 years ago

    The plan is to follow dict’s semantics. The |= operator will basically delegate to the first map in the chain. The | operator will create a new ChainMap where the first map is the merged result of the old first map, and the others are the same.

    So, basically update / copy-and-update, respectively.
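The plan described above can be approximated with methods ChainMap already has. This is a sketch of the intended behavior, not the eventual implementation; `base`, `fallback` and `other` are illustrative names.

```python
from collections import ChainMap

base = {"a": 1}
fallback = {"b": 2}
cm = ChainMap(base, fallback)
other = {"b": 3, "c": 4}

# Copy-and-update: roughly what `cm | other` is meant to return.
# ChainMap.copy() copies only the first map and shares the rest.
union = cm.copy()
union.update(other)
base_after_union = dict(base)   # snapshot: the original first map is untouched

# Update: roughly what `cm |= other` is meant to do.
# MutableMapping.update() writes every key into the first map.
cm.update(other)
```

In both cases the later maps (`fallback` here) are never written to, which matches how ChainMap already handles assignment.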

    99ffcaa5-b43b-4e8e-a35e-9c890007b9cd commented 4 years ago

    Sorry, I think I need examples to grok this in the general case. ChainMap unioned with dict makes sense to me (it's equivalent to update or copy-and-update on the top level dict in the ChainMap). But ChainMap unioned with another ChainMap is less clear. Could you give examples of what the expected end result is for:

        d1 = {'a': 1, 'b': 2}
        d2 = {'b': 3, 'c': 4}
        d3 = {'a': 5, 'd': 6}
        d4 = {'d': 7, 'e': 8}
        cm1 = ChainMap(d1, d2)
        cm2 = ChainMap(d3, d4)

    followed by either:

        cm3 = cm1 | cm2

    or:

        cm1 |= cm2

    ? As in, what is the precise state of the ChainMap cm3 or the mutated cm1, referencing d1, d2, d3 and d4 when they are still incorporated by references in the chain?

    My impression from what you said is that the plan would be for the updated cm1 to preserve references to d1 and d2 only, with the contents of cm2 (d3 and d4) effectively flattened and applied as an in-place update to d1, with an end result equivalent to having done:

        cm1 = ChainMap(d1, d2)
        d1 |= d4
        d1 |= d3

    (except the key ordering would actually follow d3 first, and d4 second), while cm3 would effectively be equivalent to having done (note ordering):

        cm3 = ChainMap(d1 | d4 | d3, d2)

    though again, key ordering would be based on d1, then d3, then d4, not quite matching the union behavior. And a reference to d2 would be preserved in the final result, but not any other original dict. Is that correct? If so, it seems like it's wasting ChainMap's key feature (lazy accumulation of maps), where:

    cm1 |= cm2

    could be equivalent to either:

    cm1.maps += cm2.maps

    though that means cm1 wins overlaps, where normal union would have cm2 win, or to hew closer to normal union behavior, make it equivalent to:

    cm1.maps[:0] = cm2.maps

    prepending all of cm2's maps to have the same duplicate handling rules as regular dicts (right side wins) at the expense of changing which map cm1 uses as the target for writes and deletes. In either case it would hew to the spirit of ChainMap, making dict "union"-ing an essentially free operation, in exchange for increasing the costs of lookups that don't hit the top dict.
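The two lazy alternatives can be compared without mutating `cm1` by building new ChainMaps from the concatenated map lists. A sketch using the dicts from the example above; `appended` and `prepended` are illustrative names.

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d3 = {'a': 5, 'd': 6}
d4 = {'d': 7, 'e': 8}
cm1 = ChainMap(d1, d2)
cm2 = ChainMap(d3, d4)

# Equivalent of `cm1.maps += cm2.maps`: cm1's maps are searched first.
appended = ChainMap(*cm1.maps, *cm2.maps)

# Equivalent of `cm1.maps[:0] = cm2.maps`: cm2's maps are searched first.
prepended = ChainMap(*cm2.maps, *cm1.maps)
```

Neither form copies any dict. With `appended`, lookups for `'a'` find `d1` first and writes still go to `d1`; with `prepended`, lookups for `'a'` find `d3` first and writes go to `d3`.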

    gvanrossum commented 4 years ago

    I think for |= the only choice is for it to be essentially an alias to .update(). So that means cm |= other becomes cm.maps[0].update(other).

    For | we are breaking new ground and we could indeed make cm | other do something like ChainMap(other, *cm.maps).

    I've not used ChainMap much (though I've seen some code that uses it) so I'm probably not the best judge of whether this is a good feature to have.

    Note that other | cm will just do whatever other.__or__ does, since ChainMap isn't a true subclass of dict, so it will not fall back to cm.__ror__. Basically ChainMap will not get control in this case.

    Other thoughts:

    brandtbucher commented 4 years ago

    I think for |= the only choice is for it to be essentially an alias to .update(). So that means cm |= other becomes cm.maps[0].update(other).

    Agreed.

    These semantics make |= behave rather differently from |. Is that okay? If not, which of them should change, and how?

    I don’t like this. Let me try to explain why:

    So far (and to the best of my knowledge), setting and updating values on a ChainMap works exactly the same as it does for dict, with all of the same semantics (the docs themselves even say that “all of the usual dictionary methods are supported”… which now could be interpreted as meaning | and |= as well). It’s only when deleting or using the new interfaces that things get more specialized.

    But that doesn’t really apply here. Having different (or worse, inconsistent) behavior for these operators, I feel, would be more confusing than helpful. Remember, a major goal of this proposal is to aid in duck typing.

    So, Josh’s understanding of my intended semantics is correct, I propose that, for:

        d1 = {'a': 1, 'b': 2}
        d2 = {'b': 3, 'c': 4}
        d3 = {'a': 5, 'd': 6}
        d4 = {'d': 7, 'e': 8}
        cm1 = ChainMap(d1, d2)
        cm2 = ChainMap(d3, d4)
    
        cm3 = cm1 | cm2

    Gives cm3 a value of:

        ChainMap(d1 | d4 | d3, d2)  # Or, equivalently: ChainMap(d1 | dict(cm2), d2)

    And:

    cm1 |= cm2

    Is equivalent to:

    d1 |= cm2

    I don’t want to change which map is "first", and I think changing the winning behavior from that of dict will create more problems than it solves. We only need to look at how ChainMap handles the update method… it keeps the same exact behavior, rather than trying to be lazy or reversed or something.

    If we *are* deciding to do something different, then I think it should have no relationship to PEP-584, which reasons out a carefully considered merge operation for dict, not ChainMap. But, it would also probably need a different operator, and be able to stand on its own merits.
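The proposed result for `cm1 | cm2` can be worked through by hand. This sketch builds the value with unpacking instead of the new operators, so it makes no assumption about how ChainMap ends up implementing them.

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d3 = {'a': 5, 'd': 6}
d4 = {'d': 7, 'e': 8}
cm1 = ChainMap(d1, d2)
cm2 = ChainMap(d3, d4)

# Flatten cm2: earlier maps shadow later ones, so d3 wins over d4.
flat = dict(cm2)

# New first map: d1 updated with the flattened cm2 (same items as d1 | d4 | d3).
first = {**d1, **flat}

# The remaining maps of cm1 are kept by reference.
cm3 = ChainMap(first, d2)
```

Here `first` holds `a=5, b=2, d=6, e=8`, and `'c'` is still found through `d2`. The original `d1` is left unchanged.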

    gvanrossum commented 4 years ago

    I had just come to a different conclusion. Maybe ChainMap should just not grow | and |= operators? That way there can be no confusion. dict() | ChainMap() and ChainMap() | dict() will fail because ChainMap doesn't inherit from dict. (Note that in your last message, d1 |= cm2 will fail for this reason. You can of course fix that with d1 |= dict(cm2), although IIUC there's no reason one of the maps couldn't be some other [Mutable]Mapping.)

    brandtbucher commented 4 years ago

    Note that in your last message, d1 |= cm2 will fail for this reason. You can of course fix that with d1 |= dict(cm2), although IIUC there's no reason one of the maps couldn't be some other [Mutable]Mapping.

    Mappings and iterables are fine for the in-place variant. :)

    >>> from collections import ChainMap
    >>> d = {}
    >>> c = ChainMap({"r": 2, "d":2})
    >>> d |= c
    >>> d
    {'r': 2, 'd': 2}
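The in-place operator also accepts an iterable of key-value pairs, while the binary operator does not. A short sketch, assuming Python 3.9 or later where PEP 584 is available:

```python
d = {"a": 1}
pairs = [("b", 2), ("c", 3)]

# In-place union behaves like update(): any iterable of pairs is accepted.
d |= pairs

# Binary union is stricter: a list of pairs is rejected with TypeError.
try:
    d | pairs
except TypeError:
    binary_rejected = True
else:
    binary_rejected = False
```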

    I think it would be confusing to have ChainMap | ChainMap behave subtly different than dict | ChainMap. It would be especially odd if it also differed subtly from ChainMap | dict.

    To recap:

    +1 on adding the operators with dict semantics,
    +0 on no PEP 584 for ChainMap.
    -0 on implementing them, but changing the winning behavior by concatenating the maps lists or something. This would probably make more sense to me as a `+` operator, honestly. :(
    -1 for having the operators behave differently (other than performance shortcuts) for `cm | d`, `cm | cm`, `cm |= d`, `cm |= cm`.

    gvanrossum commented 4 years ago

    OK, assuming |= gets the same semantics as update(), can you repeat once more (without motivation) what the specification for cm | other will be?

    brandtbucher commented 4 years ago

    I believe that:

    cm | other

    Should return the equivalent of:

    ChainMap(cm.maps[0] | dict(other), *cm.maps[1:])

    brandtbucher commented 4 years ago

    ...however, I could also see the (similar):

        ChainMap(other, *cm.maps)  # Note that `other` is the original reference here.

    Being okay as well. Maybe even better, now that I've written it out.

    gvanrossum commented 4 years ago

    OK, that makes sense, it works similar to ChainMap.copy(), which copies maps[0] and keeps links to the rest. So in particular cm | {} will do the same thing as cm.copy().

    I'm not sure if the dict(other) cast is the best way to go about it. Maybe this would work?

    def __or__(self, other):
        new = self.copy()
        new |= other  # OR new.update(other) ???
        return new
    
    def __ior__(self, other):
        self.update(other)
        return self

    Note that there is no ChainMap.update() definition -- it relies on MutableMapping.update().

    I guess we need a __ror__ as well, in case there's some other mapping that doesn't implement __or__:

    def __ror__(self, other):
        new = other.copy()
        new.update(self)
        return new

    Note that this doesn't return a ChainMap but an instance of type(other). If other doesn't have a copy() method it'll fail.

    As a refinement, __or__ and __ror__ should perhaps check whether the operation can possibly succeed and return NotImplemented instead of raising? (Based on the type of other only, not its contents.)

    gvanrossum commented 4 years ago

    I didn't see your second reply, with ChainMap(other, *cm.maps).

    I'm not so keen on that, because its special behavior can't be mimicked by |=.

    brandtbucher commented 4 years ago

    I'm not sure if the dict(other) cast is the best way to go about it. Maybe this would work?

    Yeah, I was imagining something like that... I used the cast for brevity in my reply but that probably wasn't helpful.

    Note that for __or__, we probably want to check the type of the argument (for either dict or ChainMap, or maybe just Mapping), to keep it from working on an iterable of key-value pairs.

    I guess we need a __ror__ as well, in case there's some other mapping that doesn't implement __or__:

    Agreed. Again, we can check for Mapping here to assure success for the copy() move.

    As a refinement, __or__ and __ror__ should perhaps check whether the operation can possibly succeed and return NotImplemented instead of raising? (Based on the type of other only, not its contents.)

    Yup, see above. I think a check for Mapping should be fine.

    brandtbucher commented 4 years ago

    Just to clarify:

    If we decide to check isinstance(other, (ChainMap, dict)), '|' should probably be used.

    If we decide to check isinstance(other, Mapping), I think the copy/update methods should be used.

    serhiy-storchaka commented 4 years ago

    1.

    def __or__(self, other):
        return self.__class__(self.maps[0] | other, *self.maps[1:])
    
    def __ror__(self, other):
        return other | dict(self)

    2.

    def __or__(self, other):
        return self.__class__(other, *self.maps)
    
    def __ror__(self, other):
        return self.__class__(*self.maps, other)

    There are problems with both variants, so I think it may be better to not add this operator to ChainMap.

    brandtbucher commented 4 years ago

    I think we're only seriously considering the first variant (although implemented slightly differently, see my last two messages). And __ror__ would probably change, returning the type of self.

    What are the "problems" with it, exactly? We seem to be in agreement that the update behavior is reasonable, even for ChainMaps.

    gvanrossum commented 4 years ago

    We already have somewhat different semantics of | for Counter, and hence I think it's fine to give it the most useful semantics for ChainMap given that class's special behavior. I think we've come up with the right solution there.

    Let's stop the debate and put up a PR.

    brandtbucher commented 4 years ago

    Sounds good, I'll have these up soon.

    gvanrossum commented 4 years ago

    New changeset 57c9d1725689dde068a7fccaa7500772ecd16d2e by Brandt Bucher in branch 'master': bpo-36144: Implement defaultdict union (GH-18729) https://github.com/python/cpython/commit/57c9d1725689dde068a7fccaa7500772ecd16d2e

    gvanrossum commented 4 years ago

    Still waiting for ChainMap -- what else?

    brandtbucher commented 4 years ago

    My brother will have a ChainMap PR up soon. I'm just finishing up MappingProxyType, myself. Probably both this weekend.

    Then I'll move on to OrderedDict, which looks like it could be tricky. I'll need to familiarize myself with the implementation better (unless there's somebody who is already familiar with it who wants to take over). It looks well-commented, though.

    I think we can pass on the http.cookies subclasses since there don't appear to be any experts/maintainers for that module.

    gvanrossum commented 4 years ago

    New changeset 4663f66f3554dd8e2ec130e40f6abb3c6a514775 by Brandt Bucher in branch 'master': bpo-36144: Update MappingProxyType with PEP-584's operators (GH-18814) https://github.com/python/cpython/commit/4663f66f3554dd8e2ec130e40f6abb3c6a514775