itertools.chain behaves strangely when copied with copy.copy #74083

Open 84dd73fe-9313-425f-aae0-390228a90839 opened 7 years ago

84dd73fe-9313-425f-aae0-390228a90839 commented 7 years ago
BPO 29897
Nosy @rhettinger, @kristjanvalur, @serhiy-storchaka, @MSeifert04
Files
  • itertools-chain-copy.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = None
    created_at =
    labels = ['type-bug', 'library']
    title = 'itertools.chain behaves strangly when copied with copy.copy'
    updated_at =
    user = 'https://github.com/MSeifert04'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'rhettinger'
    assignee = 'serhiy.storchaka'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'MSeifert'
    dependencies = []
    files = ['46769']
    hgrepos = []
    issue_num = 29897
    keywords = ['patch']
    message_count = 10.0
    messages = ['290106', '290143', '290146', '290916', '290917', '290918', '290995', '291027', '291035', '291091']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'kristjan.jonsson', 'serhiy.storchaka', 'MSeifert']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue29897'
    versions = ['Python 3.5', 'Python 3.6']
    ```

    84dd73fe-9313-425f-aae0-390228a90839 commented 7 years ago

    When using copy.copy to copy an itertools.chain instance, the results can be weird. For example:

    >>> from itertools import chain
    >>> from copy import copy
    >>> a = chain([1,2,3], [4,5,6])
    >>> b = copy(a)
    >>> next(a)  # looks okay
    1
    >>> next(b)  # jumps to the second iterable, not okay?
    4
    >>> tuple(a)
    (2, 3)
    >>> tuple(b)
    (5, 6)

    I don't really want to "copy.copy" such an iterator (I would either use a, b = itertools.tee(a, 2) or b = a, depending on the use case). This just came up because I investigated how Python's iterators behave when copied, deepcopied, or pickled, because I want to make the iterators in my extension module behave similarly.
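    For comparison, here is what the tee-based workaround mentioned above gives (a sketch session, output assumed from a fresh interpreter): tee() buffers items internally, so both views see the full sequence.

    >>> from itertools import chain, tee
    >>> a = chain([1, 2, 3], [4, 5, 6])
    >>> a, b = tee(a)
    >>> next(a)
    1
    >>> next(b)
    1
    >>> tuple(a)
    (2, 3, 4, 5, 6)
    >>> tuple(b)
    (2, 3, 4, 5, 6)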

    rhettinger commented 7 years ago

    Humph, that is definitely not the expected result. The itertools copy/reduce support has been a never-ending source of bugs and headaches.

    It looks like the problem is that __reduce__ is returning the existing tuple iterator rather than a new one:

    >>> a = chain([1,2,3], [4,5,6])
    >>> b = copy(a)
    >>> next(a)
    1
    >>> a.__reduce__()
    (<class 'itertools.chain'>, (), (<tuple_iterator object at 0x104ee78d0>, <list_iterator object at 0x104f81b70>))
    >>> b.__reduce__()
    (<class 'itertools.chain'>, (), (<tuple_iterator object at 0x104ee78d0>,))
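    One way to confirm that the source iterator really is shared rather than re-created (a sketch continuing the session above; identity checked via the state tuple returned by __reduce__):

    >>> a = chain([1, 2, 3], [4, 5, 6])
    >>> b = copy(a)
    >>> a.__reduce__()[2][0] is b.__reduce__()[2][0]
    True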
    serhiy-storchaka commented 7 years ago

    chain(*x) is a shortcut for chain.from_iterable(iter(x)).

    Neither copy.copy() nor __reduce__ has any particular relation to this. Consider the following example:

    >>> from itertools import chain
    >>> i = iter([[1, 2, 3], [4, 5, 6]])
    >>> a = chain.from_iterable(i)
    >>> b = chain.from_iterable(i)
    >>> next(a)
    1
    >>> next(b)
    4
    >>> tuple(a)
    (2, 3)
    >>> tuple(b)
    (5, 6)
    serhiy-storchaka commented 7 years ago

    This issue is related to the behavior of other composite iterators.

    >>> from copy import copy
    >>> it = map(ord, 'abc')
    >>> list(copy(it))
    [97, 98, 99]
    >>> list(copy(it))
    []
    >>> it = filter(None, 'abc')
    >>> list(copy(it))
    ['a', 'b', 'c']
    >>> list(copy(it))
    []

    The copy is too shallow. If you consume an item from one copy, it disappears from the original.
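    The sharing can be seen directly: the copy holds a reference to the very same underlying source iterator (a sketch continuing the session above, checked via __reduce__):

    >>> it = map(ord, 'abc')
    >>> it2 = copy(it)
    >>> it.__reduce__()[1][1] is it2.__reduce__()[1][1]
    True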

    Compare with the behavior of iterators of builtin sequences:

    >>> it = iter('abc')
    >>> list(copy(it))
    ['a', 'b', 'c']
    >>> list(copy(it))
    ['a', 'b', 'c']
    >>> it = iter(list('abc'))
    >>> list(copy(it))
    ['a', 'b', 'c']
    >>> list(copy(it))
    ['a', 'b', 'c']
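    These behave well because a sequence iterator's __reduce__ describes how to rebuild a fresh iterator from the immutable source plus a position, rather than handing out its internal state (a sketch session):

    >>> it = iter('abc')
    >>> next(it)
    'a'
    >>> it.__reduce__()
    (<built-in function iter>, ('abc',), 1)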
    84dd73fe-9313-425f-aae0-390228a90839 commented 7 years ago

    Just an update on what doesn't work: simply overriding the __copy__ method.

    I tried it, but it somewhat breaks itertools.tee: if the passed iterable has a __copy__ method, tee copies the iterator (resulting in a lot of unnecessary memory overhead, or breakage if a generator is "inside") instead of using its memory-efficient internals.
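    A toy illustration of that tee() behavior (hypothetical Counter class; assumes tee() prefers an iterator's own __copy__ over its internal buffering, which is the behavior described above):

    from itertools import tee

    class Counter:
        # Minimal iterator with a __copy__ method.
        def __init__(self, n, i=0):
            self.n, self.i = n, i
        def __iter__(self):
            return self
        def __next__(self):
            if self.i >= self.n:
                raise StopIteration
            self.i += 1
            return self.i
        def __copy__(self):
            print("__copy__ called by tee()")
            return Counter(self.n, self.i)

    a, b = tee(Counter(3))   # prints "__copy__ called by tee()" once
    print(list(a), list(b))  # [1, 2, 3] [1, 2, 3]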

    serhiy-storchaka commented 7 years ago

    Just as an example, here is a patch that implements deeper copying for itertools.chain objects in Python. I don't mean to push it; it is too complicated. I have also written a slightly simpler implementation, but it doesn't work due to the behavior of copied map objects.
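    For readers without the attached itertools-chain-copy.diff at hand, here is a rough pure-Python sketch (not the attached patch) of what "deeper" copying of a chain-like iterator could look like. It only works when the outer iterator and the active sub-iterator themselves copy cleanly; for a generator inside, copy.copy() fails loudly, which is the limitation discussed above.

    import copy

    class copyable_chain:
        # Minimal chain work-alike whose __copy__ also copies the outer
        # iterator of iterables and the currently active sub-iterator,
        # so the copy and the original no longer share state.
        def __init__(self, *iterables):
            self._source = iter(iterables)
            self._active = iter(())
        def __iter__(self):
            return self
        def __next__(self):
            while True:
                try:
                    return next(self._active)
                except StopIteration:
                    self._active = iter(next(self._source))
        def __copy__(self):
            new = copyable_chain()
            new._source = copy.copy(self._source)
            new._active = copy.copy(self._active)
            return new

    a = copyable_chain([1, 2, 3], [4, 5, 6])
    b = copy.copy(a)
    print(next(a), next(b))    # 1 1
    print(tuple(a), tuple(b))  # (2, 3, 4, 5, 6) (2, 3, 4, 5, 6)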

    rhettinger commented 7 years ago

    Serhiy, feel free to take this in whatever direction you think is best.

    84401114-8e59-4056-83cb-632106c0b648 commented 7 years ago

    It is a tricky issue. How deep do you go? What if you are chaining several of the itertools? Seems like we're entering a semantic sinkhole here.

    Deepcopy would be too deep... The original copy support in these objects stems from the desire to support pickling.


    serhiy-storchaka commented 7 years ago

    Yes, this issue is tricky, and I don't have a definite answer.

    If we implement __copy__ for builtin compound iterators, I would implement filter.__copy__ and map.__copy__ something like:

    def __copy__(self):
        cls, args = self.__reduce__()
        return cls(*map(copy, args))

    If the underlying iterators properly support copying, copying filter and map iterators will succeed. If they don't support copying, copying filter and map iterators should fail, without accumulating elements the way a tee() object does.
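    To see the mechanics of that sketch outside of a builtin type, here is a free-function version (a hypothetical helper named copy_iterator, not part of the proposal); it works today because map's __reduce__ exposes its constructor arguments:

    from copy import copy

    def copy_iterator(it):
        # Same idea as the proposed filter.__copy__/map.__copy__:
        # rebuild the iterator from __reduce__, copying each argument.
        cls, args = it.__reduce__()
        return cls(*map(copy, args))

    it = map(ord, 'abc')
    it2 = copy_iterator(it)
    print(list(it))    # [97, 98, 99]
    print(list(it2))   # [97, 98, 99] -- independent, because the
                       # underlying str iterator copies cleanly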

    But there are open questions.

    1. This is a behavior change. What if some code depends on the current behavior? That would be silly: if this were the desirable behavior, copy() of a filter or map object could just return the original iterator.

    2. Depending on the copy module from a method of a builtin type looks doubtful. Should we implement copy.copy() in C and provide a public C API?

    3. If we make a copy of limited depth, shouldn't we use a memo, as deepcopy() does, to prevent unwanted duplication? Otherwise a copy of map(func, it, it) would behave differently from the original (see the sketch after this list). This example is not as silly as it looks.

    4. Is it possible to implement copying for all compound iterators? For example, copying a chain() object would have to change the state of the original object (by using __setstate__) so that it makes copies of the sub-iterators before using them.
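    A sketch of the point in question 3, assuming a naive copy that duplicates each argument independently rather than remembering that both arguments are the same iterator:

    it = iter(range(6))
    m = map(lambda x, y: (x, y), it, it)
    print(list(m))    # [(0, 1), (2, 3), (4, 5)] -- both arguments share `it`

    # A per-argument copy without a memo gives each argument its own iterator:
    it1, it2 = iter(range(6)), iter(range(6))
    m2 = map(lambda x, y: (x, y), it1, it2)
    print(list(m2))   # [(0, 0), (1, 1), (2, 2), ...] -- different pairing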

    Perhaps all this deserves a PEP.

    rhettinger commented 7 years ago

    > Perhaps all this deserves a PEP.

    If Serhiy and Kristján agree on a course of action, that will suffice. Copying iterators is an esoteric endeavor of interest to very few users (no one had even noticed until now).

    worldpeacez0991 commented 1 year ago

    This issue (#74083) is more about questioning the naming conventions used in Python. It is a debate between the use of copy (shallow copy) and deep copy.

    A workaround for this issue, as discussed, is to use deepcopy instead of copy:

    Input:

    from itertools import chain
    from copy import deepcopy

    a = chain([1, 2, 3], [4, 5, 6])
    b = deepcopy(a)   # instead of b = copy(a)

    print(next(a))
    print(next(b))
    print(tuple(a))
    print(tuple(b))

    Output:

    1
    1
    (2, 3, 4, 5, 6)
    (2, 3, 4, 5, 6)

    The intention of shallow copy is to let Python save memory and help performance. Python, being a language known to sacrifice speed for readability, has to cut corners where it can to compete with other programming languages.

    Since copying is an essential task in programming, and misuse may lead to unintentional accidents, perhaps a PEP could be considered to change the term ".copy" to ".shallowCopy" (or a nicer term like "lightCopy"), to prevent confusion when learning Python.

    Otherwise, if things remain as they are, programmers coming from other languages are bound to overlook the fact that .copy means shallow copy, which can have costly consequences. All programming languages have flaws, and changing them takes more than courage and grit; sometimes the cost of updating an entire codebase for a single term is just not worth it. Maybe Python is still young and is prioritizing its time to evolve in other areas.