Closed 6be953c8-bbe2-4c41-a1de-bb9dbc605ac5 closed 8 years ago
The memory footprint of itertools.tee can be reduced substantially by using a shared buffer for the child iterators (see sample code). If local queues are desired for efficient threading support, they can be combined with a global queue, allowing to constrain the size of local queues.
Why do you think that your implementation uses less memory? You have some measurements confirming it?
No. The sample code is a demonstration how to do it, it's by no means a full-fledged patch.
The drawback of the current implementation is that if you tee n-fold, and then advance one of the iterators m times, it fills n queues with m references each, for a total of (n-1)*m references. The documentation explicitly mentions this is unfortunate.
I only demonstrate that it is perfectly unnecessary to fill n separate queues, as you can use a single queue and index into it. Instead of storing duplicate references, you can store a counter with each cached item reference. Replacing duplicates by refcounts, it turns (n-1)m references into 2\m references (half of which being the counters).
Not in the demo code: you can improve this further by storing items in iterator-local queues when that iterator is the only one that still needs to return it, and in a shared queue with refcount when there are more of them. That way, you eleminate the overhead of storing (item, 1) instead of item for a fix-cost per-iterator.
The documentation explicitly mentions this is unfortunate.
Ah, here's the thing. The code in the documentation is provided only for the explanation. In fact tee() is implemented in the same way that you suggest (but uses a more efficient deque implementation). I propose to close this issue as invalid. Thanks for the suggestion.
Oh great! Then I can use it as-is. How about reassigning the issue to documentation (for clarifying the inefficiency warning)?
Raymond, do you want to add a clarification to the documentation, or prefer just to close this issue?
Unlike the other "equivalents" in the docs, this one is further away from the actual implementation. So, I think I should add a clarification that the example code is a "rough logical equivalent" but that actual implementation may have shared queues (Jython, PyPy, and IronPython are free to have their own implementation choices).
New changeset f09306f9fa6f by Raymond Hettinger in branch 'default': Issue bpo-16394: Note the tee() pure python equivalent is only a rough approximation. https://hg.python.org/cpython/rev/f09306f9fa6f
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at =
created_at =
labels = ['type-feature', 'docs']
title = 'Reducing tee() memory footprint'
updated_at =
user = 'https://bugs.python.org/cami'
```
bugs.python.org fields:
```python
activity =
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date =
closer = 'rhettinger'
components = ['Documentation']
creation =
creator = 'cami'
dependencies = []
files = ['27856']
hgrepos = []
issue_num = 16394
keywords = []
message_count = 8.0
messages = ['174632', '174653', '174659', '174669', '174674', '254552', '254694', '264222']
nosy_count = 5.0
nosy_names = ['rhettinger', 'docs@python', 'python-dev', 'cami', 'pconnell']
pr_nums = []
priority = 'low'
resolution = 'fixed'
stage = 'needs patch'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue16394'
versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.4']
```