pytoolz / toolz

A functional standard library for Python.
http://toolz.readthedocs.org/

Tee function? #243

Open jni opened 9 years ago

jni commented 9 years ago

Hi everyone,

I often want to do more than one thing with a stream: for example, count k-mer frequencies and build a De Bruijn graph from the same stream. I propose the following:

import toolz as tz

@tz.curry
def tee(consumer, stream):
    for elem in stream:
        consumer(elem)
        yield elem

Then, for example:

def counter(acc):
    def add_count(elem):
        acc[elem] += 1
    return add_count

And:

>>> import collections
>>> counts = collections.defaultdict(int)
>>> tz.pipe(tz.concat((range(5), range(4))), tee(counter(counts)), sum)
16
>>> counts
defaultdict(<class 'int'>, {0: 2, 1: 2, 2: 2, 3: 2, 4: 1})
>>> result = tz.pipe(range(5), tee(print), sum)
0
1
2
3
4
>>> result
10
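To make the motivating case concrete, here is a toolz-free sketch that counts k-mers and collects De Bruijn graph edges in a single pass over one stream. It reuses the tee idea above without tz.curry; the helpers kmers, count and add_edge, the sample sequence and k = 3 are illustrative assumptions, not part of the proposal.

```python
import collections

def tee(consumer, stream):
    # Same behavior as the proposed tee, minus tz.curry:
    # hand each element to consumer, then pass it downstream unchanged.
    for elem in stream:
        consumer(elem)
        yield elem

def kmers(seq, k):
    # Hypothetical helper: all overlapping substrings of length k.
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

counts = collections.defaultdict(int)
graph = collections.defaultdict(set)

def count(kmer):
    # Side effect 1: k-mer frequency table.
    counts[kmer] += 1

def add_edge(kmer):
    # Side effect 2: De Bruijn edge from the (k-1)-mer prefix to the (k-1)-mer suffix.
    graph[kmer[:-1]].add(kmer[1:])

# Both consumers see every k-mer; the final sum only drives the stream to the end.
n_kmers = sum(1 for _ in tee(add_edge, tee(count, kmers("ACGTACGA", 3))))
```

Because tee is a generator, neither consumer runs until something downstream (here the final sum) pulls elements through.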

Any interest in this?

mrocklin commented 9 years ago

Hrm, you might take a look at the do function. It's a bit like tee with the map factored out.

In [1]: from toolz.curried import pipe, map, do

In [2]: result = pipe(range(5), map(do(print)), sum)
0
1
2
3
4

In [3]: result
Out[3]: 10
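toolz describes do as "runs func on x, returns x". The toolz-free sketch below (do_sketch is a hypothetical stand-in for do, and functools.partial plays the role of currying) checks the claim that tee is do with the map factored out: mapping a do-style function over the stream gives the same values and the same side effects as the proposed tee.

```python
import functools

def do_sketch(func, x):
    # Run func purely for its side effect, then return x unchanged.
    func(x)
    return x

def tee(consumer, stream):
    # The proposed tee, without tz.curry.
    for elem in stream:
        consumer(elem)
        yield elem

seen_via_map, seen_via_tee = [], []
total_via_map = sum(map(functools.partial(do_sketch, seen_via_map.append), range(5)))
total_via_tee = sum(tee(seen_via_tee.append, range(5)))
```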

eriknw commented 9 years ago

Historical context of do: https://github.com/pytoolz/toolz/pull/122 https://github.com/pytoolz/toolz/pull/141

jni commented 9 years ago

Yep, looks like exactly what I need. I do think map(do(foo)) is a common enough pattern to warrant its own function, but it's easy enough to create.

Looking at the implementation of do, I don't understand: why isn't the stream consumed by func(x)?

mrocklin commented 9 years ago

do doesn't operate on a stream; it operates on a single element. The element doesn't get consumed.

jni commented 9 years ago

Ah! Of course. So then my point is, you would never use do without map in a pipe, right?

mrocklin commented 9 years ago

Well, unless you want a side effect on the iterator as a whole (like itertools.tee-ing it), a side effect that doesn't burn data (like incrementing a counter or triggering a log event), or the data flowing through the pipe isn't an iterator or other consumable at all. I find myself using pipe with non-iterators fairly often.
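A toolz-free sketch of the two cases: do on a value that isn't an iterator, where the side effect only reads it, versus do on an iterator with a side effect that iterates it and so burns the data. pipe_sketch and do_sketch are hypothetical stand-ins for toolz.pipe and toolz.do.

```python
import functools

def do_sketch(func, x):
    # Run func for its side effect, then return x unchanged.
    func(x)
    return x

def pipe_sketch(data, *funcs):
    # Hypothetical stand-in for toolz.pipe: apply funcs left to right.
    return functools.reduce(lambda acc, f: f(acc), funcs, data)

log = []

# Non-iterator data: logging the list doesn't consume it, so sum sees everything.
total = pipe_sketch([1, 2, 3], functools.partial(do_sketch, log.append), sum)

# Iterator data: list() inside the side effect exhausts the iterator,
# so the value passed along is already empty and sum returns 0.
burned = pipe_sketch(iter([1, 2, 3]), functools.partial(do_sketch, list), sum)
```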

mrocklin commented 9 years ago

I suspect the deeper question is: does it make sense to merge map and do into a new function and add that to the API?

do_many = compose(map, do)
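This one-liner relies on the curried map and do (for example from toolz.curried, as in the session above): builtin map(f) and the plain do(f) both raise TypeError when given only the function. Curried, compose(map, do)(f) evaluates to map(do(f)), a callable still waiting for its sequence. The sketch below spells out that expansion with functools.partial instead of toolz; do_sketch is a hypothetical stand-in for do.

```python
import functools

def do_sketch(func, x):
    # Run func for its side effect, then return x unchanged.
    func(x)
    return x

def do_many(func):
    # Equivalent of compose(map, do)(func) with curried map and do:
    # map(do(func)), still waiting for the sequence to iterate over.
    return functools.partial(map, functools.partial(do_sketch, func))

seen = []
record = do_many(seen.append)
total = sum(record(range(5)))
```

Each call returns a lazy map object, so the side effects run only as it is consumed.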

jni commented 9 years ago

It seems you already had that (side_effects) and decided against keeping it. I think it's worthwhile, but since it's a one-liner to make, I can see why you might not.

Incidentally, I like tee as the name and I think it's more useful than itertools.tee, because