`stream.list` implementation is x10 slower than using plain Python built-in functionality (list comprehension)

simon-liebehenschel commented 2 years ago

Running values = await stream.list(gen()) timeit: 17,5 seconds Running values = [_ async for _ in gen()] timeit: 1.9 seconds

Code:

import asyncio
from collections.abc import AsyncIterator
from aiostream import stream

async def gen() -> AsyncIterator[int]:
    for i in range(10_000):
        yield i

def main():
    async def func():
        values = await stream.list(gen())
        # values = [_ async for _ in gen()]
    asyncio.run(func())

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("main()", setup="from __main__ import main", number=1000))

What is the reason of the current implementation?

..note:: The same list object is produced at each step in order to avoid memory copies.

Why?

simon-liebehenschel commented 2 years ago

I do not want to create one more issue, so I will write here. The chunks implementation is x3 slower than another implementation. Is there any reason for this or this is just a bad design?

vxgmichel commented 2 years ago

Hi @AIGeneratedUsername and thanks for the report,

What is the reason of the current implementation?

Good catch. The stream operators provided by aiostream have quite an overhead although I didn't expect it to be that much. However, this test is not really representative of what aiostream is typically used for since this example does not perform any IO operation.

Try replacing the generator with:

async def gen() -> AsyncIterator[int]:
    for i in range(10_000):
        await asyncio.sleep(0)
        yield i

And you'll get something like:

values = await stream.list(gen()): 58.098731541000234
Running values = [_ async for _ in gen()] 49.413229781000155

Still, I might look into it when I have time.

or this is just a bad design?

Please don't call other people's work bad design. Feel free to open another issue if you want me to address it.

vxgmichel / aiostream

`stream.list` implementation is x10 slower than using plain Python built-in functionality (list comprehension) #82