Open · cgarciae opened this issue 6 years ago
@cgarciae Thanks for sharing your implementation!
What do you think of the alternative interface for the same task? It's basically a further generalization of paco.gather:
```python
def igather(coros_or_futures, limit=0, loop=None, timeout=None,
            return_exceptions=False):
    """
    Arguments:
        coros_or_futures (iterable|asynchronous iterable): iterator yielding
            coroutine functions.
        limit (int): max concurrency limit. Use ``0`` for no limit.
        loop (asyncio.BaseEventLoop): optional event loop to use.
        timeout (int|float): timeout can be used to control the maximum number
            of seconds to wait before returning. timeout can be an int or
            float. If timeout is not specified or None, there is no limit to
            the wait time.
        return_exceptions (bool): returns exceptions as valid results.

    Returns:
        asynchronous iterable: sequence of values yielded by the coroutines,
        as completed.
    """
```
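For a sense of how that interface would be consumed, a hypothetical caller might look like this (`download_image`, `urls` and the use of `aiohttp` are placeholders, not part of paco or of this proposal):

```python
import aiohttp

async def download_image(session, url):
    async with session.get(url) as resp:
        return await resp.read()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        # A generator, so coroutine objects are only created as igather pulls them.
        coros = (download_image(session, url) for url in urls)
        # Results arrive as they complete; at most 100 downloads are in flight.
        async for image in igather(coros, limit=100):
            ...  # write the image to disk, update progress counters, etc.
```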
Making the result ordered should also be possible, albeit a bit harder to implement and memory-hungry in the worst case.
An implementation sketch inspired by https://bugs.python.org/issue30782#msg336237:
```python
import asyncio

async def igather(tasks, limit=None):
    async def submit(tasks, buf):
        # TODO: additionally support async iterators
        for task in tasks:
            await buf.put(asyncio.create_task(task))
        await buf.put(None)  # sentinel marking the end of the input

    async def consume(buf):
        while True:
            task = await buf.get()
            if task is None:
                break
            yield await asyncio.wait_for(task, None)

    buf = asyncio.Queue(limit or 0)
    asyncio.create_task(submit(tasks, buf))
    async for result in consume(buf):
        yield result
```
It preserves task submission order in an efficient way, but lacks proper exception handling.
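One possible way to plug that last gap, shown here only as a sketch, is to catch exceptions inside the consumer and either re-raise them or yield them as ordinary results, mirroring the `return_exceptions` flag from the interface above; cancelling the submitter in the `finally` block is an assumption about the desired cleanup, not part of the original proposal:

```python
import asyncio

async def igather(tasks, limit=None, return_exceptions=False):
    async def submit(tasks, buf):
        for task in tasks:
            await buf.put(asyncio.create_task(task))
        await buf.put(None)  # sentinel: no more work

    async def consume(buf):
        while True:
            task = await buf.get()
            if task is None:
                break
            try:
                yield await task
            except Exception as exc:
                if return_exceptions:
                    yield exc  # surface the exception as an ordinary result
                else:
                    raise

    buf = asyncio.Queue(limit or 0)
    producer = asyncio.create_task(submit(tasks, buf))
    try:
        async for result in consume(buf):
            yield result
    finally:
        producer.cancel()  # stop submitting work if the consumer stops early
```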
Hi! First off, I think paco is a very nice library and I would like to help improve it. That said, I have a particular problem: I need to download millions of images as fast as possible. I looked into these resources:
Using `paco` my initial code was:

I like the API of `paco.each`, but when testing it my computer froze as its memory blew up while trying to create 1 million coroutines. The main problem is in these lines of code:

I observe the following:
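Purely as an illustration of the failure mode (not the code referenced above, which is not reproduced in this issue text), the eager pattern that exhausts memory looks roughly like this: every coroutine object is created before anything runs, so a list of one million URLs means one million live objects up front.

```python
import asyncio

async def download_image(url):
    # Placeholder for the real HTTP request.
    await asyncio.sleep(0)
    return url

async def main(urls):
    # Eager pattern: one coroutine object per URL is created before any of
    # them run, so a million URLs means a million live objects at once.
    coros = [download_image(url) for url in urls]
    return await asyncio.gather(*coros)
```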
Since my problem is about speed and memory, points 1 to 3 are more relevant. I recreated `map` and `each` using `asyncio.Queue`, limiting the number of tasks that exist at the same time. This involved creating a structure I called `Stream` that just holds a coroutine and a Queue. My API enforces the `limit` on `each` so it never surpasses that amount of objects in memory.

Both the new `from_iterable` and `map` functions have a `queue_maxsize` parameter that further limits how the data flows and enforces a back-pressure mechanism. The code is at the end. I wanted to share the experiment and also open the possibility of creating a `paco.stream` module to continue the life of this code.
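As a rough sketch of the back-pressure idea only (the function name, `limit` and `queue_maxsize` mirror the description above, but the body is a guess rather than cgarciae's actual implementation), a bounded queue plus a fixed worker pool keeps the number of live objects capped:

```python
import asyncio

async def each(coro, iterable, limit=10, queue_maxsize=100):
    """Apply ``coro`` to every item of ``iterable`` with at most ``limit``
    concurrent tasks and at most ``queue_maxsize`` pending items in memory."""
    queue = asyncio.Queue(maxsize=queue_maxsize)
    done = object()  # sentinel marking the end of the input

    async def producer():
        for item in iterable:
            await queue.put(item)  # blocks when full: back-pressure on the source
        for _ in range(limit):
            await queue.put(done)  # one sentinel per worker

    async def worker():
        while True:
            item = await queue.get()
            if item is done:
                break
            await coro(item)

    await asyncio.gather(producer(), *(worker() for _ in range(limit)))
```

With this shape, `asyncio.run(each(download_image, urls, limit=100, queue_maxsize=1000))` never holds more than roughly `limit + queue_maxsize` items at once, regardless of how long `urls` is.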