Open Goldziher opened 5 months ago
Offering a generator chunker and perhaps even support for lazy chunking is something I’m open to. I’ll start work on that shortly.
With regard to offering an asynchronous generator, I’m not too sure what value there would be in that when there isn’t anything I’m aware of in my chunker that is IO-bound. And seeing as synchronous functions and generators are already callable within asynchronous environments, making chunkers asynchronous would only seem to add more overhead. If there’s something I’m missing here, however, please let me know.
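As a rough illustration of the generator idea and of calling synchronous generators from async code, here is a minimal sketch. `toy_chunker` and `iter_chunks` are hypothetical stand-ins invented for this example; they are not part of the library's API.

```python
import asyncio
from typing import Callable, Iterable, Iterator


def toy_chunker(text: str, chunk_size: int = 3) -> list[str]:
    """Hypothetical stand-in for a real chunker: groups words, chunk_size per chunk."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def iter_chunks(chunker: Callable[[str], list[str]], texts: Iterable[str]) -> Iterator[str]:
    """Lazily yield chunks one text at a time instead of building one big list."""
    for text in texts:
        yield from chunker(text)


async def main() -> list[str]:
    # A plain (synchronous) generator can be consumed inside a coroutine as-is.
    return [chunk for chunk in iter_chunks(toy_chunker, ["a b c d", "e f"])]


result = asyncio.run(main())
print(result)
```

Note that while the generator is being consumed, the event loop does not run other tasks, which is the trade-off discussed in the replies.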
Using an async iterator / generator allows for streaming the source rather than loading it all into memory.
So you imagine it being used to handle inputs that are async iterators, is that right? For example:
chunker = chunkerify(...)
texts = my_async_text_generator()
# Normally you'd do this:
chunks = [chunker(text) async for text in texts]
# But you'd like to be able to do this(?)
chunks = await chunker(texts)
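For reference, one way the async-iterator input could be handled without changing the chunker itself is a thin wrapper. This is only a sketch: `achunk`, `toy_chunker`, and the body of `my_async_text_generator` are hypothetical and assumed for illustration.

```python
import asyncio
from typing import AsyncIterator, Callable


def toy_chunker(text: str, chunk_size: int = 2) -> list[str]:
    """Hypothetical stand-in for a real chunker: groups words, chunk_size per chunk."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


async def my_async_text_generator() -> AsyncIterator[str]:
    """Assumed text source; the sleep stands in for waiting on IO."""
    for text in ["one two three", "four five"]:
        await asyncio.sleep(0)
        yield text


async def achunk(
    chunker: Callable[[str], list[str]], texts: AsyncIterator[str]
) -> AsyncIterator[str]:
    """Yield chunks as each text arrives, so the whole source is never held in memory."""
    async for text in texts:
        for chunk in chunker(text):
            yield chunk


async def main() -> list[str]:
    return [chunk async for chunk in achunk(toy_chunker, my_async_text_generator())]


streamed = asyncio.run(main())
print(streamed)
```

The chunking call inside the wrapper is still synchronous, so it streams the input but does not by itself move CPU work off the event loop.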
For a stream I would use an async iterator (e.g. async generator)
But using async for chunking is purely for IO bound situations, like using chunking in an API. The advantage of
chunks = await chunker(texts)
is that this would be run in an async worker thread rather than the main thread, and thus would not block the execution of other async tasks.
I can fake it by doing something like
await anyio.to_thread.run_sync(chunker, texts)
But this is pretty suboptimal since it slows execution quite a bit.
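A self-contained sketch of that workaround follows. To keep it dependency-free it uses the standard library's `asyncio.to_thread` (Python 3.9+) as a stand-in for `anyio.to_thread.run_sync`; `toy_chunker` and `chunk_in_thread` are hypothetical names, not part of any library.

```python
import asyncio


def toy_chunker(texts: list[str]) -> list[list[str]]:
    """Hypothetical stand-in for a real chunker: splits each text on '. '."""
    return [text.split(". ") for text in texts]


async def chunk_in_thread(texts: list[str]) -> list[list[str]]:
    # Run the synchronous chunker in a worker thread so the event loop stays free.
    return await asyncio.to_thread(toy_chunker, texts)


async def main() -> tuple[list[list[str]], list[int]]:
    ticks: list[int] = []

    async def heartbeat() -> None:
        # Another task that keeps making progress while chunking is offloaded.
        for i in range(3):
            ticks.append(i)
            await asyncio.sleep(0)

    chunks, _ = await asyncio.gather(chunk_in_thread(["A. B", "C"]), heartbeat())
    return chunks, ticks


offloaded_chunks, ticks = asyncio.run(main())
print(offloaded_chunks, ticks)
```

Each call pays for a thread hand-off, and pure-Python chunking still contends for the GIL, which is consistent with the slowdown described above.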
Hi there!
Thanks for this neat library. I'm giving it a go.
It would be great to have two variants of the chunkerify function that return a generator and an async generator, and a version that is async.
Use cases:
The simplest (but non-performant) option for implementing async logic is to execute the sync version using something like anyio.to_thread.run_sync: https://anyio.readthedocs.io/en/stable/threads.html.