run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: improve support for async in llama_index #6591

Closed jjmachan closed 1 year ago

jjmachan commented 1 year ago

Description

Currently, LlamaIndex has support for asynchronous query requests, but it is not fully implemented: the Retriever has no asynchronous retrieve method, and the Synthesiser, even though it has an async version, leaves it unimplemented in a few cases.

The following components have to be made async, since, depending on the implementation, they might be called externally.

Todos

Motivation

I'm working on revamping the Playground module so that users can quickly and cheaply prototype different LlamaIndex configurations and evaluate which ones work best. However, since many of the calls are blocking (especially in synthesis), the UX is poor.

There are two ways to fix this:

  1. Batching: allow users to send a batch of queries in the playground. This means changing the internals to support batching, but it would offer significant speedups, especially at the synthesis stage. However, it is a more significant internal change and might be harder to justify for just this evaluation use case.
  2. Async: truly optimised async is as good as batching, since the requests would be sent out in parallel (see the sketch below). We have partial implementations of this already, so personally I think it would be easier to finish the async path.
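To make the async option concrete, here is a minimal sketch of how a batch of evaluation queries could be dispatched concurrently, assuming a fully async aquery() on the query engine (run_batch is a hypothetical helper, not an existing API):

```python
import asyncio

async def run_batch(query_engine, questions):
    # asyncio.gather() fires all requests at once, so total latency
    # approaches the slowest single call rather than the sum of all calls.
    return await asyncio.gather(
        *(query_engine.aquery(q) for q in questions)
    )

# Usage, assuming `engine` is any LlamaIndex query engine:
# responses = asyncio.run(run_batch(engine, ["q1", "q2", "q3"]))
```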

Having complete async functionality would also bring optimisations for users hosting LlamaIndex, since ASGI servers could be leveraged to increase throughput.
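For illustration, a rough sketch of what that hosting story could look like on an ASGI framework (FastAPI is used here only as an example; query_engine is assumed to be built elsewhere):

```python
from fastapi import FastAPI

app = FastAPI()
query_engine = ...  # placeholder: construct at startup, e.g. index.as_query_engine()

@app.get("/query")
async def query_route(q: str):
    # With a non-blocking aquery(), the ASGI worker can serve other
    # requests while this one awaits the LLM, instead of holding a thread.
    response = await query_engine.aquery(q)
    return {"answer": str(response)}
```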

Value of Feature

I'll let the numbers speak for themselves (although it's a crude implementation), using the default vector_store and doc_store:

with #6587: (benchmark screenshot)

with #6590 (this has the most bang for the buck, and I'll get it merged asap): (benchmark screenshot)

Concerns

A solution I had in mind was to make async methods the default internally while exposing something that feels synchronous to users: query() would call aquery() and execute the coroutine on the user's behalf.

The neat thing here is that all we have to do is make the non-async functions async, even if they remain blocking during the implementation stage; the developer is then free to implement either blocking or truly async methods.
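As a rough sketch of that facade idea (a hypothetical Retriever shape, not the actual class):

```python
import asyncio

class Retriever:
    async def aretrieve(self, query: str):
        # The real implementation lives here; during the transition it may
        # still block internally, which is fine.
        ...

    def retrieve(self, query: str):
        # Synchronous facade: drive the coroutine to completion so callers
        # who never touch asyncio still see a plain blocking call.
        return asyncio.run(self.aretrieve(query))
```

One wrinkle: asyncio.run() raises if an event loop is already running (e.g. in Jupyter), so the facade would need a fallback such as nest_asyncio or scheduling the coroutine on the existing loop.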

jon-chuang commented 1 year ago

Definitely agreed. For instance, one aspect of this is

jon-chuang commented 1 year ago

A downstream benefit of async support is better webserver performance. Currently, tracing and callbacks would completely fail in such async serving use cases.

jjmachan commented 1 year ago

Exactly @jon-chuang, I feel the same. I'm happy to help contribute some of the work required, but I guess we need the team's opinion on this.

logan-markewich commented 1 year ago

@jjmachan you are free to modify and propose as you please! If it's a useful feature, it will be merged :)

dosubot[bot] commented 1 year ago

Hi, @jjmachan! I'm Dosu, and I'm helping the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, this issue is a feature request to improve support for async functionality in the LlamaIndex library. The author wants to revamp the Playground module to allow users to prototype different LlamaIndex configurations, but the current blocking calls make the user experience bad. Implementing async functionality would improve the speed and throughput of the library. @jon-chuang agrees and mentions a downstream benefit of better webserver performance.

It seems like there has been some discussion on this issue, and you have expressed your willingness to contribute and have asked for the team's opinion. @logan-markewich has encouraged you to modify and propose the feature.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LlamaIndex project!