run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: does SubQuestionQueryEngine support streaming? #7188

Closed: ibicdev closed this issue 1 year ago

ibicdev commented 1 year ago

Question

Does SubQuestionQueryEngine support streaming? For example, how can the response from docs/examples/usecases/10k_sub_question.ipynb be streamed? I tried adding streaming=True to llm, lyft_engine, and uber_engine, but still couldn't get s_engine to produce a streaming response.

ibicdev commented 1 year ago

A follow-up question: how can SubQuestionQueryEngine be made to work with Chainlit, as in the LlamaIndex-Chainlit integration example (https://docs.chainlit.io/integrations/llama-index)?

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

Hi there,

Yes, the SubQuestionQueryEngine in LlamaIndex does support streaming. Sub-questions are run asynchronously via asyncio in the _aquery and _aquery_subq methods: _aquery generates the sub-questions and gathers the results of the asynchronous sub-question queries, while _aquery_subq performs the query for a single sub-question. You can find more details in the source code.

To enable streaming in SubQuestionQueryEngine, you can set streaming=True when building a query engine or when constructing the Response Synthesizer. Here is an example:

# `index` is assumed to be an existing index, e.g. a VectorStoreIndex.
query_engine = index.as_query_engine(
    streaming=True,
    similarity_top_k=1,
)

And for the Response Synthesizer:

from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine

# Build a streaming synthesizer and hand it to the query engine.
synth = get_response_synthesizer(streaming=True, ...)
query_engine = RetrieverQueryEngine(response_synthesizer=synth, ...)

You can then use the StreamingResponse object like this:

streaming_response = query_engine.query(
    "What did the author do growing up?",
)

for text in streaming_response.response_gen:
    # Do something with each text chunk as it arrives.
    print(text, end="", flush=True)

Or:

streaming_response.print_response_stream() 

You can find more details in the documentation here.

As for integrating with Chainlit, the code does not provide any direct evidence of how this can be done. Chainlit is a framework for building chat UIs around LLM applications (rather than a data visualization tool), so the typical approach is to run the query inside a Chainlit message handler and stream the generated tokens into a Chainlit message as they arrive. The specifics would depend on the exact requirements of your application and the Chainlit version you are using; a minimal sketch follows.
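For illustration only, here is a minimal sketch of such a handler. It assumes Chainlit's streaming message API (cl.Message.stream_token and send, whose exact signatures vary across Chainlit versions) and an s_engine already built with a streaming response synthesizer, as discussed later in this thread:

import chainlit as cl

@cl.on_message
async def on_message(message: cl.Message):
    # s_engine is assumed to be a SubQuestionQueryEngine configured
    # with a streaming response synthesizer (see the rest of this thread).
    streaming_response = s_engine.query(message.content)

    # Stream each token into the Chainlit message as it arrives.
    msg = cl.Message(content="")
    for token in streaming_response.response_gen:
        await msg.stream_token(token)
    await msg.send()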

I hope this helps! If you have any other questions, feel free to ask.

Best, Dosu

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.


ibicdev commented 1 year ago

The bot's answer works for RetrieverQueryEngine, but not for SubQuestionQueryEngine. I tried adding a response_synthesizer to SubQuestionQueryEngine and calling the query method, but the returned response object has neither print_response_stream nor response_gen.

Any human insights?

logan-markewich commented 1 year ago

@ibicdev I think the bot is on the right track here.

The sub-question query engine takes a response synthesizer for generating the final response.

If this is configured to be streaming, then the final response will also be streaming 🤔

https://github.com/jerryjliu/llama_index/blob/c528cf6857193a6056cfb2d77666ba3c5659750b/llama_index/query_engine/sub_question_query_engine.py#L80
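Putting this together, here is a minimal sketch assuming the llama_index 0.7.x-era API used in this thread, with lyft_engine and uber_engine built as in 10k_sub_question.ipynb but without streaming=True (see the follow-up comment below):

from llama_index import get_response_synthesizer
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

# lyft_engine and uber_engine are plain (non-streaming) sub-engines
# from the 10k_sub_question.ipynb example.
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(name="lyft_10k", description="Lyft 2021 10-K"),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(name="uber_10k", description="Uber 2021 10-K"),
    ),
]

# Streaming is enabled only on the synthesizer that writes the final answer.
synth = get_response_synthesizer(streaming=True)
s_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    response_synthesizer=synth,
)

streaming_response = s_engine.query(
    "Compare revenue growth of Uber and Lyft in 2021"
)
streaming_response.print_response_stream()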

ibicdev commented 1 year ago

Thanks @logan-markewich for the clarification. It turns out that in the 10k_sub_question.ipynb example, lyft_engine and uber_engine must not have streaming=True either; once that was removed, the bot's suggestion works.

Closing this issue, though it would be nice to be able to stream the answers to the sub-questions as well 😄