run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Query Pipeline with Async/Parallel Execution Broken #14746

Open redswimmer opened 1 month ago

redswimmer commented 1 month ago

Bug Description

I'm running into an issue with my query pipeline. I'm trying to run a join on the output of worker_llms (LLMs that run in parallel), but the join always seems to execute right after worker_query and before the worker_llms have run. When I visualize the DAG, it looks correct.

It works through version 0.10.46 and breaks in version 0.10.47. Here is the Discord thread where I go over the issue with Logan.

[Attached image: DAG visualization of the query pipeline]

Version

0.10.47

Steps to Reproduce

This LlamaIndex example.
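
Since the linked example isn't inlined here, below is a minimal sketch approximating the reported topology. The module names (worker_query, worker_prompt_*, worker_llm_*, join) and the use of ArgPackComponent as the join are assumptions for illustration, following the documented parallel query pipeline pattern, not the reporter's exact code.

```python
# A minimal sketch of the fan-out/fan-in shape described above. Module
# names (worker_query, worker_prompt_*, worker_llm_*, join) are assumed
# for illustration; the reporter's actual pipeline is in the linked example.
from llama_index.core import PromptTemplate
from llama_index.core.query_pipeline import (
    ArgPackComponent,
    InputComponent,
    QueryPipeline,
)
from llama_index.llms.openai import OpenAI

p = QueryPipeline(verbose=True)
p.add_modules(
    {
        "worker_query": InputComponent(),
        "join": ArgPackComponent(),  # packs all worker outputs into a list
    }
)

# Fan out: each worker LLM answers the same query through its own prompt.
for i in range(3):
    prompt = PromptTemplate(f"Worker {i}, answer this query: {{input}}")
    p.add_modules({f"worker_prompt_{i}": prompt, f"worker_llm_{i}": OpenAI()})
    p.add_link("worker_query", f"worker_prompt_{i}")
    p.add_link(f"worker_prompt_{i}", f"worker_llm_{i}")
    # Fan in: the join is declared downstream of every worker LLM, so it
    # should only run once all of them have produced output ...
    p.add_link(f"worker_llm_{i}", "join", dest_key=str(i))

# ... but on 0.10.47+ the join reportedly fires right after worker_query,
# before any worker LLM has run. On 0.10.46 the order is correct.
output = p.run(input="What does a query pipeline do?")
print(output)
```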

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

The issue with the join operation running immediately after worker_query instead of after worker_llms in version 0.10.47 is likely due to changes in the task scheduling and execution logic introduced in that release. Specifically, 0.10.47 enabled stepwise execution of query pipelines, which may have changed the order in which tasks are dispatched [1][2].

To address this, you can investigate the changes related to stepwise execution, in particular how tasks are initialized, scheduled, and executed, focusing on the join operation and its dependencies. You can also use the print_debug_input and print_debug_input_multi functions to print debug information about each module's inputs, which helps in understanding the execution order of tasks in the query pipeline [3].
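
As a concrete illustration of that debugging advice, here is a hedged sketch of one way to inspect execution order, assuming the run_with_intermediates API from the 0.10.x releases and reusing the pipeline p from the reproduction sketch above:

```python
# Hedged sketch: run_with_intermediates is assumed to return the final
# output plus each module's captured inputs/outputs as it executed.
output, intermediates = p.run_with_intermediates(
    input="What does a query pipeline do?"
)

# If the scheduling bug is present, the join ran before its upstream
# worker LLMs, so its captured inputs should be missing their outputs.
print(intermediates["join"].inputs)
```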

For more details, you can refer to the changelog and the relevant code snippets in the repository.

To continue talking to Dosu, mention @dosu.

botschmiede commented 1 month ago

Can confirm this. Still broken.

michaelbluegenai commented 2 weeks ago

Any update on this one? It still seems to be broken as of 0.10.65.