run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Query Pipeline with Async/Parallel Execution Broken #14746

Open redswimmer opened 1 month ago

redswimmer commented 1 month ago

Bug Description

I'm running into an issue with my query pipeline. I'm trying to run a join on the output of worker_llms (LLMs that run in parallel), but the join always seems to execute right after worker_query and before the worker_llms have run. When I visualize the DAG, it looks correct.

It works through version 0.10.46 and breaks in version 0.10.47. Here is the Discord thread where I go over the issue with Logan.

[Attached image: DAG visualization of the query pipeline]

Version

0.10.47

Steps to Reproduce

This LlamaIndex example.
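
Since the linked example isn't inlined here, below is a minimal sketch approximating the reported topology. The module names (worker_query, worker_prompt_*, worker_llm_*, join) and the use of ArgPackComponent as the join are assumptions for illustration, following the documented parallel query pipeline pattern, not the reporter's exact code.

```python
# A minimal sketch of the fan-out/fan-in shape described above. Module
# names (worker_query, worker_prompt_*, worker_llm_*, join) are assumed
# for illustration; the reporter's actual pipeline is in the linked example.
from llama_index.core import PromptTemplate
from llama_index.core.query_pipeline import (
    ArgPackComponent,
    InputComponent,
    QueryPipeline,
)
from llama_index.llms.openai import OpenAI

p = QueryPipeline(verbose=True)
p.add_modules(
    {
        "worker_query": InputComponent(),
        "join": ArgPackComponent(),  # packs all worker outputs into a list
    }
)

# Fan out: each worker LLM answers the same query through its own prompt.
for i in range(3):
    prompt = PromptTemplate(f"Worker {i}, answer this query: {{input}}")
    p.add_modules({f"worker_prompt_{i}": prompt, f"worker_llm_{i}": OpenAI()})
    p.add_link("worker_query", f"worker_prompt_{i}")
    p.add_link(f"worker_prompt_{i}", f"worker_llm_{i}")
    # Fan in: the join is declared downstream of every worker LLM, so it
    # should only run once all of them have produced output ...
    p.add_link(f"worker_llm_{i}", "join", dest_key=str(i))

# ... but on 0.10.47+ the join reportedly fires right after worker_query,
# before any worker LLM has run. On 0.10.46 the order is correct.
output = p.run(input="What does a query pipeline do?")
print(output)
```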

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

The issue with the join operation running immediately after worker_query instead of after worker_llms in version 0.10.47 is likely due to changes in the task scheduling and execution logic introduced in that release. Specifically, 0.10.47 enabled stepwise execution of query pipelines, which may have changed the order in which tasks are dispatched [1][2].

To address this, you can investigate the changes related to stepwise execution, in particular how tasks are initialized, scheduled, and executed, focusing on the join operation and its dependencies. You can also use the print_debug_input and print_debug_input_multi functions to print debug information about each module's inputs, which helps in understanding the execution order of tasks in the query pipeline [3].
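
As a concrete illustration of that debugging advice, here is a hedged sketch of one way to inspect execution order, assuming the run_with_intermediates API from the 0.10.x releases and reusing the pipeline p from the reproduction sketch above:

```python
# Hedged sketch: run_with_intermediates is assumed to return the final
# output plus each module's captured inputs/outputs as it executed.
output, intermediates = p.run_with_intermediates(
    input="What does a query pipeline do?"
)

# If the scheduling bug is present, the join ran before its upstream
# worker LLMs, so its captured inputs should be missing their outputs.
print(intermediates["join"].inputs)
```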

For more details, you can refer to the changelog and the relevant code snippets in the repository.

To continue talking to Dosu, mention @dosu.

botschmiede commented 1 month ago

Can confirm this. Still broken.

michaelbluegenai commented 2 weeks ago

Any update on this one? It still seems to be broken as of 0.10.65.