Open bhanson-techempower opened 2 months ago
Also not sure if I should open up another issue, but the example project also shows a bug with the trace viewer. When I open the link provided in the console only the errored cases show up. This is happening every time for me on that project, although I haven't noticed it in our actual applications.
There should be 10 rows here:
We have noticed other subtle bugs with the trace viewer. Sometimes the information displayed in the table format doesn't match the underlying case when you click on it. The same line run might be duplicated a few times and then we need to click into each row to find the actual case we're looking for.
My suspicion is both of these are related to other concurrency issues and not actually a problem with the trace viewer.
Hi @bhanson-techempower, thanks for reaching out and for the detailed reproduction. We have investigated this problem, and I'll explain what we found.
Conclusion first
Root cause
The `load_flow` line: the flow path is resolved incorrectly because the cwd changes and concurrent runs affect each other.
Detail about the workaround
To resolve this problem
To fully resolve this problem, we have to make each node run independently, for example in a separate process. I'm afraid this will be long-term work. I've added the 'long-term' tag, and we'll keep this item open for anyone who meets the same problem. We will update it if we make changes to the related part.
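As a rough illustration of the "separate process" idea above (a minimal sketch, not prompt flow's implementation or its planned design): a child process started with an explicit `cwd` gets its own working directory, and the parent's cwd is left alone. The helper name `run_in_isolated_cwd` is hypothetical, and only the Python standard library is used.

```python
import subprocess
import sys


def run_in_isolated_cwd(workdir: str) -> str:
    """Run a tiny child Python process with `workdir` as its cwd.

    Returns the cwd the child reports. The parent's cwd is never changed,
    because `cwd=` only applies to the child process.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=workdir,          # working directory for the child only
        capture_output=True,
        text=True,
        check=True,           # raise if the child exits with an error
    )
    return result.stdout.strip()
```

Calling `run_in_isolated_cwd("/some/dir")` from several threads at once does not make the calls interfere with each other, since no shared process-wide state is modified.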
Describe the bug
Prompt flow appears to modify the global working directory and workers in the line execution process pool can get spawned with different working directories.
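To make the underlying mechanism concrete, here is a minimal sketch (standard library only, no prompt flow calls) of why a changed working directory matters: the cwd is process-wide state, so the same relative path resolves to different locations before and after an `os.chdir`. The relative path `flows/sub_flow` and the function names are made up for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple


def resolve_relative(rel_path: str) -> Path:
    """Resolve a relative path against whatever the cwd is right now."""
    return Path(rel_path).resolve()


def demo_cwd_dependent_resolution() -> Tuple[Path, Path]:
    """Resolve the same relative path before and after the cwd changes."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as other_dir:
        before = resolve_relative("flows/sub_flow")
        os.chdir(other_dir)  # simulates some code changing the global cwd
        try:
            after = resolve_relative("flows/sub_flow")
        finally:
            os.chdir(original_cwd)  # restore before the temp dir is removed
    return before, after
```

The two returned paths differ even though the input string is identical, which is the kind of mismatch that can surface as a "path not found" style failure when code assumes the cwd never moves.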
How To Reproduce the bug
I created an example project to reproduce the bug here: https://github.com/bhanson-techempower/promptflow-concurrency-bug
Some runs will succeed and others will randomly fail with:
This happens because the worker for that particular node is spawned after the working directory has been changed.
With the example project the bug is reproduced for me almost every time.
In our production application we're seeing it about half the time when running a batch of 20 runs.
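The timing dependence described above can be sketched with plain subprocesses (an illustration only; it does not use prompt flow's process pool, and the function names are hypothetical). A child process inherits whatever cwd the parent has at the moment it is spawned, so a worker started after a `chdir` sees a different directory than one started before it.

```python
import os
import subprocess
import sys
import tempfile
from typing import Tuple

CHILD_CODE = "import os; print(os.getcwd())"


def spawn_and_report_cwd() -> str:
    """Spawn a child Python process and return the cwd it inherited."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip()


def demo_inherited_cwd() -> Tuple[str, str]:
    """Spawn one worker before and one after the parent's cwd changes."""
    start = os.getcwd()
    early_worker_cwd = spawn_and_report_cwd()
    with tempfile.TemporaryDirectory() as other_dir:
        os.chdir(other_dir)  # simulates the global cwd being modified
        try:
            late_worker_cwd = spawn_and_report_cwd()
        finally:
            os.chdir(start)  # restore before the temp dir is removed
    return early_worker_cwd, late_worker_cwd
```

If workers are created lazily while another thread has temporarily changed the cwd, which worker ends up in which directory depends on scheduling, which would be consistent with the intermittent failures reported here.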
Expected behavior
The flow executes successfully every time because prompt flow does not share extra global state between processes.
Running Information:
`pf -v`:
Additional context
We've worked around the issue by modifying the way we invoke the sub flows: