Open vittorioromeo opened 7 years ago
Looking over the code, it appears that each task in the DAG is submitted to a global MPMC queue from which the worker threads continuously pop. This causes a deadlock: every worker ends up blocked inside a task that is waiting on other tasks still sitting in the queue, and no thread is left to run them.
The simplest fix is probably to let threads work on their own tasks so that they can never stall. You can achieve this with work-stealing task scheduling instead of the global queue. That would also fix #21, since threads execute their own tasks rather than waiting.
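To make the suggestion concrete, here is a rough sketch of the idea in Python rather than the library's own code. Every name in it (`StealingPool`, `run_until`, `run_systems`) is made up for illustration, and a single coarse lock stands in for the lock-free deques a real work-stealing scheduler would use. The point it tries to show is only this: a task that needs its subtasks calls `run_until` and keeps executing queued work instead of blocking.

```python
import threading
import time
from collections import deque


class StealingPool:
    """Toy pool: one deque per worker; a waiting worker keeps running tasks."""

    def __init__(self, num_threads):
        self._deques = [deque() for _ in range(num_threads)]
        self._lock = threading.Lock()  # single coarse lock, for brevity only
        self._local = threading.local()
        self._stop = False
        self._threads = [
            threading.Thread(target=self._worker, args=(i,), daemon=True)
            for i in range(num_threads)
        ]
        for thread in self._threads:
            thread.start()

    def _worker(self, index):
        self._local.index = index
        self.run_until(lambda: self._stop)

    def submit(self, task):
        # Workers push onto their own deque; outside callers use deque 0.
        index = getattr(self._local, "index", 0)
        with self._lock:
            self._deques[index].append(task)

    def _take(self, index):
        with self._lock:
            if self._deques[index]:
                return self._deques[index].pop()  # own work, newest first
            for other in self._deques:
                if other:
                    return other.popleft()  # steal the oldest task
        return None

    def run_until(self, done):
        # Used both as the idle loop and as a nested wait: it never parks
        # the thread on a condition, it executes tasks until done() is true.
        index = getattr(self._local, "index", 0)
        while not done():
            task = self._take(index)
            if task is None:
                time.sleep(0.001)  # nothing runnable right now
            else:
                task()

    def shutdown(self):
        self._stop = True
        for thread in self._threads:
            thread.join(timeout=1.0)


def run_systems(num_threads=4, num_systems=5, subtasks=3, timeout=5.0):
    """Run systems that each spawn subtasks; return (completed, finished ids)."""
    pool = StealingPool(num_threads)
    finished = []
    guard = threading.Lock()
    all_done = threading.Event()

    def system(system_id):
        remaining = [subtasks]

        def subtask():
            with guard:
                remaining[0] -= 1

        for _ in range(subtasks):
            pool.submit(subtask)
        pool.run_until(lambda: remaining[0] == 0)  # help instead of waiting
        with guard:
            finished.append(system_id)
            if len(finished) == num_systems:
                all_done.set()

    for system_id in range(num_systems):
        pool.submit(lambda system_id=system_id: system(system_id))
    completed = all_done.wait(timeout)
    pool.shutdown()
    return completed, sorted(finished)
```

Calling `run_systems(4, 5, 3)` runs 5 systems on 4 threads. Because no thread ever parks while work is queued, the run should complete even when there are more systems than threads. The polling sleep is a shortcut for the sketch; a real pool would park idle workers on a condition variable.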
Assume that the thread pool has only 4 threads, and that there is a DAG path where 5 systems can be executed in parallel. The thread pool may deadlock: every system is waiting for its subtasks to complete, but the subtasks cannot be run because all threads are occupied.
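The scenario can be checked with a small single-threaded model, written here in Python for brevity. It is a simplification and not the library's scheduler: it assumes one FIFO queue, that a system enqueues all of its subtasks and then parks its thread until they finish, and that a subtask runs to completion as soon as a free thread pops it. The function name `pool_completes` is made up for this sketch.

```python
from collections import deque


def pool_completes(num_threads, num_systems, subtasks_per_system):
    """Return True if every system finishes, False if the pool deadlocks."""
    queue = deque(("system", s) for s in range(num_systems))
    remaining = {}  # system id -> subtasks that have not finished yet
    blocked = 0     # threads parked inside a system waiting on its subtasks
    finished = 0

    while finished < num_systems:
        if blocked == num_threads or not queue:
            # Work is still outstanding but no thread is free to pop it.
            return False
        kind, system = queue.popleft()
        if kind == "system":
            if subtasks_per_system == 0:
                finished += 1  # nothing to wait for
                continue
            remaining[system] = subtasks_per_system
            queue.extend(("subtask", system) for _ in range(subtasks_per_system))
            blocked += 1  # this thread now waits for the subtasks
        else:
            remaining[system] -= 1
            if remaining[system] == 0:
                blocked -= 1  # the parent system wakes up and completes
                finished += 1
    return True
```

Under these assumptions `pool_completes(4, 5, 2)` returns `False`: the first four systems park all four threads before any subtask is popped. `pool_completes(6, 5, 2)` returns `True`, because one thread stays free to drain the subtasks.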
This would probably be improved by making the DAG execution asynchronous (see #21), but it would be nice to detect this "deadlock" from within the thread pool and add temporary "virtual threads" to resolve it.
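As a rough illustration of the "virtual threads" idea, the single-threaded Python model below counts how many temporary threads a pool would have to add. It is only a sketch under stated assumptions: one FIFO queue, systems that park their thread until all of their subtasks finish, and a stall check that fires when work is queued while every thread is parked. The name `virtual_threads_needed` is hypothetical, and the model never retires a temporary thread once it has been added.

```python
from collections import deque


def virtual_threads_needed(num_threads, num_systems, subtasks_per_system):
    """Count the temporary threads added so that every system can finish."""
    queue = deque(("system", s) for s in range(num_systems))
    remaining = {}  # system id -> subtasks that have not finished yet
    blocked = 0     # threads parked inside a waiting system
    capacity = num_threads

    while queue:
        if blocked == capacity:
            # Stall detected: work is queued but every thread is parked,
            # so add one temporary thread to pick up the next item.
            capacity += 1
        kind, system = queue.popleft()
        if kind == "system":
            if subtasks_per_system > 0:
                remaining[system] = subtasks_per_system
                queue.extend(
                    ("subtask", system) for _ in range(subtasks_per_system)
                )
                blocked += 1
        else:
            remaining[system] -= 1
            if remaining[system] == 0:
                blocked -= 1  # the parent system resumes and completes
    return capacity - num_threads
```

In this model the 4-thread, 5-system case needs two temporary threads: one to start the fifth system and one to run the subtasks. That worst case grows with the number of simultaneously waiting systems, which is one argument for the asynchronous approach instead.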