mindee / tawazi

A DAG Scheduler library written in pure python
https://mindee.github.io/tawazi/
Apache License 2.0

[ENHANCEMENT] Max Priority Queue in graph scheduling. #226

Open Mattcrmx opened 6 days ago

Mattcrmx commented 6 days ago

While skimming over the codebase, I saw that we keep re-selecting the node with the highest priority, whereas we could build a max priority queue once at the beginning and pop from it. It would be both cleaner and simpler, while being more efficient (an O(log n) pop instead of an O(n) scan). I can write a PR today, wdyt @bashirmindee?
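
A minimal sketch of the idea (hypothetical names, not tawazi's actual API): keep the runnable node ids in a heap keyed on negated priority, since Python's `heapq` is a min-heap, so that popping always returns the highest-priority node without scanning the whole set.

```python
import heapq


def build_queue(priorities: dict[str, int], runnable_ids: list[str]) -> list[tuple[int, str]]:
    """Build a max priority queue: priorities are negated so heappop returns the max."""
    heap = [(-priorities[node_id], node_id) for node_id in runnable_ids]
    heapq.heapify(heap)  # O(n) one-time construction
    return heap


def pop_highest(heap: list[tuple[int, str]]) -> str:
    """Pop the runnable node with the highest priority in O(log n)."""
    _, node_id = heapq.heappop(heap)
    return node_id
```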

https://github.com/mindee/tawazi/blob/c0f0cc2b81cb44fcef13c2e53bac87ab4fd4a9a7/tawazi/_dag/helpers.py#L318

bashirmindee commented 6 days ago

What do you mean by that? We are obliged to construct a new queue every time we complete the execution of a root node, because new runnable_xns_ids might be created.

Mattcrmx commented 6 days ago

My idea was not to maintain only one queue, but rather to have a pre-computed queue in the graph and derive new queues with the updated nodes by only traversing the existing queue (some kind of filtered queue based on availability). That way, whenever a node becomes available, we extend the current max queue with the new children (Python's `heapq` implements the heap operations natively; a max queue just negates the priorities) and we can still pop from it. This avoids recomputing the max every time, since we've already done that work at the beginning: we just pop the max node we want to run from the updated queue (see the sketch below). I think this would be most beneficial for a very flat DAG, i.e. one where a lot of nodes can run at the same time. This actually emerged from the fact that we search for the max node among all runnable nodes even though we can only run MAX_CONCURRENCY of them at a time, so that search can already be a bottleneck in itself.
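
A hedged sketch of that incremental variant, continuing the hypothetical helpers above (the function names and the `free_slots` parameter are assumptions for illustration, not tawazi internals): newly runnable successors are pushed into the existing heap instead of rebuilding it, and only as many nodes as there are free concurrency slots are popped.

```python
import heapq


def on_node_done(heap, priorities, new_runnable_ids):
    """Extend the existing max queue with nodes that just became runnable."""
    for node_id in new_runnable_ids:
        heapq.heappush(heap, (-priorities[node_id], node_id))  # O(log n) per push


def next_batch(heap, free_slots):
    """Pop at most `free_slots` highest-priority nodes to schedule now."""
    batch = []
    while heap and len(batch) < free_slots:
        _, node_id = heapq.heappop(heap)
        batch.append(node_id)
    return batch
```

The point of the cap on `free_slots` is that the scheduler never needs to rank more nodes than it can actually start, so the per-iteration work stays bounded by MAX_CONCURRENCY rather than by the number of runnable nodes.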

bashirmindee commented 6 days ago

I think this won't be beneficial and will be pretty complicated TBH. Remember that the newly created root nodes are dynamic after each iteration of the scheduler; hence the number of combinations of queues that would need to be constructed is huge and will cause an OOM directly in some cases.

Mattcrmx commented 6 days ago

Nope, my idea was to compute a queue once at the start and extract the sub-queue each time without recomputing. I'll do a simple POC and update you on it; maybe it'll be clearer.