gavv opened this issue 4 years ago
Hi, it seems like all the variants of priority queue have an O(log n) delete operation (pop), and some of the variants have an O(1) insert operation. So, does this require an O(1) delete operation?
@Ewaolx Indeed, thanks for noticing. According to this thread, when choosing between various heaps and trees, we'll have to pay O(log N) either for insert() or for pop(), which makes sense.
In our case, ctl::TaskQueue is used to schedule background low-priority work from real-time threads. Hence, if we have to choose between faster insert() and faster pop(), faster insert() is preferred, since insert() is usually called when a task is scheduled from a real-time thread, while pop() is usually called from the TaskQueue background thread.
Sounds good. In this case, we may have something like a Fibonacci heap, which provides insert in O(1) and pop in O(log N). Also, I see a sorted linked list (core::List) is currently being used in ctl::TaskQueue, so that would be replaced by the heap, and consequently we would have heap insert and delete instead of the linked list's, right?
> In this case, we may have something like a Fibonacci heap, which provides insert in O(1) and pop in O(log N).
Looks like a good option! Are you going to work on this?
Besides theoretical big-O complexity, we should also ensure that the actual task insertion performance won't suffer in the good case, when there are not many pending tasks. So we'll need a small benchmark for TaskQueue scheduling.
A quick look at the Fibonacci heap algorithm suggests that it should be okay.
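For illustration, such a benchmark could measure insertion cost at different queue sizes; here is a rough sketch assuming a Google Benchmark harness (`BM_Insert` and the use of `std::priority_queue` as a stand-in for TaskQueue are hypothetical, not existing roc-toolkit code):

```cpp
#include <benchmark/benchmark.h>

#include <cstdint>
#include <queue>

// Hypothetical micro-benchmark: measures insertion cost at different
// queue sizes. std::priority_queue is used here only as a stand-in;
// a real benchmark would exercise ctl::TaskQueue itself.
static void BM_Insert(benchmark::State& state) {
    std::priority_queue<int> q;
    for (int64_t i = 0; i < state.range(0); i++) {
        q.push(int(i)); // pre-fill to the requested size
    }
    for (auto _ : state) {
        // Each iteration measures one push (plus one pop, which keeps
        // the queue size constant across iterations).
        q.push(0);
        q.pop();
    }
}
// Cover both the good case (few pending tasks) and load peaks.
BENCHMARK(BM_Insert)->Range(8, 1 << 16);

BENCHMARK_MAIN();
```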
> Also, I see a sorted linked list (core::List) is currently being used in ctl::TaskQueue, so that would be replaced by the heap, and consequently we would have heap insert and delete instead of the linked list's, right?
Yes. IIRC, we'll need to replace three methods: fetch_sleepingtask, insert_sleepingtask, remove_sleepingtask.
Great! The develop branch is quite ahead of master, and the guidelines have changed. They're not published on the web site (it's built from master), but you can find them in git:
> something like a Fibonacci heap
I don't recommend using this algorithm: it's good in theory, but in practice it has a huge constant factor. I advise you to try a regular binary heap, and only if its performance turns out to be insufficient should you look for optimizations.
@MBkkt thanks for the notice!
I haven't worked much with heaps myself, but quick googling and even the Wikipedia article on Fibonacci heaps confirm your point.
I feel that a binary heap isn't a good fit for us, because it has O(log n) insertion, while in our specific case insertion speed is the most critical part, since tasks are inserted from latency-sensitive threads.
Wikipedia and this page also suggest the Pairing heap as a better (faster in practice) alternative to the Fibonacci heap. Actually, even theoretically it seems better for our case, because Pairing heap insertion seems to be O(1) even in the worst case, while Fibonacci heap insertion is only amortized O(1). The implementation of the Pairing heap also looks simpler, arguably.
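For reference, here is a minimal sketch of the pairing heap operations being discussed (a standalone illustration, not roc-toolkit code; the `Node` layout and function names are made up for this example):

```cpp
#include <cstdint>
#include <utility>

// Hypothetical intrusive pairing-heap node: both links are embedded
// in the node itself, so the heap never allocates.
struct Node {
    uint64_t deadline; // sort key, e.g. scheduled execution time
    Node* child;       // first node in the child list
    Node* sibling;     // next node in the parent's child list

    explicit Node(uint64_t d) : deadline(d), child(nullptr), sibling(nullptr) {}
};

// Meld two heaps: the root with the smaller key adopts the other root
// as its first child. O(1) worst case.
Node* meld(Node* a, Node* b) {
    if (!a) return b;
    if (!b) return a;
    if (b->deadline < a->deadline) std::swap(a, b);
    b->sibling = a->child;
    a->child = b;
    return a;
}

// Insert is just a meld with a single-node heap: O(1) worst case.
Node* insert(Node* root, Node* n) {
    return meld(root, n);
}

// Standard two-pass combining of a child list: meld children pairwise
// left to right, then meld the results right to left (expressed here
// recursively). This is what makes pop amortized O(log n).
Node* merge_pairs(Node* first) {
    if (!first || !first->sibling) return first;
    Node* second = first->sibling;
    Node* rest = second->sibling;
    first->sibling = nullptr;
    second->sibling = nullptr;
    return meld(meld(first, second), merge_pairs(rest));
}

// Pop the minimum: remove the root and combine its children.
Node* pop(Node* root) {
    return merge_pairs(root->child);
}
```

Insert is a single meld, which is why it is O(1) even in the worst case; all the amortized cost is pushed into pop.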
@Ewaolx what do you think? How far did you get on the Fibonacci heap implementation?
I understand the concern, but what is the typical order of magnitude of n? Are you sure insert will be the bottleneck? In general, given that you want an intrusive queue, a binary heap is probably not really suitable.
@gavv Sorry for the late reply, but as you mentioned, I also think that if we want strictly amortized O(1) insertion, then a binary heap would not work. Also, a pairing heap sounds like a good option if we are concerned about the Fibonacci heap's practical performance. We can go ahead with the pairing heap in this case.
Let me know what you think. Thanks.
> I understand the concern, but what is the typical order of magnitude of n? Are you sure insert will be the bottleneck? In general, given that you want an intrusive queue, a binary heap is probably not really suitable.
Usually the number of tasks should be low. However, the key point here is that it can grow uncontrollably during load peaks, and at the same time we don't want latency to grow uncontrollably.
We don't optimize every single piece of code, but we try to follow design principles that keep us from ending up with code we're unable to optimize at all.
One such principle chosen in this project is that delays in one thread (whatever their reason) should not cause delays in other threads. Such a design decouples threads in terms of latency and delays, so when it comes to performance problems, you can review threads in isolation, which is much, much simpler.
TaskQueue is used for communication between pipeline and control threads. If insertion is not O(1), then delays in the control thread will lead to accumulating tasks, causing longer insertion times, and thus probably causing delays in pipeline threads. Or probably not, who knows; you'd have to profile it every time you're debugging glitches.
So it's very attractive to just use an O(1) data structure and be 100% sure that TaskQueue insertion always takes fixed time, and that the control thread is guaranteed not to affect the pipeline thread. Then, if you hear glitches, you can simply profile the pipeline thread in isolation.
As for specific numbers, as mentioned in the issue, it would be nice to add benchmarks and measure insertion times at different queue sizes.
> @gavv Sorry for the late reply, but as you mentioned, I also think that if we want strictly amortized O(1) insertion, then a binary heap would not work. Also, a pairing heap sounds like a good option if we are concerned about the Fibonacci heap's practical performance. We can go ahead with the pairing heap in this case.
> Let me know what you think. Thanks.
Yeah, let's try a pairing heap then.
Hey @Ewaolx, are you working on the pairing heap implementation?
Hello @gavv, this seems inactive. Mind if I try as well?
> Mind if I try as well?
Hi, sure.
Hi, I was interested in working on it and wanted to give it a try. Has it already been done?
Hi @divyavit , still in progress.
Problem
ctl::TaskQueue is a thread-safe task queue that allows scheduling tasks for immediate or delayed execution.
Tasks for immediate execution are stored in a lock-free FIFO. Tasks for delayed execution are stored in a sorted linked list. Thus, while we have O(1) pop, insertion into the sorted list is O(N), which does not scale to large numbers of tasks.
The typical data structure used in such cases is a priority queue implemented using some heap variant.
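For comparison, the current sorted-list insertion is conceptually equivalent to something like this (a simplified sketch, not the actual TaskQueue code):

```cpp
#include <cstdint>

// Simplified sketch of why insertion into a sorted linked list is
// O(N): we must walk the list to find the insertion point.
struct Task {
    uint64_t deadline; // time at which the task should run
    Task* next;
};

void insert_sorted(Task*& head, Task* t) {
    Task** p = &head;
    while (*p && (*p)->deadline <= t->deadline) {
        p = &(*p)->next; // linear scan: O(N) in the number of tasks
    }
    t->next = *p;
    *p = t;
}
```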
Solution
Requirements
Requirements for the priority queue implementation:
It should be intrusive, i.e. it should not perform allocations by itself, but instead should use data embedded into nodes. The same approach is used for core::List, core::MpscQueue, and core::SharedPtr. The limitation of this approach is that a node can be added to only one container at a time, but this is okay for us.
It should have O(1) top/pop.
Ideally, it should have amortized O(1) insert, but O(log n) is also acceptable.
It should have O(1) membership check (checking whether an element belongs to the queue). This can be implemented by simply storing a pointer to the containing queue in each node, as we do for core::List; see the sketch after this list.
Whenever possible, the implementation should be compact and readable.
It should NOT be thread-safe. Access to it is already serialized by ctl::TaskQueue.
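To make the intrusiveness and membership-check requirements concrete, a node could look roughly like this (a hypothetical sketch; names are illustrative, not an actual roc-toolkit API):

```cpp
// Hypothetical node layout: the heap links and the owner pointer are
// embedded in the element itself, so the container allocates nothing,
// and a node can belong to at most one queue at a time.
class PairingHeap; // the (hypothetical) container type

struct HeapNode {
    HeapNode* child;    // pairing-heap link: first child
    HeapNode* sibling;  // pairing-heap link: next sibling
    PairingHeap* owner; // containing queue, or nullptr if detached

    HeapNode() : child(nullptr), sibling(nullptr), owner(nullptr) {}

    // O(1) membership check: compare the stored owner pointer,
    // the same approach core::List uses.
    bool belongs_to(const PairingHeap& h) const {
        return owner == &h;
    }
};
```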
Notes
Other TaskQueue optimizations:
#362
#380