Closed iamthad closed 2 months ago
Thank you for your contribution. We will take this under advisement. If there is a particular use case in which this condition causes performance issues, please let us know. We may incorporate this update in a future patch, time permitting.
I observed this issue causing substantial performance issues a couple of years back, in a sim with a huge number of scheduled jobs (tens of thousands, if I remember correctly) due to dynamically-created sim objects. Here is a portion of a profiling result:
# Overhead Command Shared Object Symbol
# ........ ............... ................................................. .........................................
#
60.17% S_main_Linux__x S_main_Linux__x86_64.exe [.] Trick::ScheduledJobQueue::push
11.52% S_main_Linux__x S_main_Linux__x86_64.exe [.] Trick::ScheduledJobQueue::find_next_j
3.23% S_main_Linux__x S_main_Linux__x86_64.exe [.] (proprietary function name redacted)
2.81% S_main_Linux__x S_main_Linux__x86_64.exe [.] Trick::JobData::call
2.22% S_main_Linux__x S_main_Linux__x86_64.exe [.] Trick::DataRecordGroup::call_function
2.21% S_main_Linux__x libc-2.27.so [.] __memset_avx2_unaligned_erms
1.03% S_main_Linux__x S_main_Linux__x86_64.exe [.] Trick::Executive::loop_single_thread
When I noticed this, I introduced the first version of the patch I submitted as #1694 (using lower_bound
), and it made a huge difference. We have been using that patch internally since then; the issue with preserving insertion order that I resolved by switching to upper_bound
was not a problem for us.
I recently obtained permission to submit the patch here in hopes that others would benefit from it.
Ok, I asked for an example and you gave us one. Hard to argue with that! We'll review the PR and merge it once we're satisfied it works as intended and credit you with the change.
OK, sounds good. I updated the source branch to tweak formatting and to update the function's comment to reflect the new behavior.
Trick::ScheduledJobQueue::push
scans linearly through a sorted list and copies items one by one.This takes O(n) time and can be a significant bottleneck when there are many jobs.
The scan can be replaced with
std::upper_bound
and a suitable comparison function, making the complexity O(log n).It also always allocates a new array of
JobData*
, so the entire array is always copied. This could be avoided by usingrealloc
andmemmove
, resulting in fewer copies of larger chunks of memory.EDIT:
Trick::ScheduledJobQueue::remove
has the same issue, but it's less impactful because removing jobs is less common than creating them.EDIT 2: It actually has to be
std::upper_bound
instead ofstd::lower_bound
in order to preserve insertion order.