I've run this through a profiler to see where the time is really spent. The result is very telling :)
For `cProfile.run("exp.run_bottom_up(1000000,1,2)", sort=1)` we get:

```
         94806557 function calls in 5.970 seconds

   Ordered by: internal time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   21980542    2.912    0.000    3.291    0.000 MergerIPQ_LoserTree.py:24(update_lowest_priority)
     999999    1.322    0.000    5.326    0.000 Merger_Adaptive.py:18(merge)
     999999    0.523    0.000    1.008    0.000 MergerIPQ_LoserTree.py:10(__init__)
   21980542    0.293    0.000    0.293    0.000 MergerIPQ_LoserTree.py:61(first_larger)
          1    0.189    0.189    5.622    5.622 Sorter_BottomUp.py:17(sort)
     999999    0.186    0.000    0.306    0.000 random.py:237(_randbelow_with_getrandbits)
   19980544    0.154    0.000    0.154    0.000 MergerIPQ_LoserTree.py:82(peek_at_lowest_priority_element)
    1404683    0.104    0.000    0.104    0.000 {method 'getrandbits' of 'Random' objects}
   11459904    0.086    0.000    0.086    0.000 MergerIPQ_LoserTree.py:76(swap_nodes)
```
Whereas for `cProfile.run("exp.run_adaptive(1000000,1,2)", sort=1)` we get:

```
         83044272 function calls in 71.058 seconds

   Ordered by: internal time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1000136   43.117    0.000   43.117    0.000 {method 'pop' of 'list' objects}
     500068   20.754    0.000   20.754    0.000 {method 'insert' of 'list' objects}
   19979955    3.320    0.000    3.753    0.000 MergerIPQ_LoserTree.py:24(update_lowest_priority)
     500068    1.524    0.000    6.218    0.000 Merger_Adaptive.py:18(merge)
     500068    0.700    0.000    1.282    0.000 MergerIPQ_LoserTree.py:10(__init__)
   19979955    0.333    0.000    0.333    0.000 MergerIPQ_LoserTree.py:61(first_larger)
          1    0.271    0.271   70.705   70.705 Sorter_Adaptive.py:12(sort)
   18979819    0.209    0.000    0.209    0.000 MergerIPQ_LoserTree.py:82(peek_at_lowest_priority_element)
    1500205    0.129    0.000    0.129    0.000 {method 'append' of 'list' objects}
```
So an absurd amount of time is spent in `pop` and `insert` of list objects, as I suspected.
But: both bottom-up and adaptive use a list of `ListSlice`s to call merge, so this is probably not the issue!
The number of calls gives us a hint: there are n calls to `pop` and n/2 calls to `insert` ... and the number of runs in this example is approximately n/2!
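For anyone wondering why plain list methods can dominate a profile like this, here is a standalone illustration (not project code; the exact timings depend on the machine): popping or inserting anywhere but the end of a CPython list shifts every later element, so each such call is linear in the list length.

```python
# Standalone illustration of why list.pop(i) / list.insert(i, x) hurt here:
# CPython lists are contiguous arrays, so removing or inserting anywhere but
# the end shifts every later element, i.e. each call is O(len(list)).
import timeit

n = 20_000
setup = f"xs = list(range({n}))"

# pop/append at the end: amortised O(1) per call
at_end = timeit.timeit("xs.append(xs.pop())", setup=setup, number=n)
# pop/insert at the front: O(n) per call, so O(n^2) for the whole loop
at_front = timeit.timeit("xs.insert(0, xs.pop(0))", setup=setup, number=n)

print(f"end:   {at_end:.4f}s")
print(f"front: {at_front:.4f}s")   # typically orders of magnitude slower
```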
Oh my God, @roarin-roran ...
```python
this_block_runs.append(runs.pop(block_number))
runs.insert(block_number, this_block_write_list_slice)
```
Guess what this is doing.
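For reference, a minimal standalone sketch of how that pair could be rewritten (placeholder names, not the actual patch in the repo): the pop/insert combination just swaps out `runs[block_number]` for the write slice, which is an O(1) read plus an O(1) overwrite.

```python
# Hypothetical sketch, not the repo's actual fix: the pop/insert pair removes
# runs[block_number] and puts the write slice back in the same slot, which is
# just a read plus an overwrite -- no element shifting needed.
def take_block_run(runs, block_number, this_block_write_list_slice, this_block_runs):
    # before (each call shifts every element after block_number, twice):
    #   this_block_runs.append(runs.pop(block_number))
    #   runs.insert(block_number, this_block_write_list_slice)
    # after (constant time, same net effect):
    this_block_runs.append(runs[block_number])
    runs[block_number] = this_block_write_list_slice

# toy usage with placeholder strings standing in for ListSlice objects
runs = ["run0", "run1", "run2"]
this_block_runs = []
take_block_run(runs, 1, "write_slice", this_block_runs)
print(runs)             # ['run0', 'write_slice', 'run2']
print(this_block_runs)  # ['run1']
```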
Something easily fixable, by the look of it.
Half fixed: the `this_block_runs` portion has been fixed. With runs of 1000, it runs 4 times faster at optimal k and 230 times faster at k=4.
Fully fixed. Experimental evaluation showed the new version outperforms the old one very substantially at n >= 10^5, where it also outperforms simple bottom-up mergesort.
`this_block_runs` is allocated for each merge; move the memory allocation out of the loop and avoid resizing the array inside the loop.
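A minimal standalone sketch of that suggestion (the function names and the toy merge are placeholders, not the repo's API): size the scratch list to k slots once, before the loop, and refill it by index so nothing is allocated or grown per block.

```python
# Hypothetical sketch of hoisting the per-merge scratch list out of the loop.
def merge_pass(runs, k, merge_k):
    this_block_runs = [None] * k                    # allocated once, outside the loop
    merged_runs = []
    for start in range(0, len(runs), k):
        size = min(k, len(runs) - start)
        for i in range(size):
            this_block_runs[i] = runs[start + i]    # fill fixed slots; no append, no resize
        merged_runs.append(merge_k(this_block_runs, size))  # merger reads the first `size` slots
    return merged_runs

# toy usage: a k-way "merge" that just concatenates and sorts the first `size` runs
toy_merge = lambda block, size: sorted(x for run in block[:size] for x in run)
print(merge_pass([[1, 4], [2, 3], [5], [0, 6]], 2, toy_merge))
# -> [[1, 2, 3, 4], [0, 5, 6]]
```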