roarin-roran / Sorting

0 stars 0 forks source link

Optimizations in Sorter_Adaptive #56

Closed sebawild closed 2 years ago

sebawild commented 2 years ago

this_block_runs is allocated for each merge; move the memory alloc out of the loop and void resizing the array inside the loop

sebawild commented 2 years ago

I've run this through a profiler to see where the time is really spent. The result is very telling :)

For cProfile.run("exp.run_bottom_up(1000000,1,2)",sort=1) we get

         94806557 function calls in 5.970 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 21980542    2.912    0.000    3.291    0.000 MergerIPQ_LoserTree.py:24(update_lowest_priority)
   999999    1.322    0.000    5.326    0.000 Merger_Adaptive.py:18(merge)
   999999    0.523    0.000    1.008    0.000 MergerIPQ_LoserTree.py:10(__init__)
 21980542    0.293    0.000    0.293    0.000 MergerIPQ_LoserTree.py:61(first_larger)
        1    0.189    0.189    5.622    5.622 Sorter_BottomUp.py:17(sort)
   999999    0.186    0.000    0.306    0.000 random.py:237(_randbelow_with_getrandbits)
 19980544    0.154    0.000    0.154    0.000 MergerIPQ_LoserTree.py:82(peek_at_lowest_priority_element)
  1404683    0.104    0.000    0.104    0.000 {method 'getrandbits' of 'Random' objects}
 11459904    0.086    0.000    0.086    0.000 MergerIPQ_LoserTree.py:76(swap_nodes)

Whereas for cProfile.run("exp.run_adaptive(1000000,1,2)",sort=1) we get

         83044272 function calls in 71.058 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000136   43.117    0.000   43.117    0.000 {method 'pop' of 'list' objects}
   500068   20.754    0.000   20.754    0.000 {method 'insert' of 'list' objects}
 19979955    3.320    0.000    3.753    0.000 MergerIPQ_LoserTree.py:24(update_lowest_priority)
   500068    1.524    0.000    6.218    0.000 Merger_Adaptive.py:18(merge)
   500068    0.700    0.000    1.282    0.000 MergerIPQ_LoserTree.py:10(__init__)
 19979955    0.333    0.000    0.333    0.000 MergerIPQ_LoserTree.py:61(first_larger)
        1    0.271    0.271   70.705   70.705 Sorter_Adaptive.py:12(sort)
 18979819    0.209    0.000    0.209    0.000 MergerIPQ_LoserTree.py:82(peek_at_lowest_priority_element)
  1500205    0.129    0.000    0.129    0.000 {method 'append' of 'list' objects}

So there is an absurd amount of time spent in pop and insert of list objects as I suspected. But: Both bottom-up and adaptive use a list of ListSlices to call merge, so this is probably not the issue!

sebawild commented 2 years ago

The number of calls is giving us hints: There are n calls to pop, and n/2 calls to insert ...

sebawild commented 2 years ago

number of runs in this example is approx n/2!

sebawild commented 2 years ago

Oh my God, @roarin-roran ...

this_block_runs.append(runs.pop(block_number)) runs.insert(block_number, this_block_write_list_slice)

Guess what this is doing.

roarin-roran commented 2 years ago

something easily fixable, by the look of it

roarin-roran commented 2 years ago

half fixed - this block runs portion has been fixed. with runs of 1000, runs 4 times faster at optimal k and 230 times faster at k=4

roarin-roran commented 2 years ago

fully fixed. experimental evaluation showed the new version to outperform the old very substantially at n>= 10^5, where it also outperforms simple bottom up mergesort