Open roarin-roran opened 2 years ago
update: bug is caused by the handling of trivial input.
What is a trivial input here? Try to make things concrete and specific, like "this call blah.sort([1,2,3,4],k=2)
invokes merge with k=3" or something like that
oh, there's a specific function which handles trivial input, creatively named _handle_trivial_input
. it's intended to handle the case where the input to a recursive call is a prefix run, a suffix run, and a middle section shorter than k-2.
the line at fault is probably line 57 of Sorter_Peeksort
, which uses: elif last_run_start - first_run_end <= self.k
as the condition, missing the -2. seems easy to fix - but adding the -2 here breaks the code for some reason, and this needs investigating.
I expect that this can be handled with extra robustness without slowing anything down, more special-casing, or any new features - but it will also probably be fixed by handling trivial input with insertion sort as is currently planned.
my plan is to fix this by implementing insertion sort for small inputs to replace the current trivial input handling (which was always a placeholder) - what do you think about that? should we discuss this properly sometime?
Insertionsort is fine if the number of elements ($n$) is small, but should exclude the prefix and suffix runs
Seems quite clear that the missing -2 is causing calls with too large k.
I think you want to check, whether the wrong call happens at the base level or also when there are further recursive calls. That will guide us to the point in the code to look at.
seems like I need to fix this, but fixing it will solve the problem (using insertion sort rather than straight merge seems like a decent policy, but not one that'll fix this).
will a question asking what k is slow things appreciably? k being static
Don't think so
it seems that peeksort uses k=3 sometimes when asked to use k=2, causing errors with the new merger.
investigate and fix this bug
@sebawild - should a testing package be added to make sure that the correct k is used? seems pretty hard to test for, especially as lower k values should be freely used.