mlochbaum opened this issue 1 year ago
One thing I'd like to note is that the 'piecewise-linear' case in glidesort is taken care of by run analysis and merging, and thus is not a deciding factor for the quicksort pivot selection. Instead I wanted to minimize the number of cache lines that the recursive method touches; that's why I don't do even spacing.
It is an interesting idea to change up the grouping though. I would venture to guess that `012 120 201` would perform the best here, which disperses one lo/mid/hi position across each group.
The script tests all combinations, so if `012 120 201` was the absolute best, it'd be in the table. I see that it's just a touch better than `000 111 222`: equal for 1- and 2-piece inputs and better on 3- and 7-piece ones, with either set of positions.
A cache line is 64 bytes, so 16 elements in the 4-byte case, and the smallest gap is between 1 and 8 elements, so groups of 3 will usually fit in one line and adjacent groups might share one? That's better than I realized; I hadn't worked through all the `* 8`s and `/ 8`s to figure out where it stopped. Still, once you get above that size the only reason to stick with the clumping is code simplicity, or is there something else? Note that a median of 3 with even spacing tests better than any of the clumpy 3-of-3 medians on ≤3-piece inputs, so taking more samples doesn't always help even if you can get them from the same cache line.
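To make the comparison concrete, here is a small simulation sketch of this kind of test. It is my own reconstruction, not the linked script: rank error of a pivot chosen by an evenly spaced median of 3 versus a clumped median of 9 on random piecewise-linear inputs. The exact positions, grouping, and input generator are illustrative assumptions.

```python
import random

def piecewise_linear(n, pieces):
    """Concatenate `pieces` random linear segments of random lengths."""
    cuts = [0] + sorted(random.sample(range(1, n), pieces - 1)) + [n]
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        a, b = random.random(), random.random()
        m = hi - lo
        out += [a + (b - a) * i / max(m - 1, 1) for i in range(m)]
    return out

def median3(a, b, c):
    return sorted((a, b, c))[1]

def rank_error(arr, pivot):
    """How far the pivot's rank is from the midpoint, as a fraction."""
    return abs(sum(x < pivot for x in arr) / len(arr) - 0.5)

def pivot_even3(arr):      # median of 3, evenly spaced samples
    n = len(arr)
    return median3(arr[n // 6], arr[n // 2], arr[(5 * n) // 6])

def pivot_clumped9(arr):   # median of 3 medians of 3 adjacent elements
    n = len(arr)
    groups = [[arr[(c * n) // 8 + j] for j in range(3)] for c in (0, 4, 7)]
    return median3(*(median3(*g) for g in groups))

random.seed(0)
inputs = [piecewise_linear(4096, 3) for _ in range(500)]
for name, f in (("even 3", pivot_even3), ("clumped 9", pivot_clumped9)):
    print(name, round(sum(rank_error(a, f(a)) for a in inputs) / len(inputs), 4))
```

This measures rank error directly rather than the n log(n) cost used in the tables, but the relative ordering of schemes should be similar.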
Yes, natural run detection catches exact piecewise-linear inputs (if the number of pieces is small enough, as it is here). I expect it's common to have inputs that follow the basic idea of "a value is close to neighboring values" without large natural runs: piecewise-linear with a little noise or a few swaps, or small ordered sections, say.
> Still, once you get above that size the only reason to stick with the clumping is code simplicity, or is there something else?
Code simplicity translates to code size, and I'd like to minimize that where possible. There are definitely experiments to be had, though.
It occurred to me to try a narrower distribution, so I changed the 0, 4, 7 offsets to 1, 4, 6. It's a major improvement! Overall this distribution is much closer to even, so it makes sense that the performance would be more similar. I'm not sure whether this is applicable to the recursive case, but a reason to think so is that the median of 3 seems to do best with an even or very slightly narrower distribution. A way to look at this is that the ends of the list are likely to be atypical, and for a median we don't want a representative picture of the whole list, just a best guess at what's typical. Which seems like a sound principle even outside the piecewise-linear framework.
Even distribution:
Partition | Diagram | 1 piece | 2 pieces | 3 pieces | 7 pieces |
---|---|---|---|---|---|
True of 3 | ~147 | 0 | 14681 | 26447 | 77240 |
True of 9 | 012345678 | 0 | 1655 | 2918 | 8910 |
012102120 | 048/136/257 | 0 | 1655 | 3070 | 15718 |
012012012 | 036/147/258 | 0 | 1655 | 4274 | 16950 |
000111222 | 012/345/678 | 0 | 14681 | 19770 | 29456 |
011012220 | 038/124/567 | 35848 | 32760 | 30405 | 29658 |
Glidesort based on 1, 4, 6 (approx 0.148, 0.195, 0.227, 0.523, 0.57, 0.602, 0.773, 0.82, 0.852):
Partition | Diagram | 1 piece | 2 pieces | 3 pieces | 7 pieces |
---|---|---|---|---|---|
True of 3 | ~147 | 0 | 14681 | 26447 | 77240 |
True of 9 | 012345678 | 14184 | 22257 | 27761 | 49349 |
001120221 | 015/238/467 | 1526 | 10364 | 18405 | 52902 |
012102120 | 048/136/257 | 14184 | 22257 | 27827 | 52224 |
012012012 | 036/147/258 | 14184 | 22257 | 28161 | 57044 |
000111222 | 012/345/678 | 14184 | 25705 | 32574 | 64075 |
000122112 | 012/367/458 | 30231 | 41024 | 49248 | 65698 |
Glidesort based on 1, 3.5, 6 (approx 0.148, 0.188, 0.227, 0.461, 0.5, 0.539, 0.773, 0.813, 0.852):
Partition | Diagram | 1 piece | 2 pieces | 3 pieces | 7 pieces |
---|---|---|---|---|---|
True of 3 | ~147 | 0 | 14681 | 26447 | 77240 |
True of 9 | 012345678 | 0 | 11615 | 18112 | 44243 |
012102120 | 048/136/257 | 0 | 11615 | 18231 | 47409 |
012012012 | 036/147/258 | 0 | 11615 | 18504 | 51787 |
000111222 | 012/345/678 | 0 | 15424 | 23531 | 59706 |
000122112 | 012/367/458 | 4393 | 18546 | 25307 | 58149 |
With this last balanced distribution, which would change `n8 * 4` to `(n8 * 7) / 2`, the difference between 000111222 and other groupings (or the true median, for that matter) isn't that large. So this is something that can be an improvement without changing the order and potentially losing cache lines. And the tighter grouping does make cache line reuse slightly more likely.
Outer offsets of 1, 3.5, 6 combined with inner offsets of 0, 3.5, 7 do better still, so the better thing to do, going by this model, is to space the initial selection of three regions evenly instead of putting the outer two at the edges. One way to see this is that even spacing has a scaling factor of 3 instead of 8, so relative to scaling by 8 it keeps getting wider at each step.
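For reference, here is the position arithmetic as I reconstruct it from the fractions quoted with the tables (an assumption, not glidesort's actual code): two levels of scale-8 recursion put the midpoint of the interval with outer offset o1 and inner offset o2 at (8·o1 + o2 + 0.5)/64, while even spacing uses (k + 0.5)/9.

```python
# Candidate positions as fractions of the array, from two levels of
# offset-based recursion with scale 8, versus even 3-way spacing.
def recursive_positions(outer, inner):
    return sorted((8 * o1 + o2 + 0.5) / 64 for o1 in outer for o2 in inner)

print([round(p, 3) for p in recursive_positions((1, 4, 6), (1, 4, 6))])
# → [0.148, 0.195, 0.227, 0.523, 0.57, 0.602, 0.773, 0.82, 0.852]
print([round((k + 0.5) / 9, 3) for k in range(9)])
```

The first line reproduces the fractions given for the 1, 4, 6 offsets; plugging in outer 1, 3.5, 6 likewise reproduces the 0.461, 0.5, 0.539 middle group.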
Regarding the scheme as a whole, here's the dilemma I see. It does take position into account, as otherwise you'd just pick a bunch of elements from the start of the array and be done with it. The reason for this has to be concern about spatial locality, right? But both empirically and intuitively, I'm finding that the fractal recursion doesn't do much better than a median of 3 at handling locality (it will filter noise fine of course). Tight median-of-3 at the lowest level or two to compute faster and save cache makes sense to me; for higher levels my still-tentative thinking is that it's best to go with more even spacing, and reordering or larger medians if possible.
(I'll stop bothering you. Felt if I'm going to do this study I may as well share it...)
> Regarding the scheme as a whole, here's the dilemma I see. It does take position into account, as otherwise you'd just pick a bunch of elements from the start of the array and be done with it. The reason for this has to be concern about spatial locality, right?
Actually my main justification for spreading our pivot sample out over the input array is distribution drift: the elements at the start of the array might not follow the same distribution as those somewhere else. This is rare in benchmarks but not necessarily in real-world data.
I fat-fingered the close button, oops.
Talking about perf, I'll add that in my testing, using just a median of 3 for small sizes below a threshold (e.g. 64) is not ideal for sorting integers. Getting a higher-quality median pays off with a total runtime improvement of a couple percent. I recently simplified pivot selection for ipnsort (https://github.com/Voultapher/sort-research-rs/commit/4c5a8925d29e232836dae3bf3340960f09a2bccc) but I'm still not happy with it.
Random data I assume?
Yes, as generated by the `rand` crate. But it's faster for other patterns too. My theory is that a better pivot avoids more ineffective partitions at small sizes, and switches to the specialized small-sort more quickly.
Have had this half-finished for a while; I'm writing up what I can even though there's more to investigate. CC @scandum, @Voultapher, as I believe ipnsort has picked up glidesort's candidate approach.
I looked into how the choice of pivot candidates affects median accuracy using a simulation on random piecewise-linear inputs. These seem like they capture one type of order that might be expected in an input, and I don't have any other promising ideas for testing that.
I found that the accuracy depends not only on the way candidates are grouped into sets of 3 (what I initially wanted to test) but also on the positions of those candidates (unexpected, but obvious in retrospect). So I measured both, for an even distribution and for Glidesort's recursive selection, using midpoints of the nine intervals that come out of the recursion. I copied the arithmetic for this by hand; it looks right, but it's possible I made a mistake here. I've also tested various positions not shown here. Error is measured as cost, assuming sorting each of the two partitions takes time proportional to n log(n). For context, the baseline cost is 8965784, so a relatively high cost difference of 30000 adds 0.33% to total sorting cost from that step (and it compounds at every partition). The tables below show cost for the best and worst groupings after testing every possibility, as well as some others:
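A sketch of the cost measure as I read it (my reconstruction; the sizes are illustrative): after partitioning n elements around a pivot of rank r, sorting the two sides costs about r·log2(r) + (n−r)·log2(n−r), and the reported numbers are differences from the best achievable split.

```python
import math

def split_cost(n, r):
    """Modelled cost of sorting both partitions after an (r, n-r) split."""
    f = lambda m: m * math.log2(m) if m > 0 else 0.0
    return f(r) + f(n - r)

n = 1 << 16                                # hypothetical input size
baseline = split_cost(n, n // 2)           # perfect median split
print(round(split_cost(n, n // 4) - baseline))  # penalty for a 1/4 split
print(round(split_cost(n, n // 8) - baseline))  # penalty for a 1/8 split
```

The penalty grows quickly as the split becomes less balanced, which is why a modest cost difference still matters once it compounds over every recursion level.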
The arrangement 011201220 that I've used and recommended previously does badly, often worse than 000111222.
Beyond any particular choice of grouping it doesn't seem like Glidesort's positions do well: the "true of 9" row is the true median of those and is worse than most pseudomedians with even spacing. The middle 4 makes Glidesort's positions skewed and prevents perfect performance on 1-piece (totally linear) inputs, but changing it to 3.5 isn't much of an improvement (in fact the true median is worse for 2-piece inputs). So I think it's mainly the clustering that weakens it, although I can't say exactly why the effect is so strong. Something that seems a little better is to use 3 as the midpoint for one or two intervals and 4 for the others.
I also did some theoretical analysis on the median-of-median-of-... idea. I find that 3^k candidates processed with recursive medians have about as much power as a true median of (π/2)*2.25^k pivots. For example the pseudomedian of 243 for a 32768-element list is worth a true median of 91. The probability distributions for the median value found this way with random candidates are equal at the exact midpoint, and the shapes seem similar (if anything the pseudomedian is wider). I can write the math up if you're interested.
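A quick Monte Carlo check of that claim is straightforward: for k = 2, the pseudomedian of 9 should have about the power of a true median of (π/2)·2.25² ≈ 8 samples, i.e. a similar spread around 0.5. This is my own sanity-check sketch, not the analysis itself.

```python
import random, statistics

def median3(a, b, c):
    return sorted((a, b, c))[1]

def pseudomedian9():
    """Median of medians of three groups of three uniform samples."""
    u = [random.random() for _ in range(9)]
    return median3(*(median3(*u[3 * i:3 * i + 3]) for i in range(3)))

random.seed(1)
runs = 20000
sd_pseudo = statistics.pstdev(pseudomedian9() for _ in range(runs))
sd_true8 = statistics.pstdev(
    statistics.median(random.random() for _ in range(8)) for _ in range(runs))
print(round(sd_pseudo, 3), round(sd_true8, 3))
```

Both distributions are centered at 0.5; comparing the two standard deviations shows how close the pseudomedian's spread is to that of the true median of 8 (with the pseudomedian, if anything, slightly wider, as noted above).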
Even spacing (0.5/9, ..., 8.5/9):
Glidesort arrangement (approx 0.008, 0.07, 0.117, 0.508, 0.57, 0.617, 0.883, 0.945, 0.992):
Source for this, run with CBQN. I can translate to some other language on request. The inputs are created with `•rand`, which changes between runs, but the results above don't change significantly.