jgamble / Algorithm-Networksort

Release history of Algorithm-Networksort
https://metacpan.org/release/Algorithm-Networksort
Other
13 stars 4 forks source link

Best size and best depth sorting networks #8

Closed Morwenn closed 8 years ago

Morwenn commented 8 years ago

The "Best" algorithm could be split into two different categories: "Best size networks" (the current "Best") and "Best depth networks". While these networks are more or less the same for the small sorting networks, the best size-optimal and depth-optimal networks that we currently know for sizes > 16 are not always the same. For example, the paper Sorting Networks: to the End and Back Again by Codish & al. presents new sorting networks for sizes 17, 18, 19 and 20 whose depth is better than the ones we knew (and prove the resulting sorting network 17 to actually be depth-optimal).

Size-optimality and depth-optimality are interesting in different contexts, so providing both could be interesting for users, depending on what they need their sorting networks for.

jgamble commented 8 years ago

Yes, I've been musing on the equivalent issue of same-sized but differently arranged networks. The coming change to an object-oriented network will help, although I'm not certain how they should be organized.

I think a separate child module of "Best", with a way of organizing them, may be the way to go, but I am open to suggestions.

Morwenn commented 8 years ago

I am pretty sure that there is some value in having separate best size and best depth submodules, but I fear I can't help much otherwise: I never used Perl (I mostly used the library as some kind of knowledge base) and thus am not familiar with what's good pratice and what's not, let alone designing smart libraries :)

jgamble commented 8 years ago

With the new Best.pm module added (from #7), there's now at least a framework to use to add new 'best' networks. The code is ... minimal, and the documentation even sparer than that (cleaning that up will happen in #6). But at least now there is a module to add new networks to. Which I will do for this issue.

jgamble commented 8 years ago

So, despite my previous comment about documentation clean-up happening in #6, I've done some documentation clean-up here. For the most part it concerns better references to the papers that dealt with the discovered networks. I also need to credit you. Do you have a full name to use for this, or do you prefer Morwenn? And would you like a link to your githup page in the credits?

cc @hoytech : You too could get a better ref. Do want me to link to your github or CPAN page?

Morwenn commented 8 years ago

Just Morwenn is enough :)

On a side note, I have original sorting networks of size 23 & 24 in a library of mine, whose size correspond to the best known sizes, even though networks with the same size and a smaller depth exist. If I'm not mistaken they are different from the already existing ones, so you might want to have a look.

jgamble commented 8 years ago

Nice. I'll add them. Thank you.

hoytech commented 8 years ago

Hi John, thanks for the offer!

It might be useful to some readers to check out my sorting networks presentation:

http://hoytech.github.io/sorting-networks/

It's a pretty basic introduction to sorting networks and their applications, and also is a good demo for Algorithm::Networksort's SVG output.

Also it might help people to know that I have a complementary distribution to A::NS:

https://metacpan.org/pod/algorithm-networksort-chooser

It's a command-line utility that lets you experiment with the different networks, including by finding the optimal network in terms of comparators or stages. It also can prune networks down to find median/selection networks.

I agree with @Morwenn -- I've always considered "Best" to be ambiguous since there are multiple, sometimes conflicting, attributes of networks that can be desired.

In addition to the numbers of comparators and stages (or depth) I can think of at least one other possibility for "Best": average number swaps required assuming all inputs are equally likely. Even two networks with the same comparators and stages can have different expected numbers of swaps as illustrated here:

http://hoytech.github.io/sorting-networks/#slide29 http://hoytech.github.io/sorting-networks/#slide30

I think it would be cool for A::NS to have some kind of optimiser functionality where you could specify what it is important for your network and it would crunch the numbers to give you the truly best it can come up with. My algorithm-networksort-chooser command-line utility does this in a very naive brute-force way, but maybe there are better approaches?

hoytech commented 8 years ago

I'm just reading the links @Morwenn pasted now, including "mountain sort" -- very interesting stuff. :)

jgamble commented 8 years ago

Yeah, right now I've got two bases to use: the absolute number of comparators, and the depth of the network. I'm trying to justify the 'best' name there on those bases, but I may just toss it out in favor of something else when the OO version is released.

Average number of swaps will be in the (forthcoming) statistics method (when I OO-ify the module), but I think that should be a dynamic property dependent on the sample inputs provided by the user.

Morwenn commented 8 years ago

@hoytech Thanks :)

The problem once we have three criterions for « optimality » is to decided which is the second best criterion. For which, if we want a size-optimal network and have several networks with the best size, should we return the one with the lowest depth or the one with the lowest average number of swaps?

hoytech commented 8 years ago

Very true. Maybe there could be some sort of way to prioritise or rank what is important for your use-case?

But I see your point: By adding more criteria we're setting up a pretty hairy Pareto optimisation problem...

I'm looking forward to seeing John's approach with statistics and OOification. Maybe it will be possible for users to create sub-classes with their own custom optimisation strategies.

Morwenn commented 8 years ago

@hoytech By the way, do you know whether given two networks used to sort the same number of elements, if one of these networks is better than the other one with regard to swaps and bitstring, is it also guaranteed to be better with regard to swaps and general permutation. Basically, if a netwwork is better on your page 29, is it guaranteed to be better on your page 30?

hoytech commented 8 years ago

Really great question... I thought about this a bit when I made the presentation. I believe it's true but I haven't proved it yet.

Being more of a hacker/coder than a mathematician, I made a simple program to try to find a counter-example, but so far it holds up:

https://gist.github.com/hoytech/3bf028e29b51edb14fcb

For the proof (and this is very hand-wavy) I think you could start with the fact that the average swaps of permutations will always be larger (or equal when n=2) than with bitstrings. The part I'm stuck on is quantifying exactly how much larger... If we knew that then we could compare it to the difference between n! and 2^n...

I'd be very interested if you or anyone else has more ideas on this!

Morwenn commented 8 years ago

Unfortunately, I'm also more of a coder myself. I like to gather algorithms, proofs, etc... and to build complete tools with a nice interface above them, but I am generally aweful when it comes to mathematics. I think looking at the proof used to demonstrate that zero-one tests are enough to validate a sorting network could give some ideas, but that's pretty much it.

hoytech commented 8 years ago

I glued the average swaps computation code into my chooser utility, and it's interesting to note that the 'batcher' merge exchange algo seems to always make the best nets by this metric, even beating the "best" networks:

$ algorithm-networksort-chooser 9 --all --opt=swaps
network pairwise returned empty comparator list, skipping
Network size: 9
Network type: Sorting network
Optimisation criteria: swaps
Optimal network:
  Algorithm "batcher":
    Comparators: 26
    Stages: 8
    Avg swaps: 2.91796875
Additional candidate networks:
  Algorithm "hibbard":
    Comparators: 27
    Stages: 12
    Avg swaps: 4.201171875
  Algorithm "bosenelson":
    Comparators: 27
    Stages: 11
    Avg swaps: 4.416015625
  Algorithm "best":
    Comparators: 25
    Stages: 9
    Avg swaps: 4.60546875
  Algorithm "oddevenmerge":
    Comparators: 28
    Stages: 9
    Avg swaps: 4.630859375
  Algorithm "bitonic":
    Comparators: 28
    Stages: 8
    Avg swaps: 5.787109375
  Algorithm "oddeventransposition":
    Comparators: 36
    Stages: 9
    Avg swaps: 9
  Algorithm "bubble":
    Comparators: 36
    Stages: 15
    Avg swaps: 9

There is also a --swap-mode option (valid values are zero-one and permutation). In permutation mode batcher seems to always win too, although I did notice one interesting thing for n=8:

When counting swaps in zero-one mode (the default), hibbard is slightly better than oddevenmerge:

$ algorithm-networksort-chooser 8 --all --opt=swaps --algorithms=hibbard,oddevenmerge
Network size: 8
Network type: Sorting network
Optimisation criteria: swaps
Optimal network:
  Algorithm "hibbard":
    Comparators: 19
    Stages: 7
    Avg swaps: 3.703125
Additional candidate networks:
  Algorithm "oddevenmerge":
    Comparators: 19
    Stages: 6
    Avg swaps: 3.828125

However when counting by permutations, they are equivalent:

$ algorithm-networksort-chooser 8 --all --opt=swaps --algorithms=hibbard,oddevenmerge --swap-mode permutation
Network size: 8
Network type: Sorting network
Optimisation criteria: swaps
Optimal network:
  Algorithm "oddevenmerge":
    Comparators: 19
    Stages: 6
    Avg swaps: 10.647619047619
Additional candidate networks:
  Algorithm "hibbard":
    Comparators: 19
    Stages: 7
    Avg swaps: 10.647619047619
Morwenn commented 8 years ago

Some guy generated every valid size-optimal sorting network to sort 3 to 6 values; you can find the networks in an answer on StackOverflow (they are 1-based instead of 0-based though). I wrote a small script to find all the networks that were also size-optimal with regard to bitstrings and got the following networks:

2 inputs
  [[1, 2]]

3 inputs
  [[0, 2], [0, 1], [1, 2]]
  [[0, 2], [1, 2], [0, 1]]

4 inputs
  [[1, 3], [2, 4], [1, 2], [3, 4], [2, 3]]
  [[1, 4], [2, 3], [1, 2], [3, 4], [2, 3]]

5 inputs
  [[1, 4], [2, 5], [1, 3], [2, 4], [1, 2], [3, 5], [2, 3], [4, 5], [3, 4]]
  [[1, 4], [2, 5], [2, 4], [3, 5], [1, 3], [4, 5], [1, 2], [3, 4], [2, 3]]

6 inputs
  [[1, 4], [2, 6], [3, 5], [1, 3], [2, 5], [4, 6], [1, 2], [3, 4], [5, 6], [2, 3], [4, 5], [3, 4]]
  [[1, 5], [2, 4], [3, 6], [1, 3], [2, 5], [4, 6], [1, 2], [3, 4], [5, 6], [2, 3], [4, 5], [3, 4]]
  [[1, 5], [2, 6], [3, 4], [1, 3], [2, 5], [4, 6], [1, 2], [3, 4], [5, 6], [2, 3], [4, 5], [3, 4]]
  [[1, 6], [2, 4], [3, 5], [1, 3], [2, 5], [4, 6], [1, 2], [3, 4], [5, 6], [2, 3], [4, 5], [3, 4]]

I checked whether the results whether there were differences between these bitstrings-swaps-optimal network when tested for swaps-optimality with permutations instead of bitstrings, but the results were exactly the same, so... still no counter-example.

After other checks, it appears that every sorting network above is size-optimal, swaps-optimal (no matter how you define it) and depth-optimal (according to the optimal depths found on Wikipedia).

jgamble commented 8 years ago

Couple of questions before I close this (I'll open some other issues based on the conversation here):

Does anyone know if Sherenaz Waleed Al-Haj Baddar's surname is "Baddar" or "Al-Haj Baddar"? I have this horrible feeling that I'm misspelling her name for the network name.

@Morwenn : I assume your answer about Morwenn being fine refers to the name credit, but I'm left uncertain about a github link. Should I provide one?

Thank you all.

Morwenn commented 8 years ago

Some papers are credited as « S. Al-Haj Baddar » but none are credited as « Baddar », so I guess that the full last name is « Al-Haj Baddar ».

Also, yes: my answer was about the name credit. I don't mind whether you put a link or not to my GitHub profile. I guess that it could possibly help people to find a bit more resources about sorting networks even though most of the interesting ones are already available here. Just do as you want :)

jgamble commented 8 years ago

Merged. Other items mentioned in this issue will get their own issue.