lakwet / voracious_sort

Voracious radix sort
MIT License
58 stars 1 forks source link

Trait for (usize, usize) #12

Open vigna opened 1 year ago

vigna commented 1 year ago

Is it possible to use voracious sort to sort pairs (usize, usize)? I see the trait implemented for u128, so I guess it's possible. Am I right?

lakwet commented 1 year ago

Hello, sadly, it is not possible, because of the orphan rule I could implement the sort for (usize, usize), but it would mean I couldn't sort (i32, f32) for example, and the orphan rule prevents me from implementing the sort for a generic (T, U) tuple. What I recommend is to sort Struct { a: usize, b: usize } or u128, however, I haven't done a lot of (performance) tests with u128. Since it is a logical type (it is a primitive type in Rust but not for the machine) it is pretty slow, but still faster than other sorts. I don't know if you need a single thread or a multithreaded sort, if it is for single thread, I can add a "dedicated sort" in the folder with the same name. For multithreaded sort, it is a bit more complicated. MMhh actually, I should add a couple of dedicated single thread sort for tuple which cover most of the cases.

vigna commented 1 year ago

Oh we're totally ok with a struct. It doesn't have to be a tuple. It's just that the RadixKey trait is not really documented so we're a bit confused as what we should implement. If you can shows us an example with a struct containing two usize that would be great!

lakwet commented 1 year ago

Mmmmmh I just realized it is a bit more tricky than what I was thinking.

vigna commented 1 year ago

More tricky or impossible? :)

lakwet commented 1 year ago

Hello sorry for being so late, I completely forgot to do it. At first I wanted to do the generic case to sort (for example) an array of primitive type vec<vec<usize>> but it is more tricky than what I was expecting So I just did your need: You can find the code here: https://github.com/lakwet/voracious_sort/blob/c46c6b1d1df404a30b93a6895880415b5cfe0080/src/types/custom.rs#L343 (you can just copy paste the code from line 343 to the end of the file)

And the benchmark with correctness test (there is a horizontal scroll):

running 1 test
Number of iterations: 3
Number of threads: 4
With check: true
=== Test struct{usize, usize} ===   Trait Vora Uns      Trait Vora MT       Rust Uns        Rayon pll uns
Array size: 1000000
-- Unif       :29544us  18644ns (29.54ns)   7118us  4118ns  (7.12ns)    100826us    58221ns (100.83ns)  21880us 12634ns (21.88ns)
Array size: 10000000
-- Unif       :346935us 63342ns (34.69ns)   40382us 7373ns  (4.04ns)    1155803us   211045ns    (115.58ns)  240198us    43888ns (24.02ns)
Array size: 100000000
-- Unif       :2423071us    140773ns    (24.23ns)   503343us    29067ns (5.03ns)    13256390us  179951ns    (132.56ns)  2611723us   150974ns    (26.12ns)

this evening I have run tests on the mac provided by my work which has only 16Go ram, if you need, I can run benchmark on my computer with 96Go ram

I just realized that the mac has 8 cores/threads new results:


Number of iterations: 3
Number of threads: 8
With check: true
=== Test struct{usize, usize} ===   Trait Vora Uns or stable        Trait Vora MT       Rust Uns        Rayon pll uns
Array size: 1000000
-- Unif       :29356us  18401ns (29.36ns)   6850us  3965ns  (6.85ns)    98000us 56584ns (98.00ns)   20468us 11848ns (20.47ns)
Array size: 10000000
-- Unif       :345722us 63121ns (34.57ns)   30626us 5595ns  (3.06ns)    1140727us   208268ns    (114.07ns)  235722us    43205ns (23.57ns)
Array size: 100000000
-- Unif       :2304004us    133033ns    (23.04ns)   355460us    20526ns (3.55ns)    13017920us  107250ns    (130.18ns)  2418123us   139652ns    (24.18ns)
Array size: 200000000
-- Unif       :5182922us    213317ns    (25.91ns)   760724us    31068ns (3.80ns)    27351699us  375056ns    (136.76ns)  5539118us   226221ns    (27.70ns)
test tests::speed_sort::speed_test_structusizeusize ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 93 filtered out; finished in 219.35s```
vigna commented 1 year ago

Mmhhh you're smashing the two usize into a u128 and in my experience those are pretty slow but I'll try and report.

lakwet commented 1 year ago

I have implemented the partialOrd differently, and results for the rust sorts are better but still slower than the voracious sort https://github.com/lakwet/voracious_sort/blob/88255818bcad69958f8f3f3a34004c0ef8ed63be/src/types/custom.rs#L346


running 1 test
Number of iterations: 3
Number of threads: 8
With check: true
=== Test struct{usize, usize} ===   Trait Vora Uns or stable        Trait Vora MT       Rust Uns        Rayon pll uns
Array size: 1000000
-- Unif       :30398us  19047ns (30.40ns)   6750us  3934ns  (6.75ns)    51448us 29705ns (51.45ns)   11079us 6405ns  (11.08ns)
Array size: 10000000
-- Unif       :346347us 63236ns (34.63ns)   32367us 5958ns  (3.24ns)    583788us    106585ns    (58.38ns)   113973us    20813ns (11.40ns)
Array size: 100000000
-- Unif       :2388913us    137932ns    (23.89ns)   372664us    21517ns (3.73ns)    6462832us   373143ns    (64.63ns)   1228916us   71006ns (12.29ns)
Array size: 200000000
-- Unif       5329876us 218871ns    (26.65ns)   798115us    32612ns (3.99ns)    14090880us  390340ns    (70.45ns)   2582302us   105441ns    (12.91ns)
test tests::speed_sort::speed_test_structusizeusize ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 93 filtered out; finished in 138.40s```
lakwet commented 1 year ago

Mmhhh you're smashing the two usize into a u128 and in my experience those are pretty slow but I'll try and report.

yes indeed, since it is the key for the radix, it is done only "locally" I have to find time to do the generic case without casting into another type it was something that was already asked for a "long" time

vigna commented 5 months ago

OK, what about implementing voracious sort for arrays? That should be pretty easy—unlike pairs, you can just index the elements. I would really like to use this in the upcoming webgraph crate as we have to sort massively large sets of arcs from graphs but we're stuck with the fact that we have pairs of usize.

lakwet commented 5 months ago

Hello I don't have a lot of time, I will see what I can do. It is not pretty easy, because the code is optimized everywhere, and it is not just playing with level and index. It is not very complicated though. It just requires time.

vigna commented 5 months ago

Well that would immensely extend the usability of the crate. Or a Radixable trait as rdst does...

lakwet commented 5 months ago

I suppose you want that for the multithreaded version? I am looking into it right now, and it is not as easy as I thought. I will try to do it on a new branch, but not publish it for now, you will be able to use it with the branch.

lakwet commented 5 months ago

And I am doing that only for usize? (it will help me to have only 1 type to handle for now)

vigna commented 5 months ago

Ok, full disclosure, we're using rdst and we're pretty happy with it, so if it's a hassle don't do it...