jackfirth opened this issue 5 years ago (status: Open)
I haven't formally worked out the bounds on this, but it has the feel of an O(n log n) algorithm.
I have built a copy of the algorithm, stripped of all the transducer machinery, that runs faster.
1000 iterations, 1000 random integers (0-999), taking lowest 10.
transducer: 34260ms
bare bones: 11806ms (34% of transducer version)
The extra 20ish seconds seems split between contract checking, transducers, and sequence handling.
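The exact benchmark code isn't shown in this thread, so the data setup and module paths below are assumptions, but the two variants being timed look roughly like this:

```racket
#lang racket/base
;; Hedged sketch of the two variants under comparison; the real
;; benchmark harness is not shown here, so this setup is illustrative.
(require racket/list ; for take
         rebellion/collection/list
         rebellion/streaming/transducer)

(define nums (build-list 1000 (λ (_) (random 1000))))

;; Transducer version: sort via the sorting transducer, take the lowest 10.
(transduce nums (sorting) (taking 10) #:into into-list)

;; Bare-bones version: sort eagerly, then take the first 10.
(take (sort nums <) 10)
```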
Edit: There is still some contract-checking overhead in the bare-bones version. I was able to shave off 4s by using `cons` (no contract) instead of `list-insert`. From the profile, there is 5-6s more of contract overhead.
We could add a no-contract submodule to `rebellion/collection/list` then. That would make the sorting transducer's contracts on `list-insert` toggleable, and if we do it for some of the other modules too, then we can investigate the relative costs of each set of contracts.
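One possible shape for that submodule, sketched under the assumption that `list-insert` is an ordinary definition inside `rebellion/collection/list` (the names and the contract here are illustrative, not the library's actual internals):

```racket
#lang racket/base
;; Illustrative sketch of a no-contract submodule pattern; the actual
;; rebellion/collection/list internals may differ.
(require racket/contract)

(define (list-insert lst v)
  (cons v lst))

;; Normal clients get the contracted binding...
(provide (contract-out [list-insert (-> list? any/c list?)]))

;; ...while performance-sensitive code can opt out of contract checks:
(module+ no-contract
  (provide list-insert))
```

Internal callers could then require `(submod rebellion/collection/list no-contract)` to skip the contract boundary entirely.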
My thoughts exactly.
Summarizing from discussion elsewhere:
About a month ago, when I saw @jackfirth talking about lazy sorting, I started some work on a lazy merge sort using Racket's promises and streams. Over the past few days, I've polished that up just enough to paste it here: https://gist.github.com/rocketnia/5e319a54e4fd980ea5e26d024645628d
It's a stream-to-stream function, but since any sorting transducer will consume the entire input before emitting the first output, I bet a stream-to-stream sorting function can be adapted pretty trivially (using `(batching into-list)` followed by `append-mapping`) to make a sorting transducer.
I haven't tested this lazy merge sort yet, but assuming it works the way I expect, its worst-case cost is something like O(max(N, K log N)) time and comparisons to find the first K of N sorted elements. Breaking this down a little: the first O(N) steps turn the given stream into a list of singleton streams, merge all the streams, and take the first element. Then it takes up to O(log N) time to get each subsequent element from the merged stream.
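The core operation behind a lazy merge sort like this can be sketched with Racket's streams. This is a minimal illustration of the idea, not the code from the Gist:

```racket
#lang racket/base
;; Minimal illustration of lazily merging two sorted streams: the
;; recursive merge of the rest is delayed until an element is demanded,
;; which is what makes taking only the first K elements cheap.
(require racket/stream)

(define (merge-streams lte? s1 s2)
  (cond
    [(stream-empty? s1) s2]
    [(stream-empty? s2) s1]
    [(lte? (stream-first s1) (stream-first s2))
     (stream-cons (stream-first s1)
                  (merge-streams lte? (stream-rest s1) s2))]
    [else
     (stream-cons (stream-first s2)
                  (merge-streams lte? s1 (stream-rest s2)))]))
```

A full lazy merge sort would then split the input into singleton streams and merge them pairwise until one stream remains; only as many merge steps run as elements are actually demanded.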
A moment ago @jackfirth suggested I check out this issue, so I think in the next couple of days I'll try to hook this up to the benchmarks and see how it goes. 🙂
I looked into the performance of `in-merge-sorted`.
First I verified I was able to run the modified benchmark as-is:
```
$ racket rebellion-sorting-benchmark.rkt benchmark
'#hash(
  ("Rebellion lazy sort (milliseconds)" .
   #<stats: #:average 96247/1000 #:count 1000 #:max 275 #:min 86 #:sum 96247>)
  ("Standard racket sort (milliseconds)" .
   #<stats: #:average 8907/1000 #:count 1000 #:max 23 #:min 8 #:sum 8907>))
```
(I've manually formatted the output a little for readability.)
Then I made this change to the "standard racket sort" to simulate how a Rebellion user might implement their own `sorting` in terms of Racket's `sort` if `sorting` weren't provided by the library already:
```diff
 (define (bottom-10/racket gems)
-  (take (sort (sequence->list gems) < #:key gemstone-weight) 10))
+  (transduce gems
+             (transducer-pipe
+              (batching into-list)
+              (append-mapping
+               (λ (lst) (in-list (sort lst < #:key gemstone-weight)))))
+             (taking 10)
+             #:into into-list))
```
From this point of view, Rebellion's `sorting` transducer is much better than nothing:
```
$ racket rebellion-sorting-benchmark.rkt benchmark
'#hash(
  ("Rebellion lazy sort (milliseconds)" .
   #<stats: #:average 12202/125 #:count 1000 #:max 276 #:min 87 #:sum 97616>)
  ("Standard racket sort (milliseconds)" .
   #<stats: #:average 25231/200 #:count 1000 #:max 172 #:min 120 #:sum 126155>))
```
(The `in-list` call may be unnecessary, but whatever performance impact it has seems to be negligible.)
Then I added my lazy merge sort, `in-merge-sorted`, as a contender:
```racket
(define (bottom-10/lazy-merge-sort gems)
  (transduce gems
             (transducer-pipe
              (batching into-list)
              (append-mapping
               (λ (lst)
                 (in-merge-sorted
                  (λ (a b) (<= (gemstone-weight a) (gemstone-weight b)))
                  (in-list lst)))))
             (taking 10)
             #:into into-list))
```
This uncovered a bug in `in-merge-sorted`, so I fixed that, and then these were my results:
```
$ racket rebellion-sorting-benchmark.rkt benchmark
'#hash(
  ("Lazy merge sort (milliseconds)" .
   #<stats: #:average 125869/1000 #:count 1000 #:max 254 #:min 113 #:sum 125869>)
  ("Rebellion lazy sort (milliseconds)" .
   #<stats: #:average 12564/125 #:count 1000 #:max 287 #:min 87 #:sum 100512>)
  ("Standard racket sort (milliseconds)" .
   #<stats: #:average 131687/1000 #:count 1000 #:max 321 #:min 118 #:sum 131687>))
```
It looks like for this benchmark, `in-merge-sorted`-as-a-transducer takes about as long as `sort`-as-a-transducer, and Rebellion's `sorting` is still in first place.

Here's a new Gist with the modifications I made to the benchmark file as well as the bug fix for `in-merge-sorted`.
> It looks like for this benchmark, `in-merge-sorted`-as-a-transducer takes about as long as `sort`-as-a-transducer, and Rebellion's `sorting` is still in first place.
Hmm, that's suspicious. I think `transducer-pipe`, `batching`, and `append-mapping` are introducing a lot of overhead and skewing the results, rather than showing any difference between the sorting algorithms themselves.
Idea: you can determine the overhead of pipe/batching/appending by just timing `(transducer-pipe (batching into-list) (append-mapping values))`.
I see what you're saying. It's too bad we can't also subtract that overhead from the `sorting` time; surely there's some unnecessary overhead in its interface as well.
If I add in this code:
```racket
(define (bottom-10/passthrough-wrapper gems)
  (transduce gems
             (transducer-pipe (batching into-list) (append-mapping values))
             (taking 10)
             #:into into-list))

(measure-runtime "Passthrough wrapper (milliseconds)"
                 (λ () (bottom-10/passthrough-wrapper gems)))
```
I get:
```
$ time rebellion-sorting-benchmark.rkt benchmark
'#hash(
  ("Lazy merge sort (milliseconds)" .
   #<stats: #:average 60277/500 #:count 1000 #:max 177 #:min 112 #:sum 120554>)
  ("Passthrough wrapper (milliseconds)" .
   #<stats: #:average 117137/1000 #:count 1000 #:max 279 #:min 109 #:sum 117137>)
  ("Rebellion lazy sort (milliseconds)" .
   #<stats: #:average 96781/1000 #:count 1000 #:max 142 #:min 84 #:sum 96781>)
  ("Standard racket sort (milliseconds)" .
   #<stats: #:average 15743/125 #:count 1000 #:max 292 #:min 117 #:sum 125944>))
```
| Algorithm | Time - Wrapper Time (ms) |
|---|---|
| Lazy merge sort | 3417 |
| Rebellion lazy sort | -20356 (if only it made sense to subtract) |
| Standard racket sort | 8807 |
Using the `sorting` transducer is over an order of magnitude slower than using the built-in `sort` function in `racket/base`, even when the `sorting` transducer is followed immediately by a `taking` transducer. Relevant links compare `sorting` and `(taking 10)` with using `(take (sort (sequence->list seq) ...) 10)`.

Even with generic sequences, transducer-based sorting should meet reasonable performance criteria when `(sorting)` is followed by `(taking 10)`. The current implementation doesn't seem to.