jackfirth / rebellion

A collection of core libraries for Racket
https://pkgs.racket-lang.org/package/rebellion
Apache License 2.0
80 stars 16 forks source link

Ideas for improving performance of transducers and reducers #351

Open jackfirth opened 4 years ago

jackfirth commented 4 years ago
jackfirth commented 4 years ago

cc @samdphillips and @rocketnia

jackfirth commented 4 years ago

Another idea: make variant a macro that statically expands into a direct call to the internal variant constructor when it's used in a first-order way. So (variant #:foo (some-expression)) would expand into (constructor:variant '#:foo (some-expression)). When used in a higher-order fashion, like with keyword-apply, then fall back to the normal function-like behavior.

This would reduce the need to go through the make-keyword-function slow path, and eliminate the need to check that only one keyword argument is given. Since variants are often constructed by transducers and reducers in tight loops, and the current constructor is never used in a higher-order fashion, that could save a significant chunk of time.

jackfirth commented 4 years ago

See also #287 and #301.

samdphillips commented 4 years ago

Here is a branch (in my fork) where I used no contract versions of rebellion/collection/list and rebellion/base/comparators from the sorting transducer.

I don't have the numbers handy, IIRC in the case of sorting contracts were using about 60% of the time. I'm planning on running the contract profiler again tomorrow on some test programs. Maybe seeing how something like (transduce (in-range 10000) (mapping values) #:into (into-for-each void)) performs.

samdphillips commented 4 years ago

EDIT: the headline of this should probably be Running time is 57.74% contracts

I tested this program with the contract profiler. The changes that I made in the above branch were against a sorting program and don't improve the performance of this baseline case of transducers, so there is still contracts that can be removed or at least talked about. Here are the profile results for this program.

#lang racket/base

(require rebellion/streaming/reducer
         rebellion/streaming/transducer)

(transduce (in-range 1000000)
           (mapping values)
           #:into
           (into-for-each void))
jackfirth commented 4 years ago

Yikes, checking the consumer contract of (mapping values) is 5 seconds on its own, and the emitter contract is another four seconds. And that check would be repeated for every transducer in the chain, so I suspect this:

(transduce (in-range 1000000)
           (mapping values)
           (mapping values)
           (mapping values)
           (mapping values)
           (mapping values)
           #:into
           (into-for-each void))

...would add at least another 40 seconds or so. Maybe we need transducer fusion?

jackfirth commented 4 years ago

Expanded on an idea for transducer fusion in #358.

samdphillips commented 4 years ago

Transducing (mapping values) five times:

Running time is 58.62% contracts
131420/224180 ms
jackfirth commented 4 years ago

My goodness

jackfirth commented 4 years ago

I've written a small microbenchmarking library called Atomichron. Hopefully we can use this to get a better sense of the performance problems in Rebellion.

jackfirth commented 4 years ago

I made an example Atomichron benchmark that measures the performance of (transduce <some vector> #:into into-list) compared to (vector->list <some vector>). For a vector with 1000 elements on my machine:

samdphillips commented 4 years ago

Linking performance benchmarks Jack has written here, so we can use them for comparisons.

https://gist.github.com/jackfirth/9596dd48536344b4b94c1938a36de4c9