jackfirth opened this issue 4 years ago
cc @samdphillips and @rocketnia
Another idea: make `variant` a macro that statically expands into a direct call to the internal variant constructor when it's used in a first-order way. So `(variant #:foo (some-expression))` would expand into `(constructor:variant '#:foo (some-expression))`. When used in a higher-order fashion, like with `keyword-apply`, it would fall back to the normal function-like behavior.
This would reduce the need to go through the `make-keyword-function` slow path, and eliminate the need to check that only one keyword argument is given. Since variants are often constructed by transducers and reducers in tight loops, and the current constructor is never used in a higher-order fashion, that could save a significant chunk of time.
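A minimal sketch of that expansion strategy, using hypothetical stand-in names (`constructor:variant` for Rebellion's internal unchecked constructor, `plain-variant` for the runtime keyword procedure — neither is the library's actual API):

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; Hypothetical stand-in for Rebellion's internal unchecked constructor.
(define (constructor:variant kw v) (vector-immutable kw v))

;; Runtime fallback with the usual keyword-procedure behavior, including
;; the check that exactly one keyword argument was supplied.
(define plain-variant
  (make-keyword-procedure
   (lambda (kws kw-args)
     (unless (= (length kws) 1)
       (error 'variant "expected exactly one keyword argument"))
     (constructor:variant (car kws) (car kw-args)))))

(define-syntax (variant stx)
  (syntax-parse stx
    ;; First-order call with a literal keyword: expand straight to the
    ;; internal constructor, skipping the keyword-procedure slow path.
    [(_ kw:keyword arg:expr) #'(constructor:variant 'kw arg)]
    ;; Any other call shape: defer to the runtime fallback.
    [(_ . args) #'(plain-variant . args)]
    ;; Higher-order reference (e.g. with keyword-apply): the bare
    ;; identifier expands to the runtime fallback procedure.
    [_:id #'plain-variant]))

(variant #:foo 42)                          ; expands to a direct call
(keyword-apply variant '(#:foo) '(42) '())  ; goes through plain-variant
```

The identifier-macro case is what preserves the current higher-order behavior: only call sites with a literal keyword get the fast expansion.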
See also #287 and #301.
Here is a branch (in my fork) where I used no-contract versions of `rebellion/collection/list` and `rebellion/base/comparators` from the `sorting` transducer.
I don't have the numbers handy, but IIRC in the case of `sorting`, contracts were using about 60% of the time. I'm planning on running the contract profiler again tomorrow on some test programs, maybe seeing how something like `(transduce (in-range 10000) (mapping values) #:into (into-for-each void))` performs.
EDIT: the headline of this should probably be `Running time is 57.74% contracts`
I tested this program with the contract profiler. The changes I made in the above branch were against a `sorting` program and don't improve the performance of this baseline case of transducers, so there are still contracts that can be removed, or at least discussed. Here are the profile results for this program.
```racket
#lang racket/base

(require rebellion/streaming/reducer
         rebellion/streaming/transducer)

(transduce (in-range 1000000)
           (mapping values)
           #:into (into-for-each void))
```
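For anyone wanting to reproduce this kind of breakdown: wrapping the pipeline in `contract-profile` (from the `contract-profile` package, which must be installed separately) runs the body and prints how much of the running time was spent in contract checking:

```racket
#lang racket/base
(require contract-profile
         rebellion/streaming/reducer
         rebellion/streaming/transducer)

;; Runs the body and reports the fraction of running time spent
;; checking contracts, broken down by contract.
(contract-profile
 (transduce (in-range 1000000)
            (mapping values)
            #:into (into-for-each void)))
```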
Yikes, checking the consumer contract of `(mapping values)` takes 5 seconds on its own, and the emitter contract takes another four seconds. And that check would be repeated for every transducer in the chain, so I suspect this:
```racket
(transduce (in-range 1000000)
           (mapping values)
           (mapping values)
           (mapping values)
           (mapping values)
           (mapping values)
           #:into (into-for-each void))
```
...would add at least another 40 seconds or so. Maybe we need transducer fusion?
Expanded on an idea for transducer fusion in #358.
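As a sketch of the kind of rewrite fusion could perform (the helper name `fuse-mapping-stages` is hypothetical, not Rebellion's API): two adjacent mapping stages are equivalent to one mapping stage over the composed function, so a fused pipeline pays one stage's worth of per-element contract checks instead of two.

```racket
#lang racket/base
(require rebellion/collection/list
         rebellion/streaming/transducer)

;; Hypothetical fusion rule: (mapping f) followed by (mapping g) behaves
;; like a single (mapping (compose g f)) stage, halving the contract
;; checks for that pair of stages.
(define (fuse-mapping-stages f g)
  (mapping (compose g f)))

;; Both pipelines produce '(1 4 9 16 25):
(transduce (in-range 5)
           (mapping add1)
           (mapping (lambda (x) (* x x)))
           #:into into-list)
(transduce (in-range 5)
           (fuse-mapping-stages add1 (lambda (x) (* x x)))
           #:into into-list)
```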
Transducing `(mapping values)` five times:

```
Running time is 58.62% contracts
131420/224180 ms
```
My goodness
I've written a small microbenchmarking library called Atomichron. Hopefully we can use this to get a better sense of the performance problems in Rebellion.
I made an example Atomichron benchmark that measures the performance of `(transduce <some vector> #:into into-list)` compared to `(vector->list <some vector>)`. For a vector with 1000 elements on my machine:
`vector->list` takes 30.1 microseconds

Linking performance benchmarks Jack has written here, so we can use them for comparisons.
https://gist.github.com/jackfirth/9596dd48536344b4b94c1938a36de4c9
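For a quick sanity check without a benchmarking library, a rough comparison can also be done with `time-apply` (much noisier than Atomichron, and the numbers below depend on assuming `into-list` comes from `rebellion/collection/list`):

```racket
#lang racket/base
(require rebellion/collection/list
         rebellion/streaming/transducer)

(define vec (build-vector 1000 values))

;; Rough aggregate cost: run the thunk many times and report total CPU ms.
;; Only useful for order-of-magnitude comparisons, not precise timings.
(define (rough-cpu-ms thunk iterations)
  (define-values (results cpu real gc)
    (time-apply (lambda () (for ([_ (in-range iterations)]) (thunk))) '()))
  cpu)

(rough-cpu-ms (lambda () (vector->list vec)) 10000)
(rough-cpu-ms (lambda () (transduce vec #:into into-list)) 10000)
```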
- Make variants intern themselves upon construction when they're wrapping values that are interned, like fixnums, symbols, booleans, keywords, and singletons. This could turn operations like `into-nth` and `into-sum` into zero-allocation operations (sometimes), and it will make all stateless transitions (i.e. returning `(variant #:<kw> #f)` from a state transition handler) zero-allocation. Consider adding a way for users to force a variant to be interned, so that transducer/reducer implementations that use mutable state can be zero-allocation.
- Add special case dispatching to `transduce` and `reduce-all` when the input sequence is a list. Possibly add special case dispatching to `transduce` when the `#:into` argument is `into-list`.
- Use transducer composition to implement `transduce`, instead of repeatedly folding over the input sequence with `in-transduced`. This should remove a bunch of generic interface dynamic dispatch overhead.
- Add `no-contract` submodules and use them in various places.
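A minimal sketch of the interning idea, with hypothetical stand-in names (`constructor:variant` is not Rebellion's actual internal constructor, and a real implementation would need to cover all the interned value types mentioned above):

```racket
#lang racket/base

;; Hypothetical stand-in for the internal variant constructor.
(define (constructor:variant kw v) (vector-immutable kw v))

;; Values that are themselves interned are safe to key an intern table on.
(define (interned-value? v)
  (or (fixnum? v) (symbol? v) (boolean? v) (keyword? v)))

;; Weak table: entries for variants nobody holds anymore can be collected
;; and simply re-created later, at the cost of one re-allocation.
(define intern-table (make-weak-hash))

(define (variant kw v)
  (if (interned-value? v)
      ;; Reuse an existing variant wrapping the same keyword and value,
      ;; so stateless transitions like (variant #:done #f) allocate only
      ;; the first time.
      (hash-ref! intern-table (cons kw v)
                 (lambda () (constructor:variant kw v)))
      (constructor:variant kw v)))
```

The table lookup itself has a cost, so whether this wins in tight transducer loops would need measuring against the allocation it saves.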