Static transducer fusion

These two transducer pipelines should have the exact same performance:

(transduce some-sequence
           (mapping f)
           (taking 20)
           (mapping g)
           (taking 5)
           (append-mapping h)
           (filtering p)
           (filtering q)
           #:into some-reducer)

(transduce some-sequence
           (taking 5)
           (append-mapping (compose1 h g f))
           (filtering (conjoin p q))
           #:into some-reducer)

That is, there should be rules for statically fusing transducers, similar to how for forms specially recognize sequence forms such as in-list and in-vector and compile them into faster loops. Any function that accepts a chain of transducers, such as transduce and transducer-compose (#191), should be wrapped in a macro that looks for fusion opportunities. There are two primary performance benefits:

Dead transducer elimination. In the example above, the (taking 20) transducer is completely unnecessary because there's a (taking 5) transducer downstream, and the only transducers in between have no effect on the number of elements or their order. Transducer fusion should detect and eliminate dead transducers.
Fewer element exchanges between transducers. Each time a transducer consumes or emits a value, some contracts need to be checked and a variant needs to be constructed to wrap the transducer's next state. Fusion can reduce this checking, especially in degenerate cases.
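
As a rough illustration of what the fusion rules might look like, here is a minimal sketch that rewrites a quoted chain of transducer expressions until no rule applies. Everything in it is an assumption made for illustration: fuse-chain and rewrite-once are hypothetical names, the chain is plain quoted data rather than syntax objects, and the rules assume f, g, and h are free of side effects. A real implementation would apply rules like these at compile time inside the macro wrapping transduce.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical helper: applies the first fusion rule that matches,
;; scanning the chain from left to right. Returns the chain unchanged
;; when no rule applies.
(define (rewrite-once chain)
  (match chain
    ;; Two adjacent maps become one map over the composed function.
    [(list* `(mapping ,f) `(mapping ,g) rest)
     (cons `(mapping (compose1 ,g ,f)) rest)]
    ;; Two adjacent filters become one filter over the conjunction.
    [(list* `(filtering ,p) `(filtering ,q) rest)
     (cons `(filtering (conjoin ,p ,q)) rest)]
    ;; A map feeding an append-map folds into the append-map.
    [(list* `(mapping ,f) `(append-mapping ,h) rest)
     (cons `(append-mapping (compose1 ,h ,f)) rest)]
    ;; A map neither adds, drops, nor reorders elements, so a
    ;; downstream taking can move in front of it (assumes f is pure).
    [(list* `(mapping ,f) `(taking ,n) rest)
     (list* `(taking ,n) `(mapping ,f) rest)]
    ;; Adjacent takings with literal counts: the larger one is dead.
    [(list* `(taking ,m) `(taking ,n) rest)
     #:when (and (exact-nonnegative-integer? m)
                 (exact-nonnegative-integer? n))
     (cons `(taking ,(min m n)) rest)]
    ;; No rule applies at the head, so try the rest of the chain.
    [(cons head rest) (cons head (rewrite-once rest))]
    ['() '()]))

;; Hypothetical helper: rewrites repeatedly until a fixed point.
(define (fuse-chain chain)
  (define next (rewrite-once chain))
  (if (equal? next chain) chain (fuse-chain next)))

(define fused
  (fuse-chain
   '((mapping f) (taking 20) (mapping g) (taking 5)
     (append-mapping h) (filtering p) (filtering q))))
;; fused is:
;; ((taking 5)
;;  (append-mapping (compose1 h (compose1 g f)))
;;  (filtering (conjoin p q)))
```

Applied to the first pipeline above, the sketch arrives at the second pipeline up to the nesting of compose1. It only recognizes literal counts in taking, so a chain like (taking n) (taking 5) would be left alone.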

Alternatives considered

There could be some sort of protocol for dynamic transducer fusion, using generic interfaces. The upside is that this kind of code could still trigger fusion:

(define (adding x)
  (mapping (lambda (y) (+ x y))))

(transduce some-sequence (mapping f) (adding 10) (mapping g) #:into into-list)

However, the downsides are significant:

It's way more complicated to implement and maintain. This is the most important downside.
Dynamic dispatch is introduced and has to be used even when no fusion is possible, hurting performance in those cases.
Fused functions are composed dynamically, not statically, so their compositions can't be inlined away.
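
To make the tradeoff concrete, here is a minimal sketch of what a dynamic fusion protocol could look like with racket/generic. It is not a proposed design: gen:fusable, try-fuse, mapping-stage, and fuse-stages are all hypothetical names, and mapping-stage is a stand-in struct rather than the library's real transducer representation.

```racket
#lang racket/base
(require racket/generic)

;; Hypothetical protocol: a stage may offer to fuse with the stage
;; that follows it. try-fuse returns the fused stage, or #f.
(define-generics fusable
  (try-fuse fusable next))

;; Stand-in for a mapping transducer that opts into the protocol.
(struct mapping-stage (f)
  #:methods gen:fusable
  [(define (try-fuse this next)
     (and (mapping-stage? next)
          (let ([f (mapping-stage-f this)]
                [g (mapping-stage-f next)])
            ;; The composition is built at run time as a closure,
            ;; so the compiler cannot inline it away.
            (mapping-stage (lambda (v) (g (f v)))))))])

;; Fuses adjacent stages at run time. Every stage pays for the
;; fusable? check and the generic dispatch, even if nothing fuses.
(define (fuse-stages stages)
  (cond
    [(or (null? stages) (null? (cdr stages))) stages]
    [else
     (define head (car stages))
     (define fused
       (and (fusable? head) (try-fuse head (cadr stages))))
     (if fused
         (fuse-stages (cons fused (cddr stages)))
         (cons head (fuse-stages (cdr stages))))]))

;; Mirrors the adding example: fusion still works even though the
;; mapping is hidden behind a function call.
(define (adding x)
  (mapping-stage (lambda (y) (+ x y))))

(define stages
  (fuse-stages
   (list (mapping-stage add1)
         (adding 10)
         (mapping-stage (lambda (v) (* v 2))))))
```

Here the three stages collapse into a single mapping-stage, which a static macro could not do because it cannot see through the call to adding. The cost is the per-stage dispatch and the opaque run-time closure described in the list above.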

Note: should probably gather together references to prior art here. Stream fusion is a very widely studied problem:

Racket has its optimizing for forms, of course.
Haskell exposes a protocol for library authors to define rewrite rules for their functions, and this is used to implement stream fusion.
Java 8 streams declare stream operations as methods on the Stream<T> interface, so a stream implementation can provide its own version of each one. The stream returned by stream.distinct() could be a DistinctStream<T> that overrides the distinct() method to be a no-op, so stream.distinct().distinct() doesn't try to dedupe the stream twice. (I have no idea if that's what actually happens, but that's the general idea.)
RxJava exposes a protocol for dynamic fusion between reactive stream operators. The dynamic indirection overhead matters a lot less here because reactive streams are usually IO-bound, not CPU-bound.
Countless other cases, I'm sure.
