This melds the serial-Tee and parallel-batched approaches from before
and after commit adea17e. Now we can get the same multithreaded speedup
without having to build the entire uncompressed tarball in memory first.
The new impl Write for RayonTee uses rayon::join to split the
compression work for each buffer across separate threads. The join is
scoped, so it can be fully zero-copy, with both threads sharing the
input buffer directly. This is all wrapped in a 1 MiB BufWriter to
balance the cost of thread wake-ups and synchronization.
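
A minimal sketch of the scoped-tee idea, not the code from this change. To avoid external crates it uses std::thread::scope in place of rayon::join, which spawns a thread per call instead of reusing a pool. The names ScopedTee and tee_demo are hypothetical, and plain Vec<u8> sinks stand in for the gz and xz encoders.

```rust
use std::io::{self, BufWriter, Write};
use std::thread;

/// Hypothetical stand-in for the tee writer: forwards every buffer to two
/// sinks, running the first sink on a scoped helper thread.
struct ScopedTee<A, B>(A, B);

impl<A: Write + Send, B: Write> Write for ScopedTee<A, B> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Both sinks must see the whole buffer, so always write all of it.
        self.write_all(buf)?;
        Ok(buf.len())
    }

    fn write_all(&mut self, buf: &[u8]) -> io::Result<()> {
        let ScopedTee(a, b) = self;
        // Scoped threads may borrow `buf` directly, so nothing is copied.
        let (ra, rb) = thread::scope(|s| {
            let helper = s.spawn(|| a.write_all(buf));
            let rb = b.write_all(buf);
            (helper.join().expect("writer thread panicked"), rb)
        });
        ra.and(rb)
    }

    fn flush(&mut self) -> io::Result<()> {
        let ScopedTee(a, b) = self;
        let (ra, rb) = thread::scope(|s| {
            let helper = s.spawn(|| a.flush());
            let rb = b.flush();
            (helper.join().expect("writer thread panicked"), rb)
        });
        ra.and(rb)
    }
}

/// Feeds `data` through a 1 MiB BufWriter into the tee and returns what each
/// sink received.
fn tee_demo(data: &[u8]) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut first = Vec::new();
    let mut second = Vec::new();
    {
        let tee = ScopedTee(&mut first, &mut second);
        let mut out = BufWriter::with_capacity(1 << 20, tee);
        out.write_all(data)?;
        out.flush()?;
    }
    Ok((first, second))
}
```

Calling tee_demo with any byte slice should return two vectors equal to the input. The BufWriter matters more in this sketch than it would with rayon, since each call that reaches the tee pays for spawning a thread.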
The net performance is unchanged, using around 125% CPU -- approximately
4:1 time spent in xz versus gz. The overall memory use is much reduced,
now independent of the tarball size -- just a few MiB on top of the
fixed-cost 674 MiB compressor memory requirements of xz -9.
Fixes #75.