Open WhyNotHugo opened 2 years ago
The ugly part of tuples()
is trying to be variadic in length, which doesn't have any language support. We deal with this a little in the IntoParallelIterator
for tuples into a MultiZip
, which we manually (macro) implement up to length 12. Itertools
depends on a similar manual implementation of its own HomogeneousTuple
trait.
For your particular case with slice-able data, you could also use par_chunks_exact(4)
and par_chunks_exact_mut(3)
. You can still pattern-match that if you treat it as a refutable pattern within your for_each
, like if let ([b, g, r, _], [r2, g2, b2]) {...}
, even though that if let
will never fail in practice. Eventually it can be a truly irrefutable argument pattern with arrays, #789, but I've been hoping that the standard library would stabilize the regular Iterator
API for that first.
I'm re-encoding images from LE-XRBG to BE-RGB:
This can clearly be parallelised quite a bit,
FWIW, it's not necessarily true that this task will benefit from pixel-level parallelization, especially since there's little real CPU work, just data movement. If you have many images to convert, it may work better to parallelize that coarsely with each image working serially, but ultimately you're going to be limited by I/O and memory bandwidth.
I tend to agree with that last statement; it seems to be mostly memory IO limited (rather than CPU), so it might be a pointless optimisation. FWIW, the original code can also be re-written as:
let mut image_data = vec![0u8; mmap.len() * 3 / 4];
mmap.iter()
.tuples()
.zip(image_data.iter_mut().tuples())
.for_each(|((b, g, r, _), (r2, g2, b2))| (*r2, *g2, *b2) = (*r, *g, *b));
Clearly it's just copying data around. I'd still be curious to benchmark with parallelisation, but performance has improved to the point where this is no longer the slowest thing.
I couldn't get chunks
to work for this example, regrettably.
Shall I still leave this issue open for further discussion on .tuple()
?
I couldn't get chunks to work for this example, regrettably.
Here's a playground with chunks_exact
, par_chunks_exact
, and also using unstable standard library features for array_chunks
and as_chunks
.
Shall I still leave this issue open for further discussion on
.tuple()
?
Yes, we can leave that request open.
I see
tuples()
listed in Probably applicable but did not yet open issuesI'd find this very useful. I'm re-encoding images from LE-XRBG to BE-RGB:
This can clearly be parallelised quite a bit, but I don't see any obvious way of writing it without
tuples()
.