Queue up to num_threads * 2 chunks for decompression instead of num_threads + 2. This lets the decoder build up a bigger backlog, making parallel decoding less sensitive to thread scheduling.
This comes at the cost of increasing peak memory usage by num_threads * compressed_chunk_size.
Also, correctly cap the queue size at the total number of chunks to decode.
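The sizing rule described above can be sketched as follows. This is a hypothetical illustration, not the crate's actual code; the function name and signature are assumptions:

```rust
/// Hypothetical sketch of the queue-capacity rule from this change:
/// queue up to num_threads * 2 chunks (previously num_threads + 2),
/// capped at the number of chunks that actually remain to decode.
fn decompression_queue_capacity(num_threads: usize, chunks_to_decode: usize) -> usize {
    (num_threads * 2).min(chunks_to_decode)
}

fn main() {
    // With 4 threads and many chunks, the backlog grows to 8 (was 6 before).
    assert_eq!(decompression_queue_capacity(4, 100), 8);
    // With only 3 chunks left, the cap prevents over-allocating queue slots.
    assert_eq!(decompression_queue_capacity(4, 3), 3);
    println!("{}", decompression_queue_capacity(4, 100));
}
```

The cap matters for small files: without it, the queue would reserve memory for chunks that will never exist.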
Improves a benchmark from #181:
Before:
test read_single_image_zips_rgba ... bench: 44,334,850 ns/iter (+/- 6,742,086)
After:
test read_single_image_zips_rgba ... bench: 41,789,371 ns/iter (+/- 6,471,339)