Queue up to num_threads * 2 chunks for decompression instead of num_threads + 2. This lets the decoder build up a bigger backlog, making parallel decoding less sensitive to thread scheduling.
This comes at the cost of increasing peak memory usage by num_threads * compressed_chunk_size.
Also, correctly cap the queue size at the total number of chunks to decode.
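The sizing rule described above can be sketched as follows. This is a hypothetical illustration, not the crate's actual code; the function name and signature are assumptions:

```rust
/// Hypothetical sketch of the queue-capacity rule from this change:
/// queue up to num_threads * 2 chunks (previously num_threads + 2),
/// capped at the number of chunks that actually remain to decode.
fn decompression_queue_capacity(num_threads: usize, chunks_to_decode: usize) -> usize {
    (num_threads * 2).min(chunks_to_decode)
}

fn main() {
    // With 4 threads and many chunks, the backlog grows to 8 (was 6 before).
    assert_eq!(decompression_queue_capacity(4, 100), 8);
    // With only 3 chunks left, the cap prevents over-allocating queue slots.
    assert_eq!(decompression_queue_capacity(4, 3), 3);
    println!("{}", decompression_queue_capacity(4, 100));
}
```

The cap matters for small files: without it, the queue would reserve memory for chunks that will never exist.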
Improves a benchmark from #181:
Before:
test read_single_image_zips_rgba ... bench: 44,334,850 ns/iter (+/- 6,742,086)
After:
test read_single_image_zips_rgba ... bench: 41,789,371 ns/iter (+/- 6,471,339)