Reduce copying and allocations

fintelia commented 11 months ago

This eliminate Reader::prev and adds special handling of unfiltering for the first row.

It has minimal impact overall on the QOI corpus. However, if limited to QOI corpus images under 64KB, the average decoding speed goes from ~~244 MP/s to 436 MP/s~~ on my machine. Edit: I think I messed up the benchmarks but I still see a modest perf improvement

cc: @anforowicz

anforowicz commented 11 months ago

it may be worth comparing all 3 approaches

Not sure if the data below helps or just stirs the pot unnecessarily :-) :

prev-row-copy-avoidance--double-long-buffer (see here)
- noncompressed-8x8.png: change in time: [-3.4699% -3.2960% -3.1177%] (p = 0.00 < 0.05)
- noncompressed-128x128.png: change in time: [-1.7766% -1.3460% -0.9357%] (p = 0.00 < 0.05)
prev-row-copy-avoidance--fintelia (see here)
- noncompressed-8x8.png: change in time: [-5.5093% -5.3152% -5.0900%] (p = 0.00 < 0.05)
- noncompressed-128x128.png: change in time: [-5.2180% -4.8080% -4.3969%] (p = 0.00 < 0.05)
prev-row-copy-avoidance--swap-prev-and-curr (see here)
- noncompressed-8x8.png: change in time: [+16.659% +16.859% +17.074%] (p = 0.00 < 0.05)
- noncompressed-128x128.png: change in time: [-17.995% -17.647% -17.318%] (p = 0.00 < 0.05)

I am very surprised by the wild changes from prev-row-copy-avoidance--swap-prev-and-curr.

I still worry about the cost of compaction (i.e. the calls to drain in Reader::next_raw_interlaced_row and ZlibStream::transfer_finished_data), but maybe we should just merge this PR and move on.

anforowicz commented 10 months ago

@fintelia, I just wanted to double-check that you've seen all of my comments above. (The performance comparison is probably safe to ignore, but I was wondering about my comments/suggestions about unit test coverage and about tweaking code comments that refer to the deleted/renamed scan_start field.)

fintelia commented 10 months ago

Somehow didn't see those comments or your changes on #420. Not exactly sure what happened, but I'll address them when I have a chance

image-rs / image-png

Reduce copying and allocations #422