image-rs / jpeg-decoder

JPEG decoder written in Rust
Apache License 2.0

Improve color conversion performance #197

Closed wartmanm closed 2 years ago

wartmanm commented 2 years ago

These two commits pass component arrays directly to the color conversion functions and perform fixed-point math in ycbcr_to_rgb(). Together they enable LLVM to vectorize the conversion, cutting the time for the 512x512 JPEG decode benchmark by about 50%.

The refactoring commit allocates more space for buffers, and the fixed-point commit causes some rounding errors compared to f32 math (not enough to fail any tests). If either of those limitations isn't acceptable, each commit on its own offers about a 5% speedup.
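For readers following along, here is a minimal sketch of what a fixed-point conversion over component slices can look like. It is not the code from these commits: the 16 fractional bits, the constant names, the decimal JFIF coefficients, and the `convert_line` helper are all assumptions made for illustration.

```rust
/// Fractional bits of the fixed-point format (an assumption for this sketch).
const SHIFT: i32 = 16;
/// Added before the shift so the result rounds to nearest instead of flooring.
const HALF: i32 = 1 << (SHIFT - 1);
const SCALE: f64 = (1i32 << SHIFT) as f64;

// Commonly quoted JFIF coefficients, scaled to fixed point and rounded.
const CR_R: i32 = (1.402 * SCALE + 0.5) as i32;
const CB_G: i32 = (0.344136 * SCALE + 0.5) as i32;
const CR_G: i32 = (0.714136 * SCALE + 0.5) as i32;
const CB_B: i32 = (1.772 * SCALE + 0.5) as i32;

fn clamp_u8(v: i32) -> u8 {
    v.clamp(0, 255) as u8
}

/// Converts one YCbCr sample to RGB using only integer arithmetic.
fn ycbcr_to_rgb(y: u8, cb: u8, cr: u8) -> [u8; 3] {
    let y = (y as i32) << SHIFT;
    let cb = cb as i32 - 128;
    let cr = cr as i32 - 128;
    // `>>` on i32 is an arithmetic shift, so negative values floor correctly.
    let r = (y + CR_R * cr + HALF) >> SHIFT;
    let g = (y - CB_G * cb - CR_G * cr + HALF) >> SHIFT;
    let b = (y + CB_B * cb + HALF) >> SHIFT;
    [clamp_u8(r), clamp_u8(g), clamp_u8(b)]
}

/// Hypothetical helper: converts a row given as three separate component
/// slices into interleaved RGB. A branch-free loop over plain slices is the
/// kind of shape a compiler has a chance to auto-vectorize.
fn convert_line(y: &[u8], cb: &[u8], cr: &[u8], out: &mut [u8]) {
    for (((px, &y), &cb), &cr) in out.chunks_exact_mut(3).zip(y).zip(cb).zip(cr) {
        px.copy_from_slice(&ycbcr_to_rgb(y, cb, cr));
    }
}
```

Whether a loop like this actually vectorizes depends on the compiler version and target features, so it is worth confirming with a benchmark or by inspecting the generated assembly.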

HeroicKatora commented 2 years ago

Afaik the standard itself recommends (or permits) an approximation; the YUV conversion for BT.601 is also specified with integer fractions (link, see Table 2). You could check the numerators of the 16.16 fixed-point scheme against that table, but in theory the fixed-point scheme should be more accurate if anything, right?

HeroicKatora commented 2 years ago

There was some cleanup in master that overlapped with the removal of clamp. I've taken the liberty of rebasing it cleanly and force-pushed to the branch.

wartmanm commented 2 years ago

Good point, I suppose approximation is unavoidable since converting to YCbCr is already lossy. I ran some experiments comparing against rational numbers, and found that f32 is essentially perfect: it doesn't always honor round-to-even but otherwise gets everything correct. 16.16 fixed point is slightly worse and rounds some values like 0.50054 down to 0. 12.20 fixed point is just as good as f32, although it doesn't always round in the same direction.
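A comparison of this kind can be sketched roughly as follows. This is not the experiment described above, only an illustration of the idea: it looks at the green channel alone, treats the decimal coefficients 0.344136 and 0.714136 as the exact reference (an assumption), and uses a hypothetical `compare_green` function that rounds half up on both sides.

```rust
/// For a given number of fractional bits, compares the fixed-point green
/// offset against exact rational rounding for every (cb, cr) pair.
/// Returns (number of pairs that round differently, largest absolute difference).
/// Y is left out because it is an integer and does not affect the rounding.
fn compare_green(shift: u32) -> (u32, i64) {
    let scale = (1i64 << shift) as f64;
    let cb_g = (0.344136 * scale).round() as i64;
    let cr_g = (0.714136 * scale).round() as i64;
    let half = 1i64 << (shift - 1);

    let mut mismatches = 0;
    let mut max_diff = 0;
    for cb in -128i64..=127 {
        for cr in -128i64..=127 {
            // Fixed-point result, rounded to nearest via the added half.
            let fixed = (-cb_g * cb - cr_g * cr + half) >> shift;
            // Exact value is num / 1_000_000; floor(x + 1/2) in integers.
            let num = -344_136 * cb - 714_136 * cr;
            let exact = (2 * num + 1_000_000).div_euclid(2_000_000);
            if fixed != exact {
                mismatches += 1;
                max_diff = max_diff.max((fixed - exact).abs());
            }
        }
    }
    (mismatches, max_diff)
}
```

Calling `compare_green(16)` and `compare_green(20)` gives a rough sense of how often each format rounds differently from the exact result. Any disagreement is at most one code value, since the coefficient error is far smaller than 1.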