image-rs / jpeg-decoder

JPEG decoder written in Rust
Apache License 2.0

Eliminate some bounds checks in IDCT #167

Closed: Shnatsel closed this 3 years ago

Shnatsel commented 3 years ago

Converts a `debug_assert!` into a regular `assert!` so that the optimizer can use it as a hint. This eliminates a number of bounds checks.

It reduces the number of assembly instructions, but I did not see any measurable performance improvement on my machine.
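A minimal sketch of the general pattern, assuming a function that indexes the first 64 elements of a coefficient slice (`sum_first_64` is a hypothetical name, not code from this PR). A `debug_assert!` is compiled out in release builds, so it tells the optimizer nothing. A plain `assert!` stays in, and code after it can only run if the condition held, which typically lets the optimizer drop the per-index bounds checks. Whether it actually does so depends on the compiler version and optimization level.

```rust
/// Hypothetical helper: sums the first 64 coefficients of a block.
fn sum_first_64(coeffs: &[i16]) -> i32 {
    // Unlike `debug_assert!`, this check survives release builds. Past this
    // point the optimizer knows `coeffs.len() >= 64`, so it may remove the
    // bounds checks on `coeffs[i]` inside the loop.
    assert!(coeffs.len() >= 64);
    let mut acc = 0i32;
    for i in 0..64 {
        acc += i32::from(coeffs[i]);
    }
    acc
}
```

The trade-off is that the check now also runs (once, outside the loop) in release builds, and a too-short slice panics up front instead of on the first out-of-range access.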

Total instruction counts for jpeg_decoder::worker::immediate::ImmediateWorker::append_row_immediate, into which this code gets inlined: 1420 before, 1363 after.

Top 10 instructions with counts, before:

    385 mov
    150 lea
     95 add
     86 imul
     85 cmp
     61 jmp
     50 sar
     49 sub
     36 call
     33 movzx

Top 10 instructions with counts, after:

    355 mov
    150 lea
     91 add
     86 imul
     83 cmp
     53 jmp
     50 sar
     48 sub
     35 call
    34 movzx

Shnatsel commented 3 years ago

In parallel mode this speeds up decoding from 320 ms to 300 ms, a reduction of about 6%. There is no change when rayon is not enabled, because the IDCT worker thread is not on the critical path in that case.

Tested on https://commons.wikimedia.org/wiki/File:Sun_over_Lake_Hawea,_New_Zealand.jpg

HeroicKatora commented 3 years ago

I wonder how complex it would be to funnel the precise type `&[i16; 64]` through instead?

HeroicKatora commented 3 years ago

Never mind, that obviously requires bytemuck, or probably const generics.
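For illustration, a minimal sketch of what funnelling `&[i16; 64]` through could look like on current stable Rust, where the standard library implements `TryFrom<&[T]>` for `&[T; N]` for any `N` (this relies on const generics, in line with the comment above). The functions `sum_block` and `blocks` are hypothetical and not part of jpeg-decoder; whether every bounds check is actually removed depends on the compiler.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: sums one 8x8 block. Because the length is part of
/// the parameter type, the optimizer can prove `row * 8 + col < 64` from the
/// loop bounds alone, without any runtime length check on the argument.
fn sum_block(block: &[i16; 64]) -> i32 {
    let mut acc = 0i32;
    for row in 0..8 {
        for col in 0..8 {
            acc += i32::from(block[row * 8 + col]);
        }
    }
    acc
}

/// Hypothetical helper: splits a flat coefficient buffer into `&[i16; 64]`
/// blocks. `chunks_exact(64)` only yields 64-element chunks (a shorter tail
/// is skipped), so the conversion below cannot fail.
fn blocks(coefficients: &[i16]) -> impl Iterator<Item = &[i16; 64]> + '_ {
    coefficients
        .chunks_exact(64)
        .map(|chunk| <&[i16; 64]>::try_from(chunk).expect("chunk has 64 elements"))
}
```

The conversion from slice to array reference happens once per block, at the boundary, instead of a bounds check on every coefficient access inside the kernel.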

Shnatsel commented 3 years ago

A lot of precise types can and should be funnelled through, in theory. That match on a `usize` is gross; instead of panicking here, it should just be an enum with custom discriminants.
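A minimal sketch of the enum-with-custom-discriminants idea, assuming (hypothetically) that the matched `usize` can only legitimately be 1, 2, 4 or 8, for example a per-block output size. `BlockSize`, `from_usize` and `kernel_name` are illustrative names, not jpeg-decoder API. Invalid values are rejected once at the boundary, and the match in the hot path is exhaustive, so it needs no panicking fallback arm.

```rust
/// Hypothetical block size whose discriminants are the numeric values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(usize)]
enum BlockSize {
    One = 1,
    Two = 2,
    Four = 4,
    Eight = 8,
}

impl BlockSize {
    /// Validates a raw value once, returning `None` instead of panicking later.
    fn from_usize(n: usize) -> Option<Self> {
        match n {
            1 => Some(BlockSize::One),
            2 => Some(BlockSize::Two),
            4 => Some(BlockSize::Four),
            8 => Some(BlockSize::Eight),
            _ => None,
        }
    }

    /// The custom discriminant doubles as the numeric value.
    fn get(self) -> usize {
        self as usize
    }
}

/// Hypothetical dispatch: exhaustive over the enum, so no `_ => panic!()` arm.
fn kernel_name(size: BlockSize) -> &'static str {
    match size {
        BlockSize::One => "dc_only",
        BlockSize::Two => "idct_2x2",
        BlockSize::Four => "idct_4x4",
        BlockSize::Eight => "idct_8x8",
    }
}
```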

HeroicKatora commented 3 years ago

Yeah, there are probably a few other types that are not properly newtype-wrapped, or that should be enums.
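As an illustration of newtype wrapping, a minimal sketch using a hypothetical `QuantTableId` for a JPEG quantization table destination. The JPEG standard allows four destinations (0 to 3), so the constructor rejects anything else; `QuantTableId` and `QuantTables` are illustrative names, not jpeg-decoder types. The invariant lives in the type, so an out-of-range id can never reach the indexing code.

```rust
/// Hypothetical newtype: a quantization table destination, always 0..=3.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct QuantTableId(u8);

impl QuantTableId {
    /// The only way to build an id, so every id in circulation is valid.
    fn new(raw: u8) -> Option<Self> {
        if raw < 4 {
            Some(QuantTableId(raw))
        } else {
            None
        }
    }

    fn index(self) -> usize {
        usize::from(self.0)
    }
}

/// Hypothetical holder for up to four quantization tables.
struct QuantTables {
    tables: [Option<[u16; 64]>; 4],
}

impl QuantTables {
    /// `id.index()` is always < 4 by construction, so this indexing cannot panic.
    fn get(&self, id: QuantTableId) -> Option<&[u16; 64]> {
        self.tables[id.index()].as_ref()
    }
}
```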