I found a few places to improve based on Clippy lints and experimentation. The benchmarks were somewhat noisy from run to run. Overall, I saw more small performance gains than regressions on my machine, but that may not replicate elsewhere.
decoder.rs
Refactor the spectral selection difference magnitude handling to remove the error check from the non-zero match arm; this better matches Table F.1 (page 89 of the document, page 93 of the PDF)
Remove unnecessary closure
Return from clamp_to_u8 by expression instead of let value
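To illustrate the `clamp_to_u8` item, here is a minimal sketch. It assumes the function takes an `i32` and clamps with `min`/`max`; the real signature and body in decoder.rs may differ, and `clamp_to_u8_let` is a hypothetical name for the "before" shape.

```rust
use std::cmp;

// Assumed "before" shape: bind the clamped result to a local, then return it.
fn clamp_to_u8_let(value: i32) -> u8 {
    let value = cmp::max(cmp::min(value, 255), 0);
    value as u8
}

// "After" shape: the clamped value is the tail expression of the function.
fn clamp_to_u8(value: i32) -> u8 {
    cmp::max(cmp::min(value, 255), 0) as u8
}
```

Both versions should compile to the same code; the change is about readability rather than speed.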
huffman.rs
Avoid collecting in huffsize fold
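The idea behind the huffsize item can be sketched roughly as follows. This is an illustration under assumptions, not the code from huffman.rs: both function names are hypothetical, and the input is assumed to be a 16-entry array holding the number of codes per code length.

```rust
use std::iter;

// Assumed "before" shape: collect each run of code lengths into a temporary
// Vec, then append that Vec to the accumulator.
fn huffsize_collecting(bits: &[u8; 16]) -> Vec<u8> {
    bits.iter().enumerate().fold(Vec::new(), |mut acc, (i, &count)| {
        let mut run: Vec<u8> = iter::repeat((i + 1) as u8).take(count as usize).collect();
        acc.append(&mut run);
        acc
    })
}

// "After" shape: extend the accumulator straight from the iterator, so no
// intermediate Vec is allocated per code length.
fn huffsize_extending(bits: &[u8; 16]) -> Vec<u8> {
    bits.iter().enumerate().fold(Vec::new(), |mut acc, (i, &count)| {
        acc.extend(iter::repeat((i + 1) as u8).take(count as usize));
        acc
    })
}
```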
idct.rs
Remove return
parser.rs
Use mutable iterator instead of for loop
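For the parser.rs item, the general pattern looks something like the sketch below. The helper names and the scaling operation are hypothetical and only stand in for whatever the loop in parser.rs actually does.

```rust
// Index-based loop: every `values[i]` is an indexed access into the slice.
fn scale_indexed(values: &mut [u16], factor: u16) {
    for i in 0..values.len() {
        values[i] = values[i].wrapping_mul(factor);
    }
}

// Mutable iterator: each item is a `&mut u16`, so no indexing is needed.
fn scale_iter_mut(values: &mut [u16], factor: u16) {
    values
        .iter_mut()
        .for_each(|v| *v = (*v).wrapping_mul(factor));
}
```

Index loops of the first shape are the kind of thing Clippy's `needless_range_loop` lint tends to flag, though I have not checked that this is the exact lint involved here.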
upsampler.rs
Use copy_from_slice instead of for loop
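The upsampler.rs item follows a common pattern, sketched here with hypothetical helper names; the actual buffers and offsets in upsampler.rs are not shown in this description.

```rust
// Element-by-element copy with an index loop.
fn copy_with_loop(dst: &mut [u8], src: &[u8]) {
    for i in 0..src.len() {
        dst[i] = src[i];
    }
}

// Same copy with `copy_from_slice`. It panics if the two slices differ in
// length, so `dst` is narrowed to `src.len()` first.
fn copy_with_slice(dst: &mut [u8], src: &[u8]) {
    dst[..src.len()].copy_from_slice(src);
}
```

`copy_from_slice` expresses the intent directly and is typically lowered to a single memory copy.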
decode a 512x512 JPEG
time: [4.1549 ms 4.1638 ms 4.1733 ms]
change: [-4.6042% -4.1612% -3.7614%] (p = 0.00 < 0.05)
Performance has improved.
decode a 512x512 progressive JPEG
time: [7.6885 ms 7.7178 ms 7.7545 ms]
change: [-0.7646% -0.2729% +0.3227%] (p = 0.34 > 0.05)
No change in performance detected.
decode a 512x512 grayscale JPEG
time: [1.1805 ms 1.1842 ms 1.1880 ms]
change: [+0.4658% +0.9305% +1.3867%] (p = 0.00 < 0.05)
Change within noise threshold.
extract metadata from an image
time: [1.3026 us 1.3054 us 1.3085 us]
change: [-4.5689% -4.2494% -3.8972%] (p = 0.00 < 0.05)
Performance has improved.