Replace `filter` with the old implementation

cospectrum commented 5 months ago

ripytide commented 5 months ago

Can you show a benchmark comparison on your machine with a before/after this pr? I'll do the same on my machine as well.

theotherphil commented 5 months ago

@ripytide I've just added a benchmark comparison on my machine on https://github.com/image-rs/imageproc/issues/644#issuecomment-2133278304 (edit: for v0.25 to master).

ripytide commented 5 months ago

My Results on a Ryzen 5 4500U:

 name                                                     before1 ns/iter  after1 ns/iter  diff ns/iter   diff %  speedup 
 filter::benches::bench_box_filter                        2,069,851        2,067,948             -1,903   -0.09%   x 1.00 
 filter::benches::bench_filter_clamped_gray_3x3           1,235,373        1,187,812            -47,561   -3.85%   x 1.04 
 filter::benches::bench_filter_clamped_gray_5x5           2,860,594        2,790,122            -70,472   -2.46%   x 1.03 
 filter::benches::bench_filter_clamped_gray_7x7           4,821,814        5,087,034            265,220    5.50%   x 0.95 
 filter::benches::bench_filter_clamped_parallel_gray_3x3  780,587          459,600             -320,987  -41.12%   x 1.70 
 filter::benches::bench_filter_clamped_parallel_gray_5x5  962,982          927,658              -35,324   -3.67%   x 1.04 
 filter::benches::bench_filter_clamped_parallel_gray_7x7  1,402,118        1,595,752            193,634   13.81%   x 0.88 
 filter::benches::bench_gaussian_f32_stdev_1              286,052          295,040                8,988    3.14%   x 0.97 
 filter::benches::bench_gaussian_f32_stdev_10             1,377,759        1,384,059              6,300    0.46%   x 1.00 
 filter::benches::bench_gaussian_f32_stdev_3              524,585          527,970                3,385    0.65%   x 0.99 
 filter::benches::bench_horizontal_filter                 902,045          913,429               11,384    1.26%   x 0.99 
 filter::benches::bench_separable_filter                  663,439          705,558               42,119    6.35%   x 0.94 
 filter::benches::bench_vertical_filter                   934,897          941,840                6,943    0.74%   x 0.99

theotherphil commented 5 months ago

@ripytide can you also try with the benchmarks listed on https://github.com/image-rs/imageproc/issues/644#issuecomment-2133278304 so we can get a consistent comparison between my benchmarking and yours.

ripytide commented 5 months ago

So on my machine the only massive change is the parallel_3x3 which is 1.7x but everything else is quite similar and goes either way. But definitely not 4x.

cospectrum commented 5 months ago

use std::{hint::black_box, time::Instant};
use imageproc::image::RgbImage;

const TRIES: usize = 20;

fn main() {
    const W: u32 = 2600;
    const H: u32 = W;

    let img = black_box(RgbImage::new(W, H));
    let kernel = black_box([0i32; 3 * 3]);

    let t = Instant::now();
    for _ in 0..TRIES {
        let _: RgbImage = black_box(imageproc::filter::filter3x3(&img, &kernel));
    }
    dbg!(t.elapsed());

    let ker = imgproc::kernel::Kernel::new(&kernel, 3, 3);
    let t_new = Instant::now();
    for _ in 0..TRIES {
        let _: RgbImage = black_box(imgproc::filter::filter_clamped(&img, ker));
    }
    dbg!(t_new.elapsed());
}

[dependencies]
imageproc = "0.25.0"
imgproc = { git = "https://github.com/image-rs/imageproc", branch = "master", package = "imageproc" }

cospectrum commented 5 months ago

[src/main.rs:17:5] t.elapsed() = 2.099322542s
[src/main.rs:24:5] t_new.elapsed() = 8.245356375s

ripytide commented 5 months ago

Interesting, I wonder what the difference between that test and the 3x3 test in the benchmarks is that's making such a big difference.

theotherphil commented 5 months ago

The regressions are much larger for RGB images: https://github.com/image-rs/imageproc/issues/644#issuecomment-2133356957

Which might explain some of the difference between what your respective benchmarks are showing. I expect we're missing benchmarks on RGB images for quite a lot of functions in this crate!

cospectrum commented 5 months ago

@theotherphil By the way, what should happen if P::CHANNEL_COUNT != Q::CHANNEL_COUNT?

cospectrum commented 5 months ago

The zip will be trimmed to the minimum length.

ripytide commented 5 months ago

Should probably panic in that scenario, is there a compile-time assert function?

cospectrum commented 5 months ago

Compile time assert is const _: () = assert!(expr)

cospectrum commented 5 months ago

But we can't use this, for the same reason we can't allocate the stack by the number of channels

cospectrum commented 5 months ago

The only way to do it in stable Rust is to add typenum in Pixel trait, I think

ripytide commented 5 months ago

I wonder if we could use the tinyvec crate to remove heap-allocation from the previous implementation and what effect that would have on its performance. Or just use a 4-channel array and panic if the pixel type has greater than 4-channels which should be generic enough for most usecases. Or even if P::CHANNEL_COUNT == 1 {[0; 1]} else if P::CHANNEL_COUNT == 2 {[0; 2]} ... up to 4.

theotherphil commented 5 months ago

@cospectrum I’d go with a runtime assert to check that channel counts match. Seems like a less likely user error than many of the places we already use panics to check preconditions.

cospectrum commented 5 months ago

@cospectrum I’d go with a runtime assert to check that channel counts match. Seems like a less likely user error than many of the places we already use panics to check preconditions.

Done

theotherphil commented 5 months ago

@cospectrum , @ripytide as both the function signatures and the set of benchmarks have changed since 0.25 comparisons across the two versions are a bit of a chore.

So I'll merge this, check manually that performance now matches 0.25, and we can then use this commit as the baseline for future changes.

image-rs / imageproc

Replace `filter` with the old implementation #654