Closed cospectrum closed 5 months ago
Can you show a benchmark comparison on your machine with a before/after this pr? I'll do the same on my machine as well.
@ripytide I've just added a benchmark comparison on my machine on https://github.com/image-rs/imageproc/issues/644#issuecomment-2133278304 (edit: for v0.25 to master).
My Results on a Ryzen 5 4500U
:
name before1 ns/iter after1 ns/iter diff ns/iter diff % speedup
filter::benches::bench_box_filter 2,069,851 2,067,948 -1,903 -0.09% x 1.00
filter::benches::bench_filter_clamped_gray_3x3 1,235,373 1,187,812 -47,561 -3.85% x 1.04
filter::benches::bench_filter_clamped_gray_5x5 2,860,594 2,790,122 -70,472 -2.46% x 1.03
filter::benches::bench_filter_clamped_gray_7x7 4,821,814 5,087,034 265,220 5.50% x 0.95
filter::benches::bench_filter_clamped_parallel_gray_3x3 780,587 459,600 -320,987 -41.12% x 1.70
filter::benches::bench_filter_clamped_parallel_gray_5x5 962,982 927,658 -35,324 -3.67% x 1.04
filter::benches::bench_filter_clamped_parallel_gray_7x7 1,402,118 1,595,752 193,634 13.81% x 0.88
filter::benches::bench_gaussian_f32_stdev_1 286,052 295,040 8,988 3.14% x 0.97
filter::benches::bench_gaussian_f32_stdev_10 1,377,759 1,384,059 6,300 0.46% x 1.00
filter::benches::bench_gaussian_f32_stdev_3 524,585 527,970 3,385 0.65% x 0.99
filter::benches::bench_horizontal_filter 902,045 913,429 11,384 1.26% x 0.99
filter::benches::bench_separable_filter 663,439 705,558 42,119 6.35% x 0.94
filter::benches::bench_vertical_filter 934,897 941,840 6,943 0.74% x 0.99
@ripytide can you also try with the benchmarks listed on https://github.com/image-rs/imageproc/issues/644#issuecomment-2133278304 so we can get a consistent comparison between my benchmarking and yours.
So on my machine the only massive change is the parallel_3x3 which is 1.7x
but everything else is quite similar and goes either way. But definitely not 4x
.
use std::{hint::black_box, time::Instant};
use imageproc::image::RgbImage;
const TRIES: usize = 20;
fn main() {
const W: u32 = 2600;
const H: u32 = W;
let img = black_box(RgbImage::new(W, H));
let kernel = black_box([0i32; 3 * 3]);
let t = Instant::now();
for _ in 0..TRIES {
let _: RgbImage = black_box(imageproc::filter::filter3x3(&img, &kernel));
}
dbg!(t.elapsed());
let ker = imgproc::kernel::Kernel::new(&kernel, 3, 3);
let t_new = Instant::now();
for _ in 0..TRIES {
let _: RgbImage = black_box(imgproc::filter::filter_clamped(&img, ker));
}
dbg!(t_new.elapsed());
}
[dependencies]
imageproc = "0.25.0"
imgproc = { git = "https://github.com/image-rs/imageproc", branch = "master", package = "imageproc" }
[src/main.rs:17:5] t.elapsed() = 2.099322542s
[src/main.rs:24:5] t_new.elapsed() = 8.245356375s
Interesting, I wonder what the difference between that test and the 3x3
test in the benchmarks is that's making such a big difference.
The regressions are much larger for RGB images: https://github.com/image-rs/imageproc/issues/644#issuecomment-2133356957
Which might explain some of the difference between what your respective benchmarks are showing. I expect we're missing benchmarks on RGB images for quite a lot of functions in this crate!
@theotherphil By the way, what should happen if P::CHANNEL_COUNT != Q::CHANNEL_COUNT
?
The zip
will be trimmed to the minimum length.
Should probably panic in that scenario, is there a compile-time assert function?
Compile time assert is const _: () = assert!(expr)
But we can't use this, for the same reason we can't allocate the stack by the number of channels
The only way to do it in stable Rust is to add typenum in Pixel
trait, I think
I wonder if we could use the tinyvec
crate to remove heap-allocation from the previous implementation and what effect that would have on its performance. Or just use a 4-channel array and panic if the pixel type has greater than 4-channels which should be generic enough for most usecases. Or even if P::CHANNEL_COUNT == 1 {[0; 1]} else if P::CHANNEL_COUNT == 2 {[0; 2]} ... up to 4
.
@cospectrum I’d go with a runtime assert to check that channel counts match. Seems like a less likely user error than many of the places we already use panics to check preconditions.
@cospectrum I’d go with a runtime assert to check that channel counts match. Seems like a less likely user error than many of the places we already use panics to check preconditions.
Done
@cospectrum , @ripytide as both the function signatures and the set of benchmarks have changed since 0.25 comparisons across the two versions are a bit of a chore.
So I'll merge this, check manually that performance now matches 0.25, and we can then use this commit as the baseline for future changes.
Related https://github.com/image-rs/imageproc/issues/644