Closed theotherphil closed 5 months ago
I bet the compiler is able to do some fancy optimizations when the kernel size is hard-coded as 3. We could have an if statement in the generic filter()
where if kernel.width() == 3
and kernel.height() == 3
then we call filter3x3()
otherwise we just use the generic version. Then we don't need to expose two different kernel filter functions to the user.
Yes, we can hide some some manual specialisation behind filter
as necessary.
The current Implementation of
filter_clamped
is 3x slower than a simple hardcoded implementation for a grayscale image and 3x3 kernel.