libmir / mir-cv

[WIP] Computer Vision

Separable filtering #3

Open ljubobratovicrelja opened 7 years ago

ljubobratovicrelja commented 7 years ago

Hey @9il!

Just after we opened mir-cv, next semester started, and I got caught up in an academic meat grinder again. So sorry for the delay.

Anyhow, I wanted to take a shot at separable filtering as my first mir-cv addition. I was hoping you'd take a look at what I've done so far. I think this meets betterC standards (once compiled with the betterC flag and linked to another executable, the separable_imfilter function works as expected).

I've started a PR right away to have a dedicated chat space, and hopefully this makes change tracking easier for you. I hope that's OK with you.

Design

I've tried to build a filtering framework on which many other filtering algorithms could (hopefully) rely. Having kernels as function pointers of a fixed form should, I believe, be flexible enough to support many other filtering algorithms (Sobel, bilateral, median, etc.). I'm not really sure the current design allows this, but nevertheless, that is the goal.
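To make the idea concrete, here is a minimal sketch in Python (not the mir-cv code, which is written in D) of a separable filter driver that receives the per-window kernel as a function. The names `dot_kernel`, `filter_rows`, `transpose` and `separable_filter`, the `(window, weights)` signature, and the clamp-to-edge border rule are all assumptions made for illustration. It also assumes odd-length weight lists and applies them as a correlation (no kernel flip).

```python
def dot_kernel(window, weights):
    """Linear kernel: weighted sum of one 1D window."""
    return sum(w * v for w, v in zip(weights, window))


def filter_rows(image, weights, kernel):
    """Apply a 1D kernel along every row, clamping indices at the borders."""
    radius = len(weights) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            # Gather the window around x; out-of-range taps reuse the edge pixel.
            window = [row[min(max(x + k, 0), n - 1)]
                      for k in range(-radius, radius + 1)]
            new_row.append(kernel(window, weights))
        out.append(new_row)
    return out


def transpose(image):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*image)]


def separable_filter(image, h_weights, v_weights, kernel=dot_kernel):
    """Horizontal pass, then vertical pass (a row pass on the transpose)."""
    tmp = filter_rows(image, h_weights, kernel)
    return transpose(filter_rows(transpose(tmp), v_weights, kernel))


# Usage: a 3x3 binomial blur expressed as two 1D passes over an impulse image.
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
blurred = separable_filter(impulse, [1, 2, 1], [1, 2, 1])
```

Because the kernel is just a callable, a different per-window function could be passed in its place. Whether a given non-linear filter makes sense as two 1D passes is a separate question that this sketch does not address.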

Implementation

Please note that I'm aware this looks more like 'realC' than 'betterC', but I'm hoping you'll point out where this can be improved. Regardless, I've implemented basic SIMD support, which shows nice results with AVX and single-precision floating-point vectors on my machine. I've also tried to make the code cache-friendly. I'm not sure how I'd test for cache misses, but this scheme is certainly faster than the naive variant, so I'm hopeful I've done it right.

Results

On a 512x512x1 test image, here are some rough results (I'll do proper profiling once I'm done with the implementation):

Note: these results are with inlined (template) calls (without linking). I failed to set up link-time optimization with LDC, without which each 3x3 filtering takes about 1-2 ms. Is this expected, and should LTO fix this?

To be done

Feel free to change anything. Just please let me know the reason for each change so I can follow. :)

ljubobratovicrelja commented 7 years ago

@9il I've created mir-rt and added glas.internal.memory and glas.internal.config as instructed, but can I ask you to take care of things like descriptions, and to check/add details such as the copyright and license files? Let me know when you're OK with making a release and registering it on dub.

9il commented 7 years ago

Thanks @ljubobratovicrelja. Yes. I will remove config because it is a compile-time feature and should go to mir-algorithm.

9il commented 7 years ago

I will try to make memory a template source file and include it in mir-algorithm, as well as config. Sorry about mir-rt; we will use this repo for non-templated code.

BTW, we need to think about completely generic allocators that would not have an instance field.

ljubobratovicrelja commented 7 years ago

> You sent me a link to a special language. I do not remember what it was.

It was halide-lang. And here is an awesome talk describing filter scheduling schemes in detail.

> But I am sure that the current code can be made several times faster.

Awesome, I was hoping you'd say that! :)

> Please use a multilevel kernel architecture: row level, column level, cache (block) level.

You'd have to help me on this one. Could you describe this architecture in more detail, or maybe point me to some sources where I can learn about it?

> There should be an API for separate column processing and row processing, and separate handling for the common convolution and the border convolution.

I agree, and that was the plan. Also, we should implement a kernel separability check in the same package. There is a nice explanation for MATLAB's function.
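For reference, a 2D kernel is separable exactly when its matrix has rank 1, i.e. when it is the outer product of a column vector and a row vector. The following is a minimal Python sketch of such a check (again, not mir-cv code). The name `separate_kernel` and the `tol` parameter are hypothetical, and the test uses a simple pivot-based rank-1 comparison rather than an SVD.

```python
def separate_kernel(kernel, tol=1e-9):
    """Return (column, row) with kernel[i][j] ~= column[i] * row[j], or None."""
    rows, cols = len(kernel), len(kernel[0])
    # Use the entry of largest magnitude as the pivot for numerical safety.
    p, q = max(((i, j) for i in range(rows) for j in range(cols)),
               key=lambda ij: abs(kernel[ij[0]][ij[1]]))
    pivot = kernel[p][q]
    if pivot == 0:
        return None  # all-zero kernel: nothing useful to separate
    # For a rank-1 matrix, the pivot's column and scaled row reproduce every entry.
    column = [kernel[i][q] for i in range(rows)]
    row = [kernel[p][j] / pivot for j in range(cols)]
    for i in range(rows):
        for j in range(cols):
            if abs(kernel[i][j] - column[i] * row[j]) > tol * abs(pivot):
                return None
    return column, row


# Usage: the Sobel x-kernel separates, the 4-neighbour Laplacian does not.
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
laplacian = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
sobel_factors = separate_kernel(sobel_x)
laplacian_factors = separate_kernel(laplacian)
```

When the check succeeds, the returned column and row vectors could serve as the vertical and horizontal 1D kernels of a separable pass.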

9il commented 7 years ago

Heh, I hope I will have time to describe my thoughts. BTW, this may be helpful: http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html#.slide