Faster Gaussian blur implementation for PhotoModularFX

tannerhelland commented 6 years ago

I notice that PMFX's current Gaussian Blur filter slows down significantly as radius increases. Since (I think?) PhotoModularFX works on floating-point data, it is a good candidate for an IIR implementation of Gaussian Blur, similar to what GIMP uses.

A nice BSD-licensed reference implementation is available here:

http://www.getreuer.info/home/gaussianiir

The original 1994 paper that inspired the approach is available here (free download):

https://www.researchgate.net/publication/236268568_Signal_and_Image_Restoration_Using_Shock_Filters_and_Anisotropic_Diffusion

On a 10-megapixel image (3872x2592), PMFX takes about 135 seconds to process a 3-channel gaussian blur. In PhotoDemon, IIR code very similar to the above project does the same thing in about 6 seconds, and that's including the overhead of moving all data from 8-bit integer RGB to 32-bit floating-point RGB and back again, plus processing a fourth channel (alpha )- so PMFX could be even faster, I think.

miorsoft commented 6 years ago

Thank you, I'll see. What's the Radius you used for the test ? I know it's slow, even if it's done using separable (1D) kernels (first gauss blur in X and then in Y). Alternatively user can use Box Blur ("Box" option, always BLUR node)

tannerhelland commented 6 years ago

200 px radius.

(So sorry that I forgot to mention this in the first post!)

miorsoft commented 6 years ago

I removed some unnecessary IFs on 3528x2344 (7.8 Mpix) 200px-radius to me it take about 105 secs low gain....

miorsoft commented 6 years ago

I translated gaussianiir2d from your link http://www.getreuer.info/home/gaussianiir super Fast !!!!! ( I used numsteps=4)

About license , (If I decide to put this update) do I have to mention that link somewhere ?

PS: Today I updated "mousewheel issues"

tannerhelland commented 6 years ago

Nice work!!

Fast gaussian blur is really helpful for many things, so I think it is a good one to optimize, and IIR is a great technique for this. (For example, I use gaussian blur at large radius for shadow/highlight correction - if you blend the result with the original image, using different blend modes, you can get many useful effects.)

About license , (If I decide to put this update) do I have to mention that link somewhere ?

At the link, he says the code is licensed by Simplified BSD license. This is one of the easiest open-source licenses (https://opensource.org/licenses/bsd-license.html). You can just put some text in your instruction manual, I think - from the link:

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

miorsoft commented 6 years ago

Updated with Gaussian IIR

[Modules] -Blur; Blur3; Fast Mode -ToneMap; ToneMap3; LocalMap; DoG; Noise2; PlasticWrap; -Faster (because of Internal faster Blur) -PlastWrap: Changed 'Details' parameter (Now proportional to image size) -RGB2HCY and HCY2RGB: reviewed colorspaces conversion algorithm -EDGES: Reviewed algorithm [Other] -Added 'Add FX-Node Selector' Window -Added Projects-Thumbs Preview. +50 MBytes -Renaming 'Node' to 'Module'

https://miorsoft.github.io/Site/PhotoModularFX/PhotoModularFX.zip

miorsoft commented 6 years ago

I need a fast Bilateral now. I thought it can be done in a similar way of IIR Blur.

Then I found Tanner's useful link on my vbForums page:

Fast O(1) bilateral filtering using trigonometric range kernels

Tanner, did you implement it ? or do you know where I can found some source code to stat from ? I'm not so good in reading..... (mathematical symbols)

tannerhelland commented 6 years ago

Sorry that I have not had time to try your new Fast Blur implementation! I have been really busy and had not had time for VB6 stuff for many weeks...

Unfortunately, I have not implemented trigonometric range kernel bilateral filtering. :( I don't understand all the math involved, either. I think they do have a source code implementation available here:

http://bigwww.epfl.ch/algorithms/bilateral-filter/

It is implemented as an ImageJ plugin, but maybe there is enough code to get started? I am happy to help if I can find some free time.

One thing I did to improve bilateral performance in PhotoDemon is offer a "Fast" separable option (like gaussian blur). If you run the bilateral filter in the horizontal direction, then run the filter in the vertical direction, it only takes O(w + h) time instead of O(w h). The results are not exactly* the same (because the algorithm is not technically separable) but they are close, and for large radii especially the visual difference is not too noticeable.

On a 10 megapixel photo, bilateral filter radius 25 takes about 11 seconds separable, compared to 240 seconds in regular mode. So for me, this is a good trade-off. Maybe it's an easier thing to try first?

miorsoft commented 6 years ago

Yes, I know about separable Bilateral, but I don't like the results. For large radii I used a (don't know how to say) "Kernel Sub-Sampling", and the results are quite good. I made it using only some points within Kernel Radius. The points are selected with Poisson disk sampling technique. (a quite good result of quality and speed ...)

Ok, about the link of source code:

I translated to VB6 the code in Bilateral_Filter_Polynomial.java It easy to do.

It's super fast and the results are good for quite large RANGE sigma pmfx18893211_1455623897865901_3690259958413146238_n - copia

While for Small RANGE Sigmas it becomes strange/odd

pmfx18893211_1455623897865901_3690259958413146238_n

It think I translated it correctly

miorsoft commented 6 years ago

Tanner, if you'd like to implement Possion Disc sampling I found this cool link: https://bost.ocks.org/mike/algorithms/ Search for "Bridson’s algorithm" for Poisson-disc sampling. it's the algorithm I used. For big radii, this techinque is very useful, and despite the approximation the results are good. (in my program I use Poisson Disc in Bilateral and Threshold Blur with a selectable minimun distance between kernel sample points from 6% to 15% of the Kernel Radius.)

tannerhelland commented 6 years ago

Wow, great link!! That is a beautiful visualization, very easy to understand and follow.

Thank you for sharing the screenshots of your results; the large radius looks very good to me. I would need to study the algorithm further to know if the small radii result is expected, or if something else is wrong. (Many approximation algorithms seem to have mixed results at small radii, so it's entirely possible that the small radius problem lies with the algorithm itself.)

I have used Voronoi diagrams before (for "Stained Glass" and "Crystallize" effects) and it is interesting to see a similar approach for sampling. I would like to investigate Bridson's algorithm further for my own code - thank you for showing it to me. I have used some fixed supersampling techniques before (for example, all Distort tools in PhotoDemon require it), but they are much simpler than Bridson's algorithm, and probably not as high-quality.

Re: separable bilateral, I have not noticed problems in my own work with it, but maybe I have not studied it closely? For example, here is my result on your sample image from above, with the displayed settings (the "use estimation" checkbox toggles separable vs standard algorithm):

bilateral_sample

They look very close to my eye, but it is always possible that I have missed something in my implementation. (Or maybe, a certain combination of settings causes bad results?) I just mention this in case you notice something I didn't, or maybe there are different ways to implement separable bilateral, with varying results!

Either way, if you would like me to test any of these updates for you, please let me know. It may take me a few days but I am always happy to help. Congratulations on your great work on this topic.

miorsoft commented 6 years ago

Thank you very much Tanner. The example pictures I posted are about Bilateral_Filter_Polynomial.java I translated to VB6 (taken from your link (http://bigwww.epfl.ch/algorithms/bilateral-filter/) Sorry, but maybe you missed it. The "Wrong" one (the second) is obtained with a small RANGE sigma (not a small kernel radius.) In the other side, It works good for quite big RANGE sigmas , as you see. To be honest, I did not fully understand the codes of this link. It seems to me that there are more algorithms and I do not know which one is the best.

About Separable, Yes, I was wrong, it is not so bad, I noticed it too in my algorithm. I do not know why, but I had this idea (that the result was too low-quality), which remained in my mind from the past, without actually carrying out tests. You are right , Separable is a good approximation.

I'm curious about your 2 parameters : "Color Strenght" and "Color Preservation", [ also because my filter is using only 1 channel, not all three RGB (or four RGBA). ]

Another algorithm I try to implement without success is this: Adaptive Manifolds for Real-Time High-Dimensional Filtering I just try with only 3 manifolds , but I encountered many problems that maybe we'll talk about later.

tannerhelland commented 6 years ago

"Sorry, but maybe you missed it. The "Wrong" one (the second) is obtained with a small RANGE sigma (not a small kernel radius.) In the other side, It works good for quite big RANGE sigmas , as you see."

You're right, I did miss that! Thank you for the clarification. I apologize for not reading your original post carefully.

"I'm curious about your 2 parameters : "Color Strenght" and "Color Preservation", [ also because my filter is using only 1 channel, not all three RGB (or four RGBA). ]"

Sometimes I try to create names that are easier for users to understand, even if they don't have a technical background. These parameters are just basic adjustments that control the kernel used for intensity differences - e.g., to build a lookup table for RGB colors on the byte range [0, 255]:

    Dim i As Long, k As Long

    ReDim m_colorFunc(0 To 255, 0 To 255) As Double

    For i = 0 To 255
        For k = 0 To 255
            m_colorFunc(i, k) = Exp(-0.5 * ((Abs(i - k) / colorFactor) ^ colorPower))
        Next k
    Next i

The full source is here.

"Another algorithm I try to implement without success is this: Adaptive Manifolds for Real-Time High-Dimensional Filtering I just try with only 3 manifolds , but I encountered many problems that maybe we'll talk about later.

That looks like a very interesting algorithm, especially running in real-time on HD video! I will download the PDF and study it more closely. Thank you for the link.

miorsoft commented 5 years ago

About bilateral I found an interesting code. It is in C I tried to ask for help on vbforums even though I think the hopes are few. If you're interested, look here. http://www.vbforums.com/showthread.php?866895-Translate-Recursive-Bilateral-Filter-from-C-to-VB6 Do you think it's worth trying to translate it?

If you could understand something, let me know. (do it here on github since you do not log in anymore in vbforums)

miorsoft commented 5 years ago

On my own I found a really simple way to make a very fast edge preserving smoothing filter. It is done in a similar way to the IIR Blur. Here are some examples: pmfx18893211_1455623897865901_3690259958413146238_n pmfxlamp

Like BLUR IIR it's time invariant. The speed depends only on the size of the image, not on the smoothing parameter (Sigma or radius) In the next update it will be available, probably with the names: BilatFast and BilatFast3. The results are good even if it is not perfect and is a simplification of the algorithm in my previous post. (that maybe one day I will be able to do)

tannerhelland commented 5 years ago

Hi @miorsoft, sorry that it has taken me so long to reply. I have been on paternity leave with my new baby girl. I return to work this month and will hopefully have time for VB hobbies again.

Congratulations on your custom-built edge-preserving smoothing filter! I am always impressed by time-invariant implementations of complicated filters. It looks like you have done a great job.

I am excited to run some tests on it when the next version of PMFX releases.

miorsoft commented 5 years ago

hi ! Thank you. https://youtu.be/fL1JuDXHIbM This is a test for my new "Fast Edge Preserve Smoothing" filter algorithm. The interesting aspect is that it is very fast and depends only on the size of the input, not on the "Smooth" parameter. It will be in a few days / weeks in the new update of PhotoModularFX. This is a 1D test, which greatly simplifies the problem with respect to a 2D space. But it was very useful for me to notice a bug in my first version (Not very visible by applying it to an image, but very visible in this type of test). This algorithm is entirely designed and developed by me. The 2D result, especially in the case of a big Smooth, suffers a bit of the artifact similar to the Separable Bilateral filter.

ORIGINAL original 3 Smooth 30 Iterat. 2 30-2 3 Smooth 20 Iterat. 6 20-6 3 Smooth 50 Iterat. 4 50-4 2

miorsoft commented 5 years ago

@tannerhelland Hi I've update / upload new Version with Fast Edge Preserve Smoothing ( Nodes FastEPS and FastEPS3 )

BTW In Projects List it is possible do display only projects that contains a specific Node.

First of all Click "Add FX-Module" Button

Then click the Module you are interested to. And click "Cancel" or "Add Module". ooo

Finally you can Enable/Disable the projects Filter ( to display only the projects that contains selected Node or All projects ) ooo2

(First time of filtering may take a while for intensive disk access )

... waiting to know what you think of these modules.

from here on down to see the news of the update (vbforums)

tannerhelland commented 5 years ago

👍 Looks like you've done a great job on this, @miorsoft. When iterations are 4+, the horizontal and vertical banding is barely noticeable, and performance remains very good, even on large images. Adding diagonals further improves the results, without too much of a performance hit.

Do you plan to publish details on your algorithm, or will it remain a "secret"?

miorsoft commented 5 years ago

I will not publish the code, but I can give a small description of the FastEPS algorithm. So I disclose the "Secret". The algorithm is really very simple.

Fast Edge Preserve Smoothing Filter Algorithm Description:

Let's assume we work on a single channel (GrayScale). The values of this 2D array of gray scale are in a range from 0 to 1 (real numbers).

Well, the process consists of the following steps.

A - Make four copies of the 2D input array. (One for Left to Right, One for Right To left One for Up to Down ad one for Down to Up) :arrow_right: :arrow_left: :arrow_down: :arrow_up:
B - Perform the "Smooth" on these 4 copies, according to the directions of each one.
C - Merge the 4 outputs by averaging the sum of them to obtain the output. [eventually reiterate again]
B(2) The Smooth is performed in this way: (for each Row / Column and Directions) (Taking as example the Left to Right copy :arrow_right: , the processing of a pixels line (of the Left To Right output) is made like this: )
```
    For X = 1 To N
        D = Abs (O(X) - O(X - 1)) * MulFactor
        If D > 1 Then D = 1
        O(X) = D * O(X) + (1 - D) * O(X - 1)
    Next
```
In this way the new value O(X) is mixed with the previous one in a percentage equal to the similarity of the range difference.(1-D) MulFactoris the weight of the range distance. It is inversely proportional to the value of Smooth.

Very simple and effective isn't it ?

Then there is the variant + Diagonals which practically does (and adds) the same operation along the diagonals in four directions. :arrow_lower_right: :arrow_upper_left: :arrow_lower_left: :arrow_upper_right: (In this case we have 8 2D arrays to do average. ) [ Now I have also done an option for the distance range calculation (D), It can be choosed to do it using the squared elevation instead of the absolute value. (Available in the next update) ]

Honestly, I'm a little sorry to have opened it, it's a creature of mine, but what can I do with it? I hope that anyone who uses this algorithm has at least the grace to give me credit by mentioning me (MiorSoft - Roberto Mior) as the original author.

PS: @tannerhelland , do you have any suggestions about how I should behave to get this? Would it be necessary to release the code or is the description of the algorithm sufficient? Just to know, maybe it will be useful to me in the future.

tannerhelland commented 5 years ago

Thank you for the comprehensive description, @miorsoft! It looks like a very fast and clever solution.

Your description actually reminds me a lot of anisotropic diffusion: https://en.wikipedia.org/wiki/Anisotropic_diffusion . The similarity makes me wonder if you can use a "trick" of anisotropic diffusion in your own filter - if you reverse small aspects of the calculation, you can use the exact same code to intelligently sharpen an image along boundaries only. (I think you already have an anisotropic filter in your app? If not, I have VB code for it in PhotoDemon, or you can see the original paper on the topic: http://authors.library.caltech.edu/6498/1/PERieeetpami90.pdf )

PS: @tannerhelland , do you have any suggestions about how I should behave to get this? Would it be necessary to release the code or is the description of the algorithm sufficient? Just to know, maybe it will be useful to me in the future.

For better or worse, it's not possible to copyright an algorithm "idea". This is why Adobe can put a feature in Photoshop, for example, and I can develop my own version of that feature in my software without consequence.

You can, however, copyright the specific source code you use to accomplish something. It's a lot like books - you can't copyright the idea of "a fantasy book about hobbits and a magic ring", but if you actually wrote Lord of the Rings, you could copy that specific text. (Hopefully that makes sense.)

If you want to protect something that is just an "idea", you have to apply for a patent. It's a much more complicated and expensive process.

This link goes into more detail on copyright vs patents.

I mostly ask for details on the algorithm because it is very difficult to offer constructive feedback without understanding the underlying algorithm. Otherwise, I just have to "guess" at what the code is doing, and I will probably guess wrong. (Also, maybe a Google search will lead other developers here, and you can get input from someone a lot more knowledgeable than me! :)

Anyway, I think you have made great strides on a lot of different features lately. Nice work, and thanks for discussing these topics with me!

miorsoft commented 5 years ago

Thank you very much! (it would be nice to talk about other alorithms too) Yes I already have anisotropic filter in my app. (it's called "DIFFUSION"). I made it quite long time ago and I think it's a little mess. (But at the moment I don't think I'll review it)

Yes, my algorithm has a small similarity with anisotropic diffusion. Although mine is different in the following:

Do not work one pixel at a time by looking at the neighbors in four directions. Rather, it "diffuses" the similarity in a "continuous" way along the 4 directions. For this reason, with a high conductivity (Smooth) some horizontal / vertical "stripes" can be visible. Problem that is reduced by using low conductivity (smooth) and many iterations. And by adding the diagonal steps.

As I said, the algorithm is really very simple. Fast Edge Preserve Smoothing Filter Algorithm CODE: (for 1 channel)

' Inputs and outputs values ranges from 0 to 1 (float) 
' V1 is the Input Array
' OV1 is the Output Array
' W is the image width
' H is the image Height
' DiffMULX = DiffMULY  ~  "Some constant" / "Smooth Value"
' ArrCopyDS    Copy array to Destination from Source (Array Copy Destination Source )
' To speedup the copy it uses:
' Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (ByVal pDest As Long, ByVal pSrc As Long, ByVal ByteLen As Long)

    For IT = 1& To NIT
        If IT > 1& Then ArrCopyDS V1, OV1

        ArrCopyDS LE, V1
        ArrCopyDS RI, V1
        ArrCopyDS UP, V1
        ArrCopyDS DW, V1

        For X = 1& To W
            X2 = W - X
            For Y = 0& To H
                D = (RI(X, Y) - RI(X - 1&, Y)) * DiffMULX
                D = Abs(D) 'we can use instead D*D
                If D > 1! Then D = 1!
                RI(X, Y) = D * RI(X, Y) + (1! - D) * RI(X - 1&, Y)
                D = (LE(X2, Y) - LE(X2 + 1&, Y)) * DiffMULX
                D = Abs(D)
                If D > 1! Then D = 1!
                LE(X2, Y) = D * LE(X2, Y) + (1! - D) * LE(X2 + 1&, Y)
            Next
        Next

        For Y = 1& To H
            y2 = H - Y
            For X = 0& To W
                D = (DW(X, Y) - DW(X, Y - 1&)) * DiffMULy
                D = Abs(D)
                If D > 1! Then D = 1!
                DW(X, Y) = D * DW(X, Y) + (1! - D) * DW(X, Y - 1&)
                D = (UP(X, y2) - UP(X, y2 + 1&)) * DiffMULy
                D = Abs(D)
                If D > 1! Then D = 1!
                UP(X, y2) = D * UP(X, y2) + (1! - D) * UP(X, y2 + 1&)
             Next
        Next

        For X = 0& To W
            For Y = 0& To H
                OV1(X, Y) = (LE(X, Y) + RI(X, Y) + UP(X, Y) + DW(X, Y)) * 0.25!
            Next
        Next
    Next IT

Similarly one can add diagonal passes and change it to perform 3 (4) channels at a time.

(I know , pre calculating X - 1 , X2 + 1 , Y - 1 , Y2 + 1 could lead to some speed gain)

tannerhelland commented 4 years ago

Hi @miorsoft. I just wanted to drop by and let you know that I've recently translated the recursive bilateral filter you mentioned to VB6. Here is a link:

https://github.com/tannerhelland/PhotoDemon/blob/master/Classes/pdFxBilateral.cls

You can try it in PhotoDemon's latest nightly builds under the Effects > Blur > Surface blur menu, same as Photoshop.

The performance of the recursive algorithm is excellent. On my old VB6 development PC, the function can process 3-4 megapixels of RGBA data per second. This outperforms Photoshop and most other photo editors by a significant margin.

For example, applying a 100-radius bilateral filter to a 20-megapixel RGBA image takes ~70 seconds in Paint.NET, but less than 6 seconds in PhotoDemon.

The filter is only an approximation of a true bilateral, obviously, but it is so much faster that it is hard to dislike it!

If you are still looking for a fast bilateral filter implementation, I think the recursive approach is currently the best anyone can hope for.

👋

miorsoft commented 4 years ago

Thank you very much @tannerhelland ! I think the Bilateral is one of the most useful filter. I'll take a look. I think I'll try to adapt/re-code it to for my needs. (this maybe should bring to create a new "FX-Module" on PhotoModularFX... in a way that could be not so good [already Bilater and FastEPS, if I add one more, it could be confusing somehow ] I'll see how to manage this relative problem )

In case I'll put here my progress-state.

Thanks again, I can't wait to try it.

PS: The Module in PhotoModularFX FastEPS (Edge Preserve Smoothing) https://github.com/miorsoft/Site/issues/4#issuecomment-455897500 that I have "created" is really fast and the quality is good. The Downside is that it requires some 'Extra Iterations' and an amount of memory of (4+4)-Times the source image. (Left,Right,Top,Down,and + the 4 Diagonals)

miorsoft commented 4 years ago

@tannerhelland

Now looking at your code. It seems to me very similar to FastEPS. https://github.com/miorsoft/Site/issues/4#issuecomment-455897500

I don't remember well, probably my FastEPS is a simplification of the algorithm I managed to draw/understand from the code. (https://github.com/ufoym/recursive-bf)

At a quick look the things I am missing are:

Range Lookup Table (mine is just Linear)
Radius Setting. (In my case it resemble DiffMULX )

For sure your version appropriately manage Spatial and Range parameters.

(hopefully maybe my code just need a little tweak)

I'll dig in to it,

EDIT: It seems to me that the main different part is this

            weight = range_table(range_dist)
            alpha_ = weight * tmpAlpha

            ycb = inv_alpha_ * img(in_x) + alpha_ * ypb

That is:

            D = range_table(range_dist)

            ycb = ( 1 - tmpAlpha ) * img(in_x) +  tmpAlpha * ypb * D

tmpAlpha is given by Radius parameter and it is what I'm missing.

but... Still haven't understand how it works .... How ( 1-D ) [=] (1 - RangeWeightedDifference[called Weight] ) is missing ?

I mean: for each "displacement" of pixels the result is given by a percentage of the adjacent pixel. To better explain: Let's think we are going from Left to Right. Let say A = Img (x) and B = img (x-1)

We compute the difference D = Abs (A-B)
we can apply a Weight to D (D = WeightTable (D))
NewA = D * A + (1 - D) * B That means that the "influence" of B (pixel at left) will be proportional to (1 - D) = (1 - WeightedRangeDifference)

Another difference seems to me that I do it in the 4 directions at a time and then add them together out = ( A+B+C+D) * 1/4 while your code first do
OUT1 = ( A + B ) * 1/2 (Left-Right passes) and then do the Up-Down passes to the result of (left Right) OUT1 OUT = (C1 + D1) * 1/2 But I guess this makes no difference (except the amount of helper-temp memory used)

miorsoft commented 4 years ago

At the moment I don't think I will change my code so much... Just added a Gaussian Shaped Distance: https://www.desmos.com/calculator/xnms39j6tr

usherbob commented 4 years ago

Hi, brilliant discussion about fast implementation of Gaussian blur and Bilaterial filtering. I'm looking for a fast implementation of Gaussian Blur and notice the link @tannerhelland provided is lost. May you share your code in a new location?

tannerhelland commented 4 years ago

Hi @usherbob. I recently took a deep dive into this topic thanks to a dedicated user, and I currently think this paper by Pascal Getreuer is the ideal starting point for surveying gaussian blur strategies:

http://www.ipol.im/pub/art/2013/87/

Getreuer is the same author of the broken link; in that paper he surveys multiple strategies and compares performance and accuracy. The anisotropic diffusion strategy I originally linked at the top of this page has a large error (10+% as a 3rd-order function). Based on Getreuer's paper, I changed to a recursive approach by Rachid Deriche, which has an error of less than 1% as a 3rd-order function, and less than 0.1% as a 4th-order, specifically this technique:

https://hal.inria.fr/inria-00074778/document

There are many trade-offs depending on your ultimate goal. Pascal Getreuer's paper is excellent for discussing all those trade-offs.

Good luck!

usherbob commented 4 years ago

Thanks for your quick reply. I'll read these docs and better my code.

miorsoft commented 4 years ago

hi @usherbob MA (moving-average) filter could also be useful: https://www.vbforums.com/showthread.php?869373-VB6-Blur-effect-on-GDI-bitmaps&p=5346951&viewfull=1#post5346951 ( There is a link to VB6 source code )

miorsoft / Site

Faster Gaussian blur implementation for PhotoModularFX #4