Closed GoogleCodeExporter closed 9 years ago
Sorry for the typos:
What about off sized images? -> What about odd sized images?
Original comment by alt...@gmail.com
on 17 Oct 2013 at 4:05
Sure. Also note that I plan to do some work on blur - mainly we want a planar
version (e.g. yuv).
Going from memory, radius is number of pixels around the pixel used. The box
would be 1+(radius * 2) for width and height.
The function has never been ported to neon.
All functions should support any size.. odd is fine.
blur relates to box filter in resize, and I noticed box filter C code didnt
handle box size of more than 256, but blur uses an int, so it'll overflow for
16 million pixels.
blur box size of about 111 pixels is typical for a very blurred image.
Original comment by fbarch...@google.com
on 17 Oct 2013 at 5:43
I added some comments to the header.
As an alternative to planar blur, I used a box filter scale down and bilinear
scale up to achieve my goal.
The scale up is not optimized for planar scaling, like the ARGBScale is, but so
it will be interesting to do the upsampler, and still may make sense to do
planar blur.
Next step for blur is to do a small blur specialization. The current code uses
floats, since the sum of many pixels can be large. But if the sum is less than
65536, fixed point could be used for better performance.
I'll close this bug (doc) if the function is well enough documented.
Original comment by fbarch...@google.com
on 31 Oct 2013 at 9:29
I don't see blur related changes in planar_functions.h
Am I looking at the right file?
(Current rev # is 838)
Original comment by alt...@gmail.com
on 4 Nov 2013 at 8:37
Added a bit more info to blur header documentation.
Just look at ARGBBlur(), and the unittest in planar_functions.
If its still unclear, let me know.
I've also been optimizing blur for the use case of small amounts of blur...
radius 5 - 11x11 or less, is about 20% faster.
Original comment by fbarch...@google.com
on 8 Nov 2013 at 3:30
I think the documentation is OK now
Thank you for the doc update and the speedup for small blur sizes!
Original comment by alt...@gmail.com
on 8 Nov 2013 at 12:34
Thanks. This is still an area of development. As such, your feedback is
appreciated.
In hangouts we use blur in a number of effects, and its still a bottleneck.
There are 3 optimizations in mind.
1. The accumulation buffer is a 32 bit int. it could be float, and then when
used, would need 4x fewer int to floats.
2. When the area is small, using 16 bit differences would help.
3. A planar version of blur
I'm leaning toward 2 then 3 then 1.
Original comment by fbarch...@google.com
on 9 Nov 2013 at 8:05
Here is some feedback from our use case.
We use the blur function to "hide" some parts of screenshots.
So we have screen sized images (e.g. 1280x1024 pixels).
We call ARGBComputeCumulativeSum() once for the whole image then call
ARGBBlur() for each area (rectangles, of any size) we want to blur, with a big
radius (> 15).
This is indeed a performance bottleneck.
At first we had memory errors when blurring small rectangles in corners of the
image, because the big radius (greater than width & height) was causing invalid
read/write before/after the image memory. We now use a smaller radius for small
rectangles and no blur at all for very tiny rects, like 2x2 which were still
causing issues.
This case is maybe a good candidate for unit tests: blur a 1x1 rectangle in
each corner of the image with a big radius.
I hope this can help for future developments.
Keep up the good work
Original comment by alt...@gmail.com
on 18 Nov 2013 at 8:29
ARGBComputeCumulativeSum() should not be called - its an internal function used
by ARGBBlur(). The higher level, ARGBBlur() will clip the blur region to the
image.
The cumulative sum buffer does not need to be the full image. Here's a snippet
to show usage:
uint8* row_raw = reinterpret_cast<uint8*>(&data()[top * width() + left]);
// Allocate circular CumulativeSum buffer which is a ARGB int pixels.
// 1 row for above, 1 for the pixel row, and 2 radius for above/below.
int circular_height = radius * 2 + 2;
CreateTemp(region_width * sizeof(int), circular_height); // NOLINT
int32* cumsum = reinterpret_cast<int32*>(temp_data());
libyuv::ARGBBlur(row_raw, Stride(), // Number of bytes in 'row'.
row_raw, Stride(),
cumsum, region_width * 4, // Number of ints with 4 channels.
region_width, region_height, // Region size in pixels.
radius); // Radius for amount of blur.
The unittests can tests arbitrary image sizes. ie
set LIBYUV_WIDTH=2
set LIBYUV_HEIGHT=2
out\release\libyuv_unittest --gtest_filter=*Blur*
which works on 1x1 but fails on 2x2. But no crash.
ARGBBlur_Opt is 13 radius (27x27)
ARGBBlurSmall_Opt is 5 radius (11x11)
which tests the 2 implementations - small blur radius of 5 or less uses fixed
point, while larger radius uses float.
Changing the tests to 55 and 2 behavior is the same. Heres a benchmark with
radius 55 and 2.
set LIBYUV_WIDTH=1280
set LIBYUV_HEIGHT=720
set LIBYUV_REPEAT=1000
out\release\libyuv_unittest --gtest_filter=*Blur* | grep ms
[ OK ] libyuvTest.ARGBBlur_Any (2838 ms)
[ OK ] libyuvTest.ARGBBlur_Unaligned (2816 ms)
[ OK ] libyuvTest.ARGBBlur_Invert (2815 ms)
[ OK ] libyuvTest.ARGBBlur_Opt (2771 ms)
[ OK ] libyuvTest.ARGBBlurSmall_Any (1887 ms)
[ OK ] libyuvTest.ARGBBlurSmall_Unaligned (1847 ms)
[ OK ] libyuvTest.ARGBBlurSmall_Invert (1860 ms)
[ OK ] libyuvTest.ARGBBlurSmall_Opt (1851 ms)
[----------] 8 tests from libyuvTest (18685 ms total)
Trying various heights and widths, it seems like any height is okay, but width
less than radius has a bug.
Original comment by fbarch...@google.com
on 18 Nov 2013 at 4:06
I removed the explicit call to ARGBComputeCumulativeSum() and indeed the result
is the same!
Thank you for this information and the memory efficient example.
Original comment by alt...@gmail.com
on 18 Nov 2013 at 5:07
I reproduced an issue on width. The row level loop is doing
r+1 clipped left edge blur
n unclipped middle pixels
r clipped right edge
where n needs to be 1 or more. Rather than add a check for n>=1, I recompute r
(radius) based on width, and immediately return if radius is <= 0.
See r858 for that.
In the header I note that ARGBComputeCumulativeSum is internal. Originally I
only provided that function, and not the Blur itself.
The unittest is doing radius 55 and 2.
Original comment by fbarch...@google.com
on 18 Nov 2013 at 10:39
Original comment by fbarch...@google.com
on 20 Nov 2013 at 12:18
Original issue reported on code.google.com by
alt...@gmail.com
on 17 Oct 2013 at 4:04