qiqian / webp

Automatically exported from code.google.com/p/webp
0 stars 0 forks source link

out of bound memory access when bypass_filtering #193

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
How to reproduce the problem:
1. Decompress attached image with bypass_filtering = 1
2. Place bounds checking in vp8l.c:GetMetaIndex:

static WEBP_INLINE int GetMetaIndex(
    const uint32_t* const image, int xsize, int bits, int x, int y) {
  if (bits == 0) return 0;

  assert(xsize * (y >> bits) + (x >> bits) < _msize((void*)image) / sizeof(uint32_t)); // bounds checking

  return image[xsize * (y >> bits) + (x >> bits)];
}

Using:
webp 0.4 on windows visual studio 2012

please advice

Original issue reported on code.google.com by era...@phobicgames.com on 25 Mar 2014 at 7:53

Attachments:

GoogleCodeExporter commented 8 years ago
Here is the correct image file.

Original comment by era...@phobicgames.com on 25 Mar 2014 at 7:55

Attachments:

GoogleCodeExporter commented 8 years ago
We are currently using this workaround:
// Building webp with -Dmalloc=hooked_webp_malloc
void* webp_hooked_malloc(size_t n)
{
    unsigned char* ret = malloc(n + 4);
    memset(ret + n, 0, 4);
    return ret;
}

Original comment by yo...@phobicgames.com on 26 Mar 2014 at 12:55

GoogleCodeExporter commented 8 years ago
having a look (I'm surprised bypass_filtering has an impact on _lossless_ 
decoding though, which is not using lossy's in-loop filtering).

Original comment by pascal.m...@gmail.com on 26 Mar 2014 at 2:32

GoogleCodeExporter commented 8 years ago
update: i can reproduce with valgrind using dwebp: 'valgrind dwebp test.WEBP 
-nofilter' exhibits the out-of-bound read.

Still debugging....

Original comment by pascal.m...@gmail.com on 26 Mar 2014 at 2:35

GoogleCodeExporter commented 8 years ago
When stepping in the code, we noticed that the last uint32_t is indexed with 
row = width of image (256 not 255), x = 0. It seems that the code is reading 
one extra uint32_t in the huffman image. 

Original comment by yo...@phobicgames.com on 26 Mar 2014 at 2:41

GoogleCodeExporter commented 8 years ago
FYI we have to use bypass_filtering because otherwise decompressing is 
extremely slow on ARMV7 class hardware. Decompressing a lossy 1024 x 1024 x 
RGBA texture takes around 500ms with bypass_filtering = 0, otherwise with 
bypass_filtering = 1 it takes 150-200 ms.

This is with NEON activated.

Original comment by yo...@phobicgames.com on 26 Mar 2014 at 2:44

GoogleCodeExporter commented 8 years ago
You can try the patch here: https://gerrit.chromium.org/gerrit/#/c/69363/

(still testing it, but it seems reasonably ok)

A note about filtering speed:
actually, the ARMv7 code still lacks NEON optimization for _strong_ filtering 
(which is the default encoding option).
But support for light filtering in assembly is here. So you can maybe encode 
the picture using -nostrong option in cwebp if that's a possibility. (asm 
support for strong filtering is in the work. ETA unknown).

Also, you might be interested in using no_fancy_upsampling=1 to speed-up the 
YUV->RGB conversion.

Original comment by s...@google.com on 26 Mar 2014 at 4:10

GoogleCodeExporter commented 8 years ago
We've tested no_fancy_upsampling and it does not affect performance 
significantly. 

Another issue we have on ARMv7 is memory usage. When not decoding 
incrementally, libwebp mallocs at least the number of pixels required by the 
output image. This is excluding the memory required for the output buffer. For 
a complete decompression, this means around 8.5 megabytes of mallocs when 
decompressing a lossy 1024 x 1024 RGBA image.

Moreover, webp does around 500 tiny mallocs on a lossy 1024 x 1024 RGBA. Most 
of those allocs may be replaced by alloca style stack allocations. Especially 
the tiny ones done by the huffman tree code. 

Will apply your patch to webp 0.4 and run it on a few thousand images on 8 
cores. These sort of issues are extremely rare to detect. We detected it 
because we are hooking malloc/free in webp and we do not pad with memsetted 
memory as is done by most libc mallocs. 

Original comment by yo...@phobicgames.com on 26 Mar 2014 at 4:20

GoogleCodeExporter commented 8 years ago
I've verified that the patch works on our dataset after running the code using 
the same memory a couple of times. 

This is a quick-fix and seems to work but the core of the issue seems to be 
that the caller of the patched function asks for a row/pixel out-of-bounds. 

Thanks a lot Pascal, you saved us of a lot of headache!

Johan

Original comment by yo...@phobicgames.com on 26 Mar 2014 at 4:37

GoogleCodeExporter commented 8 years ago
Yohan,

> This is a quick-fix and seems to work but the core of the issue seems to be 
that the caller of the patched function asks for a row/pixel out-of-bounds. 

Actually, it really seems like a reasonable fix: we were calling 
GetHtreeGroupForPos()
one extra time (only) outside of the loop at vp8l.cc:822
The loop itself was not executed since the decoding was already finished, but 
this call was done anyway. So best is just to not even get into this function 
if it's not needed.

Thanks for the tests. I will also have a look at the separate memory problems.

Original comment by s...@google.com on 26 Mar 2014 at 5:03

GoogleCodeExporter commented 8 years ago
Great. Will use this patch until the next version of WebP comes out. 

Original comment by yo...@phobicgames.com on 26 Mar 2014 at 6:19

GoogleCodeExporter commented 8 years ago
patch submitted.

Original comment by pascal.m...@gmail.com on 26 Mar 2014 at 10:23

GoogleCodeExporter commented 8 years ago
Issue 194 has been merged into this issue.

Original comment by pascal.m...@gmail.com on 27 Mar 2014 at 3:58