richox / libzling

fast and awesome compression library.

IncrementalCopyFastPath can overshoot buffers #6

Closed nemequ closed 8 years ago

nemequ commented 9 years ago

I'm getting a segfault because IncrementalCopyFastPath can overshoot the src or dst buffers (by attempting to copy a uint32_t from src to dst when less than sizeof(uint32_t) remains in one, or both, buffers).

Just using memcpy fixes the issue. gcc should inline the memcpy call so in addition to working correctly it should also be faster on many platforms since it is pretty aggressively optimized. Not sure how other compilers behave, but I would be surprised if they didn't inline calls like memcpy, too.

richox commented 9 years ago

libzling allocates several sentinel bytes after src/dst (a trick), so IncrementalCopyFastPath is safe even if it overshoots by a few bytes. This is also what libsnappy does. memcpy won't work here because the memory regions may overlap.

nemequ commented 9 years ago

libzling allocates several sentinel bytes after src/dst (a trick), so IncrementalCopyFastPath is safe even if it overshoots by a few bytes

The demo program may do this, but people using it as a library probably don't. I can't just provide extra space in the output buffer since I'm getting that buffer from someone else.

An example where this is a big problem is when the buffers are actually memory-mapped files (which can be a huge performance benefit), which leads to:

==11816== Process terminating with default action of signal 11 (SIGSEGV)
==11816==  General Protection Fault
==11816==    at 0x5C262CA: IncrementalCopyFastPath (libzling_lz.cpp:87)
==11816==    by 0x5C262CA: baidu::zling::lz::ZlingRolzDecoder::Decode(unsigned short*, unsigned char*, int, int, int*) (libzling_lz.cpp:279)
==11816==    by 0x5C24ECF: baidu::zling::Decode(baidu::zling::Inputter*, baidu::zling::Outputter*, baidu::zling::ActionHandler*) (libzling.cpp:387)
==11816==    by 0x5C23967: squash_zling_process_stream(_SquashStream*, SquashOperation) (squash-zling.cpp:261)
==11816==    by 0x4E3E752: squash_stream_thread_func (stream.c:166)
==11816==    by 0x4E3F47D: _thrd_wrapper_function (tinycthread.c:561)
==11816==    by 0x5408529: start_thread (in /usr/lib64/libpthread-2.20.so)

This is also what libsnappy does.

I also have a snappy plugin and it does not touch memory outside of the buffer provided to it. The only other plugin which has this problem is libbsc, and it is being fixed.

nemequ commented 9 years ago
diff --git a/src/libzling_lz.cpp b/src/libzling_lz.cpp
index 353efa6..f4599bb 100644
--- a/src/libzling_lz.cpp
+++ b/src/libzling_lz.cpp
@@ -83,12 +83,7 @@ static inline void IncrementalCopyFastPath(unsigned char* src, unsigned char* ds
         len -= dst - src;
         dst += dst - src;
     }
-    while (len > 0) {
-        *reinterpret_cast<uint32_t*>(dst) = *reinterpret_cast<uint32_t*>(src);
-        len -= 4;
-        dst += 4;
-        src += 4;
-    }
+    memcpy(dst, src, len);
     return;
 }

This passes all my unit tests, works for everything in Silesia and Canterbury as well as enwik8, and is faster than the existing code.

richox commented 9 years ago

Have you tried copying overlapping matches? For example: decoding "1234" into "123232323234".

memcpy() requires "The memory areas must not overlap", otherwise the behavior is undefined.

nemequ commented 9 years ago

I don't know. Like I mentioned, I tried quite a few files with no issues; I have no idea if any of them contained overlapping matches or not.

If the buffers really can overlap, just use memmove instead of memcpy. It's still faster than the code you have, and it handles lengths that are not multiples of 4 correctly.

nemequ commented 9 years ago

It's still faster than the code you have

I guess I should quantify that: on this machine, the existing code was coming in at about 0.002562 seconds/iteration (decompressing alice29.txt); memcpy is at 0.002503, and memmove at 0.002518.

qiqian commented 9 years ago

memmove doesn't work; it creates incorrect decompressed files, but I don't know why.

nemequ commented 9 years ago

IIRC memmove worked for me when I tried it. It passed all my unit tests plus everything in the Squash Compression Benchmark. Could you provide some data that doesn't work?

nemequ commented 9 years ago

BTW, not sure if you care but I'm planning on removing the zling plugin from Squash after the next release (in a couple days) due to this issue. Currently it is just disabled by default, but I don't see much point in keeping it around if there is no interest in fixing it.

qiqian commented 8 years ago

I have an XML file which can't be decompressed using memmove. How do I upload it?

richox commented 8 years ago

It's already fixed (https://github.com/richox/libzling/issues/8). The problem happens when encoding produces overflow matches. There's no bug in IncrementalCopyFastPath(); it is copied from Google's snappy.

richox commented 8 years ago

"123" should produce "123232323"; when you use memmove() you get "12323????". That's why memmove can't be used. IncrementalCopyFastPath() has no bug; it is copied from Google's libsnappy.