Closed nemequ closed 8 years ago
libzling allocates several bytes as a sentinel after src/dst (tricky), so IncrementalCopyFastPath is safe even if it overshoots by a few bytes. This is also what libsnappy does. memcpy won't work here because the memory regions may overlap.
libzling allocates several bytes as a sentinel after src/dst (tricky), so IncrementalCopyFastPath is safe even if it overshoots some bytes
The demo program may do this, but people using it as a library probably don't. I can't just provide extra space in the output buffer since I'm getting that buffer from someone else.
A case where this is a big problem is when the buffers are actually memory-mapped files (which can be a huge performance benefit), which leads to:
==11816== Process terminating with default action of signal 11 (SIGSEGV)
==11816== General Protection Fault
==11816== at 0x5C262CA: IncrementalCopyFastPath (libzling_lz.cpp:87)
==11816== by 0x5C262CA: baidu::zling::lz::ZlingRolzDecoder::Decode(unsigned short*, unsigned char*, int, int, int*) (libzling_lz.cpp:279)
==11816== by 0x5C24ECF: baidu::zling::Decode(baidu::zling::Inputter*, baidu::zling::Outputter*, baidu::zling::ActionHandler*) (libzling.cpp:387)
==11816== by 0x5C23967: squash_zling_process_stream(_SquashStream*, SquashOperation) (squash-zling.cpp:261)
==11816== by 0x4E3E752: squash_stream_thread_func (stream.c:166)
==11816== by 0x4E3F47D: _thrd_wrapper_function (tinycthread.c:561)
==11816== by 0x5408529: start_thread (in /usr/lib64/libpthread-2.20.so)
This is also what libsnappy does.
I also have a snappy plugin and it does not touch memory outside of the buffer provided to it. The only other plugin with this problem is libbsc, and it is being fixed.
diff --git a/src/libzling_lz.cpp b/src/libzling_lz.cpp
index 353efa6..f4599bb 100644
--- a/src/libzling_lz.cpp
+++ b/src/libzling_lz.cpp
@@ -83,12 +83,7 @@ static inline void IncrementalCopyFastPath(unsigned char* src, unsigned char* ds
len -= dst - src;
dst += dst - src;
}
- while (len > 0) {
- *reinterpret_cast<uint32_t*>(dst) = *reinterpret_cast<uint32_t*>(src);
- len -= 4;
- dst += 4;
- src += 4;
- }
+ memcpy(dst, src, len);
return;
}
This passes all my unit tests, works for everything in Silesia and Canterbury as well as enwik8, and is faster than the existing code.
have you tried copying overlapping matches? for example:
decoding: "123
memcpy() requires "The memory areas must not overlap", otherwise the behavior is undefined.
I don't know. Like I mentioned, I tried quite a few files with no issues; no idea if they had any overlapping buffers or not.
If the buffers really can overlap, just use memmove instead of memcpy. It's still faster than the code you have, and it handles lengths which are not multiples of 4 correctly.
It's still faster than the code you have
I guess I should quantify that: on this machine, the existing code was coming in at about 0.002562 seconds/iteration (decompressing alice29.txt). memcpy is at 0.002503, memmove at 0.002518.
memmove doesn't work; it creates incorrect uncompressed files, but I don't know why
IIRC memmove worked for me when I tried it. It passed all my unit tests plus everything in the Squash Compression Benchmark. Could you provide some data that doesn't work?
BTW, not sure if you care but I'm planning on removing the zling plugin from Squash after the next release (in a couple days) due to this issue. Currently it is just disabled by default, but I don't see much point in keeping it around if there is no interest in fixing it.
I have an XML file which can't be decompressed using memmove; how do I upload it?
It's already fixed (https://github.com/richox/libzling/issues/8). The problem happens when encoding produces overflow matches. There's no bug in IncrementalCopyFastPath(); it is copied from Google's snappy.
I'm getting a segfault because IncrementalCopyFastPath can overshoot the src or dst buffers (by attempting to copy a uint32_t from src to dst when less than sizeof(uint32_t) remains in one, or both, buffers).
Just using memcpy fixes the issue.
gcc should inline the memcpy call, so in addition to working correctly it should also be faster on many platforms since memcpy is pretty aggressively optimized. Not sure how other compilers behave, but I would be surprised if they didn't inline calls like memcpy, too.