olle / lz77-kit

Some LZ77 compression utilities for the handy hands-on developer in search for a few less bytes of I/O.
http://www.studiomediatech.com/projects/lz77-kit
MIT License
65 stars 23 forks source link

decompressing error #3

Open gitPhate opened 9 years ago

gitPhate commented 9 years ago

Hi, I was doing some tests, and I encountered an error compressing and decompressing 64 kb of text with the maximum window size of 9220 I did this test:

String msg = new String(Files.readAllBytes(Paths.get("C:\\path\\to\\file", "test_data.txt"))); // input
String x = LZ77.compressStr(msg);
String y = LZ77.decompressStr(x);
System.out.println(x.equalsIgnoreCase(y));

I get

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -10
    at java.lang.String.substring(String.java:1960)
    at lz77.LZ77.decompress(LZ77.java:216)
    at lz77.LZ77.decompressStr(LZ77.java:82)
    at Main.main(Main.java:29)

Which is the same error as the one in the c# impl. Data is just a random text, here's the link: http://pastebin.com/fywqdyi3

I also tried to compress the same data with the max window length (9220) using this online tool: http://www.geocities.ws/diogok_br/lz77/demo.html. The compression works, but decompressing loses data. Example using the online tool and the same 64kb input data: http://pastebin.com/7VYcT7t6 As you can see in the decompressed text there are some scattered weird chars, and the size is 65125 bytes instead of the original 65536 chars of the input.

As I don't know the algorithm, could you fix this?

PS: that online tool uses your js version, here's their source code: http://www.geocities.ws/diogok_br/lz77/lz77.js so the problem is for all versions

olle commented 9 years ago

Thank you, great find. Perhaps there is an off-by-one error?

gitPhate commented 9 years ago

I tried with different data but the result is quite the same. In the case of the exception it might also be an off-by-one error, as there are no exceptions in the js version but the wrong decompressing is a big deal

olle commented 9 years ago

Yes, too bad. I really think it's interesting. Perhaps this is also a moment where you can dig deeper and learn the algorithm then?

It shouldn't be that complicated. This looks like a good source: http://www.cs.waikato.ac.nz/Teaching/COMP317B/Week_5/lz77.html

gitPhate commented 9 years ago

Well, I don't have much time to spend on it, and since your implementation seemed to work well, I thought that you were interested in fixing this. I will gladly port it to C# Il 10/nov/2015 20:10, "Olle Törnström" notifications@github.com ha scritto:

Yes, too bad. I really think it's interesting. Perhaps this is also a moment where you can dig deeper and learn the algorithm then?

It shouldn't be that complicated. This looks like a good source: http://www.cs.waikato.ac.nz/Teaching/COMP317B/Week_5/lz77.html

— Reply to this email directly or view it on GitHub https://github.com/olle/lz77-kit/issues/3#issuecomment-155534500.

olle commented 9 years ago

I'll see if I get the time. Big thanks again for the bug report.