ulikunitz / xz

Pure golang package for reading and writing xz-compressed files

low compression ratio #5

Closed jpillora closed 8 years ago

jpillora commented 8 years ago
$ ll
-rw-r--r--  1 jpillora  wheel   550K 30 Jan 21:18 a.log
$ cp a.log b.log
$ cp a.log c.log
$ xz a.log
$ gxz b.log
$ gzip c.log
$ ll
-rw-r--r--  1 jpillora  wheel   6.0K 30 Jan 15:43 a.log.xz
-rw-r--r--  1 jpillora  wheel   207K 30 Jan 21:16 b.log.xz
-rw-r--r--  1 jpillora  wheel    10K 30 Jan 21:16 c.log.gz

Any idea why this is?

kofalt commented 8 years ago

Would it be possible to provide the file in question?

jpillora commented 8 years ago

Yep, will post once I get home.

jpillora commented 8 years ago

Here is a redacted copy that gives similar results: https://gist.github.com/jpillora/f8e0d0b3eb6a066ae5b0/raw/84847dae70ba396d7a2d19ba888e756a7b37362f/a.log

ulikunitz commented 8 years ago

Hi, many thanks for reporting the issue.

I have stated in the README.md that the compression ratio is not as good as with xz, so this is not a surprise. The match selection has been poor, leading to a bad compression ratio. I have worked on it over the last few days. The dev branch now compresses your file to 6.8 KiB, which is still not as good as the xz tool but better than gzip. Since compression is now much faster than with the xz tool, I'm optimistic that I can trade compression speed for a better compression ratio. But this still requires some work and tooling.

I have marked the issue for closure by the v0.5 milestone.
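
The speed-versus-ratio trade-off mentioned above can be illustrated with a minimal sketch. Note the assumptions: it uses DEFLATE from Go's standard library (`compress/flate`), not this package's LZMA encoder, and the helper names `sampleLog` and `deflateSize` as well as the generated log lines are hypothetical stand-ins, not the a.log from this issue. It only shows how one might measure the compressed size at a fast and at a thorough setting.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"log"
)

// sampleLog builds repetitive, log-like input.
// Hypothetical data for illustration; it is not the file from this issue.
func sampleLog(lines int) []byte {
	var buf bytes.Buffer
	for i := 0; i < lines; i++ {
		fmt.Fprintf(&buf, "2016-01-30 21:18:%02d INFO request id=%d status=200 path=/api/items\n", i%60, i)
	}
	return buf.Bytes()
}

// deflateSize compresses data with DEFLATE at the given level
// and returns the size of the compressed output in bytes.
func deflateSize(data []byte, level int) (int, error) {
	var out bytes.Buffer
	w, err := flate.NewWriter(&out, level)
	if err != nil {
		return 0, err // e.g. level outside the range flate accepts
	}
	if _, err := w.Write(data); err != nil {
		return 0, err
	}
	// Close flushes any pending data; the size is only final afterwards.
	if err := w.Close(); err != nil {
		return 0, err
	}
	return out.Len(), nil
}

func main() {
	data := sampleLog(5000)
	// Compare the fastest and the most thorough standard-library settings.
	for _, level := range []int{flate.BestSpeed, flate.BestCompression} {
		n, err := deflateSize(data, level)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("level %d: %d -> %d bytes (%.1f%% of original)\n",
			level, len(data), n, 100*float64(n)/float64(len(data)))
	}
}
```

Running the same measurement against the real input file and adding wall-clock timing would give a rough picture of how much ratio a slower setting buys.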

jpillora commented 8 years ago

Oh, nice work. I'll give it a try tonight.

jpillora commented 8 years ago
$ gxz a.log
$ ll
total 16
-rw-r--r--  1 jpillora  wheel   6.8K  1 Feb 12:30 a.log.xz

Great work :)