aegoroff opened this issue 5 years ago
Thank you for reporting. This is expected, and I have the following language in README.md:
At this time the package cannot compete with the xz tool regarding compression speed and size.
I haven't found the time so far to work on code optimization. On the plus side, there is a lot of potential for improving the situation. Unfortunately, I cannot promise when I will work on it.
There is work ahead. I left the issue open.
I just ran into slow decompression, and the (partial) solution is to wrap your reader in `bufio.NewReader()`. It turns out this library uses `ReadByte()` a great deal, and on unbuffered input this is incredibly slow.
I say "partial" because, unfortunately, this fails on some inputs with:
writeMatch: distance out of range
It's very weird that it fails when buffered but works when unbuffered.
Yes, the library doesn't implement its own buffering and because it uses ReadByte it benefits from buffered readers. I should have documented it.
My rationale at the time was that I wanted to use a buffered reader only if there was a need for it. For instance, I didn't want to use a buffered reader for a bytes.Buffer.
A buffered reader shouldn't make a difference to the reading process. The gxz tool uses a buffered reader, and I have run extensive tests with it.
Can you provide the file that you want to decompress?
Sure, I was decompressing the Zig tarballs from here.
Fixed!
I have now downloaded all the 0.8.0 files and decompressed them with the gxz tool, which uses bufio.Reader, and all of them decompressed without problems.
Please provide:
Oh, you're asking for the failing one. Sorry, that wasn't clear; I thought you were asking for one of the slow ones.
This is the one that fails. Interestingly it also fails with github.com/xi2/xz
Hi, this is a deb file, which is an ar archive. You must do the following:
```
$ ar xv bzip2_1.0.6-9.2_deb10u1_amd64.deb
x - debian-binary
x - control.tar.xz
x - data.tar.xz
```
The two xz files can easily be decompressed and cause no issues for me. The debian-binary member is a plain-text file. Information about the deb format can be found in the manual page for deb.
I used xz to unpack Python-3.11.4.xz. With Python 3.10 it took 4 s; with Go it took 1 min 55 s. So I do think Go xz has a speed issue.
I just tried github.com/therootcompany/xz and it took 5 s.
I posted this two years ago, but it got deleted. Here it is again; it should help with the speed:
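```go
package test

import (
	"archive/tar"
	"bufio"
	"io"
	"os"
	"path/filepath"
	"testing"

	"github.com/ulikunitz/xz"
)

const cargo = "cargo-1.54.0-x86_64-pc-windows-gnu.tar.xz"

// readFrom extracts the regular files of the tar stream r into the
// current directory.
func readFrom(r io.Reader) error {
	tr := tar.NewReader(r)
	for {
		n, err := tr.Next()
		if err == io.EOF {
			break
		} else if err != nil {
			return err
		} else if n.Typeflag != tar.TypeReg {
			continue
		}
		name := filepath.FromSlash(n.Name)
		// os.ModeDir carries no permission bits, so pass 0o755 explicitly.
		if err := os.MkdirAll(filepath.Dir(name), 0o755); err != nil {
			return err
		}
		f, err := os.Create(name)
		if err != nil {
			return err
		}
		// Close each file here; a defer would keep every file open until return.
		_, err = f.ReadFrom(tr)
		if cerr := f.Close(); err == nil {
			err = cerr
		}
		if err != nil {
			return err
		}
	}
	return nil
}

func TestUlikunitz(t *testing.T) {
	f, err := os.Open(cargo)
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()
	// The package reads byte by byte, so buffer the *os.File.
	r, err := xz.NewReader(bufio.NewReader(f))
	if err != nil {
		t.Fatal(err)
	}
	if err := readFrom(r); err != nil {
		t.Fatal(err)
	}
}
```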
When I tried to decompress a big file (about 3 GiB compressed and about 18 GiB unpacked), the process was too slow: only 3 GiB of the 18 GiB were unpacked in about 40 minutes on my machine. The 7-Zip tool unpacked the same file in about 5 minutes.