ulikunitz / xz

Pure golang package for reading and writing xz-compressed files

Unzipping is too slow #23

Open aegoroff opened 5 years ago

aegoroff commented 5 years ago

When I tried to unzip a big file (about 3 GiB as xz and about 18 GiB unpacked), the process was too slow: only 3 GiB of the 18 GiB were unpacked in about 40 minutes on my machine. The same file was unpacked in about 5 minutes using the 7-Zip tool.

ulikunitz commented 5 years ago

Thank you for reporting. This is expected, and I have the following language in README.md:

At this time the package cannot compete with the xz tool regarding compression speed and size.

I haven't found the time so far to work on code optimization. On the plus side, there is a lot of potential for improving the situation. Unfortunately, I cannot promise when I will work on it.

ulikunitz commented 3 years ago

There is work ahead. I left the issue open.

alecthomas commented 3 years ago

I just ran into slow decompression and the (partial) solution is to wrap your reader in bufio.NewReader(). It turns out this library uses ReadByte() a great deal and on unbuffered input this is incredibly slow.

I say "partial" as unfortunately this fails on some inputs with

writeMatch: distance out of range

Very weird that it fails when buffered but works when unbuffered.

ulikunitz commented 3 years ago

Yes, the library doesn't implement its own buffering and because it uses ReadByte it benefits from buffered readers. I should have documented it.
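The cost of byte-wise reads on an unbuffered source can be illustrated with a small sketch that counts calls to the underlying Read method. The names countingReader, drainByteWise and readCalls are hypothetical helpers for this illustration, and the one-byte read loop only approximates the ReadByte-heavy access pattern described above; it is not the library's actual code.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countingReader (hypothetical) counts how often Read is called on the
// wrapped reader. With an *os.File each such call would be a system call.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// drainByteWise consumes r one byte at a time until it is exhausted.
func drainByteWise(r io.Reader) {
	var b [1]byte
	for {
		if _, err := io.ReadFull(r, b[:]); err != nil {
			return
		}
	}
}

// readCalls reports how many Read calls reach the source when data is
// drained byte by byte, with or without a bufio.Reader in between.
func readCalls(data []byte, buffered bool) int {
	src := &countingReader{r: bytes.NewReader(data)}
	if buffered {
		drainByteWise(bufio.NewReader(src))
	} else {
		drainByteWise(src)
	}
	return src.calls
}

func main() {
	data := make([]byte, 1<<20) // 1 MiB of zero bytes as stand-in input
	fmt.Println("unbuffered Read calls:", readCalls(data, false))
	fmt.Println("buffered Read calls:  ", readCalls(data, true))
}
```

Without buffering, roughly one Read call per byte reaches the source; with bufio.NewReader in between, the source is only asked to refill the buffer, so the call count drops by several orders of magnitude.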

The rationale at the time was that I wanted to use a buffered reader only if there is a need for it. For instance, I didn't want to use a buffered reader for a bytes.Buffer.

A buffered reader shouldn't make a difference for the reading process. The gxz tool is using a buffered reader and I have run extensive tests for it.

Can you provide the file that you want to decompress?

alecthomas commented 3 years ago

Sure, I was decompressing the Zig tarballs from here.

alecthomas commented 3 years ago

Fixed!

ulikunitz commented 3 years ago

I have now downloaded all 0.8.0 files and decompressed them with the gxz tool, which uses bufio.Reader, and there were no problems decompressing any of them.

Please provide:

alecthomas commented 3 years ago

Oh you're asking for the failing one, sorry, that wasn't clear - I thought you were asking for one of the slow ones.

alecthomas commented 3 years ago

This is the one that fails. Interestingly it also fails with github.com/xi2/xz

ulikunitz commented 3 years ago

Hi, this is a deb file, which is an ar file. You must do the following:

$ ar xv bzip2_1.0.6-9.2_deb10u1_amd64.deb 
x - debian-binary
x - control.tar.xz
x - data.tar.xz

The two xz files can easily be uncompressed and generate no issues for me. The debian-binary is a plain-text file. Information about the deb format can be found in the manual page for deb.

mark-summerfield commented 1 year ago

I used xz to unpack Python-3.11.4.xz. Using Python 3.10 it took 4sec; using Go it took 1m55sec. So I do think Go xz has a speed issue.

I just tried github.com/therootcompany/xz and it took 5sec.

ghost commented 1 year ago

I posted this two years ago, but it got deleted. Here it is again; it should help with the speed:

package test

import (
   "archive/tar"
   "bufio"
   "github.com/ulikunitz/xz"
   "io"
   "os"
   "path"
   "testing"
)

const cargo = "cargo-1.54.0-x86_64-pc-windows-gnu.tar.xz"

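// readFrom extracts every regular file in the tar stream r into the
// current directory. Entry names are used as-is, so only run it on
// archives you trust.
func readFrom(r io.Reader) error {
   tr := tar.NewReader(r)
   for {
      n, err := tr.Next()
      if err == io.EOF {
         return nil
      } else if err != nil {
         return err
      } else if n.Typeflag != tar.TypeReg {
         continue
      }
      // os.ModeDir alone leaves the permission bits at zero, so pass
      // explicit permissions for the parent directories.
      if err := os.MkdirAll(path.Dir(n.Name), 0o755); err != nil {
         return err
      }
      if err := writeFile(n.Name, tr); err != nil {
         return err
      }
   }
}

// writeFile copies r into a new file and closes it before returning, so
// open files do not pile up the way a defer inside the loop would.
func writeFile(name string, r io.Reader) error {
   f, err := os.Create(name)
   if err != nil {
      return err
   }
   if _, err := io.Copy(f, r); err != nil {
      f.Close()
      return err
   }
   return f.Close()
}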

func TestUlikunitz(t *testing.T) {
   f, err := os.Open(cargo)
   if err != nil {
      t.Fatal(err)
   }
   defer f.Close()
   r, err := xz.NewReader(bufio.NewReader(f))
   if err != nil {
      t.Fatal(err)
   }
   if err := readFrom(r); err != nil {
      t.Fatal(err)
   }
}