ulikunitz / xz

Pure golang package for reading and writing xz-compressed files

high allocation ratio #47

Open yuvalgut opened 2 years ago

yuvalgut commented 2 years ago

I have a processing scenario where I read LZMA objects and need to decompress them. While profiling with pprof, I could see that the LZMA reader allocates a new buffer for every message:

    0               0%        0.0046%   433804685.70MB  96.50%  github.com/ulikunitz/xz/lzma.NewReader (inline)
    4122.21MB       0.00092%  0.0056%   433804685.70MB  96.50%  github.com/ulikunitz/xz/lzma.ReaderConfig.NewReader
    2414.61MB       0.00054%  0.0061%   432805222.15MB  96.28%  github.com/ulikunitz/xz/lzma.newDecoderDict (inline)
    432802807.54MB  96.28%    96.28%    432802807.54MB  96.28%  github.com/ulikunitz/xz/lzma.newBuffer (inline)

Can we add an option to pool that buffer, or some other way to reuse a reader?
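As an illustration of the pooling idea in this request, here is a minimal sketch that uses only the standard library's `sync.Pool`. It is not part of the ulikunitz/xz API: `dictCap`, `dictPool`, and `withDictBuffer` are hypothetical names, and the 8 MiB capacity is an arbitrary value chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// dictCap is a hypothetical dictionary capacity, chosen only for illustration.
const dictCap = 8 << 20

// dictPool hands out dictionary-sized buffers that can be reused across messages.
// It stores *[]byte so that putting a buffer back does not allocate a new
// interface value for the slice header.
var dictPool = sync.Pool{
	New: func() any {
		b := make([]byte, dictCap)
		return &b
	},
}

// withDictBuffer borrows a buffer from the pool, passes it to fn, and
// returns it to the pool when fn is done.
func withDictBuffer(fn func(buf []byte)) {
	p := dictPool.Get().(*[]byte)
	defer dictPool.Put(p)
	fn(*p)
}

func main() {
	for i := 0; i < 3; i++ {
		withDictBuffer(func(buf []byte) {
			// A decoder would use buf as its dictionary here.
			fmt.Println("buffer length:", len(buf))
		})
	}
}
```

Note that `sync.Pool` may drop idle buffers during garbage collection, so it reduces allocations but does not guarantee reuse. A decoder built on such a pool would also have to reset its own read and write positions, because a borrowed buffer can still hold data from the previous message.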

ulikunitz commented 2 years ago

Why is that a problem? The buffer is allocated once per LZMA object and collected by the GC. You can control the size of the buffer when creating the LZMA object.

yuvalgut commented 2 years ago

Hi, thanks for the response! From the simple reader test:

    r, err := NewReader(xz)
    if err != nil {
        t.Fatalf("NewReader error %s", err)
    }
    var buf bytes.Buffer
    if _, err = io.Copy(&buf, r); err != nil {
        t.Fatalf("io.Copy error %s", err)
    }

When r, err := NewReader(xz) is called, the dict buffer gets allocated. Then we call io.Copy(&buf, r), which reads the uncompressed data into the 'client' buffer. At that point the dict buffer is already allocated, and we could use it to decompress another LZMA stream. But there is no 'reset' option, so we have to create a new reader with NewReader(xz), which allocates another dict buffer instead of reusing the one we already allocated and used.

Let me know if that makes sense. Thanks again!
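For comparison, the standard library's `compress/gzip` package already offers the kind of 'reset' described above through `(*gzip.Reader).Reset`. The sketch below uses gzip purely as a stand-in to show the reuse pattern. It does not show an API of ulikunitz/xz, and the helper names `compress` and `decodeAll` are made up for this example.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compress returns s as a gzip-compressed byte slice.
func compress(s string) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write([]byte(s)); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeAll decompresses every message with a single gzip.Reader.
// The reader is created for the first message and then reused via Reset,
// so its internal decompressor state is not reallocated per message.
func decodeAll(msgs [][]byte) ([]string, error) {
	var zr *gzip.Reader
	out := make([]string, 0, len(msgs))
	for _, m := range msgs {
		src := bytes.NewReader(m)
		var err error
		if zr == nil {
			zr, err = gzip.NewReader(src)
		} else {
			err = zr.Reset(src)
		}
		if err != nil {
			return nil, err
		}
		b, err := io.ReadAll(zr)
		if err != nil {
			return nil, err
		}
		out = append(out, string(b))
	}
	return out, nil
}

func main() {
	var msgs [][]byte
	for _, s := range []string{"first", "second"} {
		m, err := compress(s)
		if err != nil {
			panic(err)
		}
		msgs = append(msgs, m)
	}
	out, err := decodeAll(msgs)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

A `Reset` method on the LZMA reader could follow the same shape: keep the dictionary buffer, discard the stream state, and read the header of the next stream from the new source.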

ulikunitz commented 2 years ago

I'm currently reworking the LZMA package to support parallel & faster compression and faster decompression. I will look into Reset options.