guzba / zippy

Pure Nim implementation of deflate, zlib, gzip and zip.
MIT License

High memory usage when decompressing tarballs #31

Open dom96 opened 2 years ago

dom96 commented 2 years ago

I have seen memory usage as high as 4 GB when decompressing certain tarballs, for example https://github.com/nim-lang/csources/archive/64e34778fa7e114b4afc753c7845dee250584167.tar.gz.

guzba commented 2 years ago

Zippy works in memory only at this point, and the tarball implementation keeps the entire contents of the tarball in memory after it is opened (since it was just fully unzipped in memory). A streaming implementation of Zippy would let this work without the same memory requirements. This is something I want and intend to work on, but I am working on other things for now.
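
To illustrate what "streaming" means here, the following is a minimal sketch in Python using only the standard library. It is not Zippy's API and not how Zippy is implemented; the function name `stream_gunzip` and the 64 KiB chunk size are assumptions made for the example. The idea is that the compressed input is read in fixed-size chunks and each inflated piece is written out immediately, so the whole uncompressed payload never has to sit in memory at once.

```python
import gzip
import io
import zlib


def stream_gunzip(src, dst, chunk_size=64 * 1024):
    """Inflate a gzip stream from src into dst chunk by chunk.

    Hypothetical helper for illustration. Returns the number of
    uncompressed bytes written to dst.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Only the output produced by this one input chunk is held in memory.
        out = inflater.decompress(chunk)
        dst.write(out)
        total += len(out)
    # Emit anything still buffered inside the decompressor.
    tail = inflater.flush()
    dst.write(tail)
    total += len(tail)
    return total


if __name__ == "__main__":
    payload = b"zippy" * 200_000
    source = io.BytesIO(gzip.compress(payload))
    sink = io.BytesIO()  # a real caller would pass a file opened with "wb"
    written = stream_gunzip(source, sink, chunk_size=4096)
    print(written == len(payload))
```

In this sketch the working set is bounded by what a single input chunk inflates to, not by the size of the archive. A stricter bound would also pass `max_length` to `decompress` and loop over `unconsumed_tail`, which is omitted here for brevity.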

guzba commented 2 years ago

Update here (released in zippy >= 0.9.3)

I have reworked a lot of Zippy's internals lately and rewritten how tarball extractAll is done. This has enabled some significant improvements here:

(From echo GC_getStatistics() right after uncompressing)

previous impl + arc + release:
[GC] total memory: 3909815487
[GC] occupied memory: 3582487439

previous impl + default gc + release:
[GC] total memory: 4169059543
[GC] occupied memory: 3595771911

current impl + arc + release:
[GC] total memory: 1195044921
[GC] occupied memory: 1194516745

current impl + default gc + release:
[GC] total memory: 1195044945
[GC] occupied memory: 1194578289

The above csources archive is 187 MB compressed and uncompresses to 1.2 GB. Since I do still inflate everything into memory, that sets the floor for memory usage at this point. I still want to get to a fully streamed version someday but progress is progress.
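
For comparison, a fully streamed tarball read can be sketched with Python's standard `tarfile` module, whose `"r|gz"` mode treats the archive as a forward-only stream. This is only an illustration of the approach under discussion, not Zippy's design; the helper name `iter_member_sizes` and the 64 KiB block size are assumptions. Each entry is consumed in small blocks before moving on to the next one, so memory use does not scale with the 1.2 GB uncompressed size.

```python
import io
import tarfile


def iter_member_sizes(fileobj, block_size=64 * 1024):
    """Yield (name, size) for each regular file in a .tar.gz stream.

    Hypothetical helper for illustration. Entries are read one at a
    time, in archive order, without loading the whole tarball.
    """
    # "r|gz" reads a gzip-compressed tar as a non-seekable stream.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            entry = archive.extractfile(member)
            size = 0
            while True:
                block = entry.read(block_size)
                if not block:
                    break
                # A real extractor would write the block to disk here.
                size += len(block)
            yield member.name, size


def build_sample_tarball():
    """Create a small in-memory .tar.gz to exercise the sketch."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in (("a.txt", b"hello"), ("b.txt", b"x" * 3000)):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, size in iter_member_sizes(build_sample_tarball()):
        print(name, size)
```

The trade-off of stream mode is that entries can only be visited once and in order, which is enough for an extract-everything operation but not for random access to a single file.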

Clonkk commented 11 months ago

Following up on this issue: I see very high memory usage when creating a tarball with the ".tar.gz" extension, compared to running "tar -czf file.tar.gz" in bash.

"tar -czf" also creates a smaller tarball. I'm not sure what the difference is :)

Have you made any progress on the streaming version?