webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

Allow non compressed `WARC` files to be used while recording #915

Open · Lisias opened 3 months ago


Modern file systems now offer compression schemes that surpass userland compression in convenience and sometimes even in efficiency, which makes the current mandatory use of gzip inconvenient when such file systems are in use.

As a use case, BTRFS with zstd:15 gave me excellent compression results, better than what gzip achieved, with faster read access. At the very least, I wouldn't need to recompress the WARCball after a recording session.

Ideally, users should be able to choose whether to use warcio's gzip support or to rely on the file system for compression, sparing them from having to decompress the WARC manually just to take advantage of file system compression.
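For context, the manual decompression step described above can be sketched with Python's standard library. This is a hypothetical helper (`decompress_warc` is not part of pywb or warcio), assuming the input `.warc.gz` is written as a series of concatenated per-record gzip members, which `gzip.open` reads across transparently:

```python
import gzip
import shutil
from pathlib import Path


def decompress_warc(src: Path, dst: Path, chunk_size: int = 1024 * 1024) -> int:
    """Hypothetical helper: decompress a .warc.gz into a plain .warc file.

    A per-record-compressed .warc.gz is a sequence of concatenated gzip
    members; gzip.open reads across member boundaries, so the output is
    the concatenation of all decompressed records.
    Returns the number of bytes written.
    """
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream in chunks so large WARCs are not loaded into memory at once.
        shutil.copyfileobj(fin, fout, chunk_size)
        return fout.tell()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.warc.gz"
        # Two gzip members with placeholder bytes (not real WARC records).
        src.write_bytes(gzip.compress(b"record-1\r\n\r\n") + gzip.compress(b"record-2\r\n\r\n"))
        written = decompress_warc(src, Path(tmp) / "example.warc")
        print(written)
```

The resulting plain `.warc` can then be placed on a compressing file system (for example BTRFS with zstd), which is the extra step an option to skip gzip while recording would make unnecessary.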