webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

Allow `multifilewarcwriter` to write uncompressed `WARC` files by selective activation. #916

Open Lisias opened 3 months ago

Lisias commented 3 months ago

That's the thing: file systems with transparent compression are common nowadays (and to think this started with Stacker on MS-DOS!), so it makes sense to store uncompressed WARC files on BTRFS or NTFS with compression enabled. This commit disables the WARCIO gzip support when the filename does not end with .gz, letting users rely on these file systems for the compression they want without having to decompress the WARC on use.
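To illustrate the idea, here is a minimal sketch of the selection rule, not the actual diff. The helper names `wants_gzip`, `write_payload` and `is_gzipped` are made up for this example, and it uses only the standard library instead of the real writer classes:

```python
import gzip


def wants_gzip(filename):
    """Hypothetical helper: enable gzip only when the name ends with .gz."""
    return filename.endswith(".gz")


def write_payload(path, payload):
    """Write payload to path, gzip-compressing it only for .gz names.

    Assumption: with warcio the same boolean would be handed to the writer,
    e.g. WARCWriter(fh, gzip=wants_gzip(path)), instead of switching on
    gzip.open() as this stand-in does.
    """
    opener = gzip.open if wants_gzip(path) else open
    with opener(path, "wb") as fh:
        fh.write(payload)


def is_gzipped(path):
    """Return True when the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"
```

With this rule, a target named `rec.warc.gz` is still gzip-compressed as before, while `rec.warc` is written as-is and left for the file system to compress.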

for https://github.com/webrecorder/pywb/issues/915

This code has been in use for 2 months on a Linux box with btrfs, using zstd:15 compression for the WARC files. The write penalty is negligible, reads are perceptibly faster, and the compression is much better.