sergey-dryabzhinsky / python-zstd

Simple Python bindings to Yann Collet's ZSTD compression library
BSD 2-Clause "Simplified" License

Stable / reproducible output with automatic thread scaling #68

Closed by tasket 3 years ago

tasket commented 3 years ago

I'm adding zstd support to Wyng backup using your module. I would prefer not to specify the number of threads when calling compress(), so that compression scales with the number of cores (and so that I can avoid wrapping compress in a lambda). However, that leaves open the question of whether python-zstd will then sometimes use zstandard's single-threaded mode on single-core CPUs.

My technical requirement is to compress data chunks in a reproducible way to enable deduplication during the backup process. This means always using the zstandard multi-threaded mode, even on single-core CPUs. Automatic switching between the single-threaded and multi-threaded code is what I need to avoid.

I read through the README and issue #48 looking for indications of the module's behavior under these conditions, but didn't find any. What I'm looking for is guidance on exactly when python-zstd uses single-threaded mode, if at all, so I can avoid it.

sergey-dryabzhinsky commented 3 years ago

  1. compress() uses all cores if the number of threads is not specified.
  2. python-zstd always compiles and uses libzstd in multithreaded mode.
  3. Any switching to single-threaded mode is up to libzstd.
  4. You can set any number of threads, even on a single-core CPU.
tasket commented 3 years ago

  • You can set any number of threads, even on a single-core CPU.

This is an important point. Thank you!
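
For readers with the same requirement, here is a minimal sketch of pinning the thread count explicitly, as point 4 above allows, instead of relying on automatic scaling. The helper names make_chunk_compressor and chunk_key, and the values level=3 and threads=2, are hypothetical choices for illustration; they do not come from python-zstd or Wyng. The sketch assumes compress() accepts (data, level, threads) as positional arguments. Whether a fixed thread count is enough to get byte-identical output on every machine is not confirmed in this thread and should be checked against libzstd's behavior.

```python
import hashlib


def make_chunk_compressor(compress_fn, level=3, threads=2):
    """Return a one-argument compressor with level and thread count pinned.

    compress_fn is assumed to take (data, level, threads) positionally.
    The default values here are illustrative, not recommendations.
    """
    def compress_chunk(data):
        # Pass the same explicit arguments on every call so the result
        # does not depend on how many cores the current machine has.
        return compress_fn(data, level, threads)
    return compress_chunk


def chunk_key(compressed):
    """Hash a compressed chunk so that identical chunks can be deduplicated."""
    return hashlib.sha256(compressed).hexdigest()


if __name__ == "__main__":
    try:
        import zstd  # python-zstd; skipped if it is not installed
    except ImportError:
        zstd = None

    if zstd is not None:
        compress_chunk = make_chunk_compressor(zstd.compress, level=3, threads=2)
        chunk = b"example backup chunk " * 1024
        # Compress the same chunk twice and compare the dedup keys.
        print(chunk_key(compress_chunk(chunk)) == chunk_key(compress_chunk(chunk)))
```

A named wrapper like this also avoids the lambda mentioned in the original question, since the rest of the backup code only needs to call compress_chunk(data).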