Closed CrendKing closed 3 years ago
@CrendKing pgzip is for compressing large streams; for small payloads it doesn't make sense to use it.
You are welcome to send in a PR that addresses this, but I can't spend time on it for the foreseeable future.
I see. I reran the benchmark with the full 400MB file, repeating 10 times. I was expecting an explosive number on the pgzip line. Instead, I find pgzip is in line with the other compressors (see output below). So it seems pgzip basically has a "plateau" pattern (like "allocate all the memory needed early and keep recycling it"), while s2 has a "linear" pattern ("allocate more as required").
So, as you mentioned, pgzip is optimized for large streams. If our users mostly compress small data blocks, they should consider other compressors. Thanks!
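For readers who want to picture the "allocate early and keep recycling" pattern, here is a minimal sketch built on the standard library's `sync.Pool`. The pool's `New` function is modeled on the `dstPool.New` line quoted in the original report; `newBufferPool` and the block sizes are hypothetical names and values chosen for illustration, and nothing here is confirmed to match how pgzip actually manages its buffers.

```go
package main

import (
	"fmt"
	"sync"
)

// newBufferPool returns a pool of byte slices sized for one block plus some
// slack. The function name and the block sizes used in main are illustrative
// assumptions, not part of any library.
func newBufferPool(blockSize int) *sync.Pool {
	return &sync.Pool{
		New: func() interface{} {
			// In Go, >> binds tighter than +, so the capacity is
			// blockSize + blockSize/16.
			return make([]byte, 0, blockSize+blockSize>>4)
		},
	}
}

func main() {
	for _, blockSize := range []int{64 << 10, 256 << 10, 1 << 20} {
		pool := newBufferPool(blockSize)
		buf := pool.Get().([]byte) // the pool is empty, so New allocates
		fmt.Printf("blockSize=%d cap=%d\n", blockSize, cap(buf))
		pool.Put(buf[:0]) // hand the buffer back so a later Get can recycle it
	}
}
```

The sketch only shows the capacity of a single buffer. How many such buffers are alive at once depends on the writer's concurrency settings, which this example does not model.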
Hello. I'm from the Kopia project. We use a bunch of your compressors in our repo. Kopia has a benchmark command that, given an input file, runs all the compressors on it and reports metrics such as compression ratio, throughput and memory consumption. pgzip's memory numbers seem huge compared to, say, s2.
The memory consumption stats are calculated by calling `runtime.ReadMemStats()` before and after the compression loop, then comparing the delta. Note that this is not about a memory leak, just allocation.

Baseline: compressing a 400MB highly compressible file just once. All compressors behave similarly:
```
Repeating 1 times per compression method (total 466.7 MiB).

     Compression               Compressed  Throughput   Memory Usage
------------------------------------------------------------------------
  0. s2-default                127.1 MiB   4 GiB/s      3126   375.4 MiB
  1. s2-better                 120.1 MiB   3.4 GiB/s    2999   351.7 MiB
  2. s2-parallel-8             127.1 MiB   2.8 GiB/s    2981   362.2 MiB
  3. s2-parallel-4             127.1 MiB   2.3 GiB/s    2951   344.1 MiB
  4. pgzip-best-speed          96.7 MiB    2.1 GiB/s    4127   324.1 MiB
  5. pgzip                     86.3 MiB    1.2 GiB/s    4132   298.7 MiB
  6. lz4                       131.8 MiB   458.9 MiB/s  17     321.7 MiB
  7. zstd-fastest              79.8 MiB    356.2 MiB/s  22503  246 MiB
  8. zstd                      76.8 MiB    323.7 MiB/s  22605  237.8 MiB
  9. deflate-best-speed        96.7 MiB    220.8 MiB/s  45     310.8 MiB
 10. gzip-best-speed           94.9 MiB    165 MiB/s    40     305.2 MiB
 11. deflate-default           86.3 MiB    143.1 MiB/s  34     311 MiB
 12. zstd-better-compression   74.2 MiB    104 MiB/s    22496  251.4 MiB
 13. pgzip-best-compression    83 MiB      55.9 MiB/s   4359   299.1 MiB
 14. gzip                      83.6 MiB    40.5 MiB/s   69     304.8 MiB
 15. zstd-best-compression     68.9 MiB    19.2 MiB/s   22669  303.4 MiB
 16. deflate-best-compression  83 MiB      5.6 MiB/s    134    311 MiB
 17. gzip-best-compression     83 MiB      5.1 MiB/s    137    304.8 MiB
```

Compressing only the first 128KB of the same file, repeated 10 times. Here pgzip's memory consumption stands out among the compressors:
```
Repeating 10 times per compression method (total 1.2 MiB).

     Compression               Compressed  Throughput   Memory Usage
------------------------------------------------------------------------
  0. s2-default                43.6 KiB    625.3 MiB/s  71     2.1 MiB
  1. s2-parallel-4             43.6 KiB    625.3 MiB/s  67     2.1 MiB
  2. s2-parallel-8             43.6 KiB    624.5 MiB/s  67     2.1 MiB
  3. s2-better                 41.3 KiB    416.8 MiB/s  72     2.1 MiB
  4. deflate-best-speed        34.3 KiB    208.3 MiB/s  22     874.6 KiB
  5. zstd-fastest              28.6 KiB    178.6 MiB/s  160    9.4 MiB
  6. lz4                       44.7 KiB    178.5 MiB/s  38     88.6 MiB
  7. gzip-best-speed           33.7 KiB    138.9 MiB/s  28     1.2 MiB
  8. deflate-default           31.2 KiB    125 MiB/s    22     1.1 MiB
  9. zstd                      26.8 KiB    113.6 MiB/s  174    18.4 MiB
 10. pgzip-best-speed          34.3 KiB    113.6 MiB/s  252    27.3 MiB
 11. zstd-better-compression   26.3 KiB    96.2 MiB/s   156    37.2 MiB
 12. pgzip                     31.2 KiB    74.5 MiB/s   342    31.7 MiB
 13. gzip                      30.4 KiB    39.1 MiB/s   26     874.7 KiB
 14. deflate-best-compression  30.4 KiB    25.5 MiB/s   21     1 MiB
 15. gzip-best-compression     30.4 KiB    24 MiB/s     26     874.7 KiB
 16. pgzip-best-compression    30.4 KiB    23.2 MiB/s   285    30.2 MiB
 17. zstd-best-compression     25.1 KiB    16.9 MiB/s   155    99.2 MiB
```

Repeating 100 times. s2 has exactly the same memory usage, while pgzip's grows accordingly:
```
Repeating 100 times per compression method (total 12.5 MiB).

     Compression               Compressed  Throughput   Memory Usage
------------------------------------------------------------------------
  0. s2-parallel-4             43.6 KiB    833.4 MiB/s  533    2.1 MiB
  1. s2-parallel-8             43.6 KiB    833.3 MiB/s  555    2.1 MiB
  2. s2-default                43.6 KiB    833.3 MiB/s  579    2.1 MiB
  3. s2-better                 41.3 KiB    500 MiB/s    610    2.1 MiB
  4. zstd-fastest              28.6 KiB    240.4 MiB/s  925    9.5 MiB
  5. deflate-best-speed        34.3 KiB    198.4 MiB/s  22     874.6 KiB
  6. zstd                      26.8 KiB    165.4 MiB/s  907    18.5 MiB
  7. zstd-better-compression   26.3 KiB    162.3 MiB/s  881    37.3 MiB
  8. gzip-best-speed           33.7 KiB    150.6 MiB/s  28     1.2 MiB
  9. pgzip-best-speed          34.3 KiB    143.7 MiB/s  1649   220.2 MiB
 10. deflate-default           31.2 KiB    126.3 MiB/s  22     1.1 MiB
 11. lz4                       44.7 KiB    112.6 MiB/s  435    816.7 MiB
 12. pgzip                     31.2 KiB    94.6 MiB/s   2634   277.5 MiB
 13. gzip                      30.4 KiB    39.5 MiB/s   26     874.7 KiB
 14. deflate-best-compression  30.4 KiB    25.4 MiB/s   21     1 MiB
 15. gzip-best-compression     30.4 KiB    24.5 MiB/s   27     874.9 KiB
 16. pgzip-best-compression    30.4 KiB    23.1 MiB/s   2646   281.8 MiB
 17. zstd-best-compression     25.1 KiB    19.3 MiB/s   882    99.3 MiB
```

I did some experiments around
`SetConcurrency()` and found that:

1) The consumption grows slowly as `blocks` increases, and exponentially as `blockSize` increases, possibly due to the `z.dstPool.New = func() interface{} { return make([]byte, 0, blockSize+(blockSize)>>4) }` line.
2) Even when just creating a new writer and immediately closing it, the allocation still happens, possibly due to the internal `compressCurrent()`.

Is there a bug here? Why allocate memory when no data is compressed? And can `Reset()` reuse previously allocated memory instead of allocating new memory (like s2)?