Closed mxmlnkn closed 2 years ago
Archivemount does not use zlib directly but libarchive, hence the name. But libarchive seems to use zlib, which makes some of the timing differences a bit weirder.
Here are some more low-level timings for decompressing 397MiB gzipped data to to 512 MiB base64-encoded random data (so basically pure Huffman-encoding with only a few LZ77 backreferences):
base64 /dev/urandom | head -c $(( 512*1024*1024 )) > small
gzip -k small
tar -cf small.tar small
gzip small.tar
Running my self-written benchmark inside the indexed_bzip2
repository:
Decompressing with zlib: min: 2.19763s max: 2.20379 s, min: 189.079 MB/s max: 189.608 MB/s
Decompressing with libarchive: min: 1.97612s max: 1.98285 s, min: 210.147 MB/s max: 210.863 MB/s
Decompressing with custom decoder (pragzip): min: 8.78324s max: 9.04052 s, min: 46.0913 MB/s max: 47.4414 MB/s
Some benchmarks via the command line:
python3 -c 'import indexed_gzip as igz; import time; t0 = time.time(); igz.open("small.gz").read(); print(f"Decompression took {time.time() - t0:.3f} s")'
Decompression took 4.594 s
python3 -c 'import gzip; import time; t0 = time.time(); gzip.open("small.gz").read(); print(f"Decompression took {time.time() - t0:.3f} s")'
Decompression took 3.069 s
time python3 -m pgzip -o - small.gz | wc -c
real 0m3.289s
time archivemount small.tar.gz mountpoint/
real 0m1.657s
time ( fuse-archive small.tar.gz mountpoint && stat mountpoint &>/dev/null; )
real 0m1.697s
time gzip -d -k -c small.gz | wc -c
real 0m3.510s
time pigz -d -k -c small.gz | wc -c
real 0m2.225s
time bgzip -d -c small.gz | wc -c
real 0m2.548s
time igzip -d -c small.gz | wc -c
real 0m1.523s
time minigzip-zlib-ng -d -k -c small.gz | wc -c
real 0m1.903s
time gunzip-libdeflate -d -k -c small.gz | wc -c
real 0m1.841s
time pugz small.gz | wc -c
real 0m2.372s
time pugz -t 4 small.gz | wc -c
using 4 threads for decompression (experimental)
real 0m0.975s
time pugz -t $( nproc ) small.gz | wc -c
using 24 threads for decompression (experimental)
real 0m0.985s
Some observations:
indexed_gzip
is still more than twice as slow compared to all but gzip itself and my prototype custom gzip decoder.Repeat benchmarks with tarred and gzipped Silesia corpus like lzbench uses.
make benchmarkGzip && tests/benchmarkGzip silesia.tar.gz
Decompressing 68238807 B to 211957760 B with zlib: min: 0.643727s max: 0.652675 s, min: 104.553 MB/s max: 106.006 MB/s
Decompressing 68238807 B to 211957760 B with libarchive: min: 0.545091s max: 0.554049 s, min: 123.164 MB/s max: 125.188 MB/s
Decompressing 68238807 B to 211957760 B with custom decoder (pragzip): min: 0.127799s max: 0.129439 s, min: 527.189 MB/s max: 533.956 MB/s
python3 -c 'import indexed_gzip as igz; import time; t0 = time.time(); igz.open("silesia.tar.gz").read(); print(f"Decompression took {time.time() - t0:.3f} s")'
Decompression took 1.249 s
python3 -c 'import gzip; import time; t0 = time.time(); gzip.open("silesia.tar.gz").read(); print(f"Decompression took {time.time() - t0:.3f} s")'
Decompression took 0.908 s
time python3 -m pgzip -o - silesia.tar.gz | wc -c
real 0m1.837s
time archivemount small.tar.gz mountpoint/
real 0m1.663s
time ( fuse-archive small.tar.gz mountpoint && stat mountpoint &>/dev/null; )
real 0m1.629s
time gzip -d -k -c silesia.tar.gz | wc -c
real 0m1.000s
time pigz -d -k -c silesia.tar.gz | wc -c
real 0m0.622s
time bgzip -d -c silesia.tar.gz | wc -c
real 0m0.700s
time igzip -d -c silesia.tar.gz | wc -c
real 0m0.357s
time pugz silesia.tar.gz | wc -c
terminate called after throwing an instance of 'gzip_error'
what(): INVALID_LITERAL
0
real 0m0.078s
user 0m0.005s
sys 0m0.006s
time minigzip-zlib-ng -d -k -c silesia.tar.gz | wc -c
real 0m0.443s
time gunzip-libdeflate -d -k -c silesia.tar.gz | wc -c
real 0m0.403s
time crc32 silesia.tar
78e42bf0
real 0m0.236s
Some observations on Silesia decompression:
indexed_gzip
is still twice as slow as zlibHi @mxmlnkn, I haven't spent any time optimising the indexed_gzip codebase, so I'm sure performance could be improved. The core of indexed_gzip is essentially an adaptation of the original zran.c
example that ships with the zlib source code. It bears little resemblance to the original version, but it does share the same fundamental design for reading/inflation.
The conda-forge packages are built with -O2
. I think GHA has deleted the logs for the pypi builds, so I'm not sure about them.
However, using indexed_gzip purely for a one-off decompress isn't intended usage, so I'm not sure if these comparisons makes a lot of sense. The point of indexed_gzip is for improved random seek time, after an index has been built.
(edited - added example code)
If you are interested in "raw" performance, you could adjust your benchmark code to disable the CRC32 check, increased read buffer sizes, and with a seek point spacing larger than the size of the uncompressed data, which would effectively disable creation of the index. e.g.:
import indexed_gzip as igzip
IndexedGzipFile.open('file.gz', spacing=2**30, readbuf_size=2**30, buffer_size=2**30).read()
This should cause the read to be performed via a single call to the zlib inflate
function.
I get some very weird results when testing out these parameters. When I increase the buffers or even just the spacing, then it takes a lot more time. This might explain why I observed the 10x slower speed with ratarmount! I think, I changed the default spacing to 16 MiB.
python3 -m pip install --user pgzip indexed_gzip
python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.open("small.gz").read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Decompression took 4.666 s
python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.IndexedGzipFile("small.gz", spacing=2**30, readbuf_size=2**30, buffer_size=2**30).read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Floating point exception
python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.open("small.gz", spacing=int(1*1024**2), readbuf_size=int(1*1024**2), buffer_size=int(1*1024**2)).read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Decompression took 4.286 s
python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.open("small.gz", spacing=int(32*1024**2), readbuf_size=int(1*1024**2), buffer_size=int(1*1024**2)).read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Decompression took 7.407 s
python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.open("small.gz", spacing=int(128*1024**2), readbuf_size=int(1024**2), buffer_size=int(128*1024**2)).read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Decompression took 13.290 s
python3 -c '
import indexed_gzip as igz; import time; t0 = time.time();
igz.IndexedGzipFile("small.gz", spacing=16*1024**2).read();
print(f"Decompression took {time.time() - t0:.3f} s")'
# Decompression took 6.403 s
Note also the weird floating point exception when I try the large buffer / spacing sizes you recommended.
Also, I think there might be a problem with the buffers because even 512 MiB buffers should not slow down a program that much. Except maybe if you are clearing them to zeros without using memset or something like that..
That does sound strange - I'll look into it when I get a minute...
So I did try to look at the code and did some benchmarks and I arrived at the conclusion that the underlying problem is that zran_read
always tries to seek from an index point even if no seek was actually applied since the last call to zran_read
. But I don't feel up to the task of fixing it because that would mean checking when seeking is actually necessary and also storing the zlib state of the last call. I'm not sure whether this information is already available or needs to be added somewhere.
As a quick workaround, zran_read
should only be called to decode buffers much longer than the index spacing, which might be easily doable in Python by wrapping IndexedGzipFile in an io.BufferedReader
with buffer size ~ 2-10x index spacing.
The above explanation does still not explain why runtimes get worse for larger readbuf_size
. But, I saw a lot of heuristics like data_size = 4 * spacing
in zran_expand_index
, which shouldn't even be necessary, a hardcoded data_size = 128 * 1024
should also work and be faster and more memory-efficient. Similarly, there were a lot of usages of calloc
, where I think malloc
should have sufficed. I'm not sure what the performance difference between these two are, but calloc
sounds slower.
Here are some benchmarks I did for varying configurations. It can be seen that calls with 4 KiB even slows down Python's GzipFile, but everything above 32 KiB should be sufficiently large reads for optimal speed. This is a stark difference to IndexedGzipFile which might lead to performance bugs like I observed with ratarmount. With large enough reads (in the limit, reading the whole sample file at once), the speeds are good though.
1.086 s Read 4 KiB chunks from GzipFile
0.817 s Read 32 KiB chunks from GzipFile
0.774 s Read 128 KiB chunks from GzipFile
0.760 s Read 1024 KiB chunks from GzipFile
0.811 s Read 32768 KiB chunks from GzipFile
0.811 s Read 131072 KiB chunks from GzipFile
1.722 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
1.487 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
1.497 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
1.470 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
1.032 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
0.739 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
1.788 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
1.467 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
1.454 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
1.450 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
1.036 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
0.739 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
9.044 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
8.957 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
8.902 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
8.854 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
1.413 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
0.733 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
9.056 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
8.824 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
8.834 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
8.886 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
1.415 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
0.736 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
13.596 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
13.536 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
13.491 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
13.500 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
1.697 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
0.787 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
13.745 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
13.518 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
13.451 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
13.468 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
1.669 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
0.770 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
I saw that you are already using a BufferedReader and that there even is a buffer_size
argument for IndexedGzipFile
. I added a better default in #90 if it has not been specified, which increases performance by ~10x. Here are the results for buffer-size = 4 * spacing
.
1.157 s Read 4 KiB chunks from GzipFile
0.816 s Read 32 KiB chunks from GzipFile
0.781 s Read 128 KiB chunks from GzipFile
0.773 s Read 1024 KiB chunks from GzipFile
0.811 s Read 32768 KiB chunks from GzipFile
0.812 s Read 131072 KiB chunks from GzipFile
1.369 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
1.181 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
1.162 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
1.167 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
1.026 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
0.728 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=0)
1.321 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
1.169 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
1.150 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
1.157 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
1.040 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
0.723 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=0)
1.028 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
0.761 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
0.741 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
0.736 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
0.783 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
0.734 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB)
1.044 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
0.766 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
0.743 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
0.735 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
0.780 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
0.732 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB)
1.090 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
0.807 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
0.783 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
0.781 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
0.814 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
0.758 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=True, spacing=32 MiB, readbuf_size=uncompressed_size)
1.056 s Read 4 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
0.803 s Read 32 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
0.775 s Read 128 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
0.773 s Read 1024 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
0.825 s Read 32768 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
0.775 s Read 131072 KiB chunks from IndexedGzipFile(drop_handles=False, spacing=32 MiB, readbuf_size=uncompressed_size)
Hello,
I noticed that
indexed_gzip
seems to be 10 times slower than archivemount and fuse-archive when decoding large gzipped TARs with ratarmount. I would think that both use zlib, which makes this surprising to me. Are you compiling with-O3
? If not, would that help? Or maybe are there other bottlenecks like storing the windows? I just think that this 10x difference would be really helpful to improve upon. I'm pretty sure that it isn't ratarmount who is at fault here but some benchmarks using onlyindexed_gzip
or even the C backend directly still might be helpful for performance "debugging".