immerrr closed this 8 years ago
Looks good, thanks @immerrr! Could you share your benchmarking results?
The timings are as follows:

```console
$ time python test_gzip.py ds_dump_US_1.jl.gz
1691290714

real    0m16.099s
user    0m15.968s
sys     0m0.116s

$ time python test_gzip_buf.py ds_dump_US_1.jl.gz
1691290714

real    0m12.040s
user    0m11.904s
sys     0m0.120s
```
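Taking the single pair of `real` times above at face value, the buffered version works out to roughly a 1.34x speedup, or about 25% less wall-clock time:

$$
\frac{t_{\text{plain}}}{t_{\text{buffered}}} = \frac{16.099\,\text{s}}{12.040\,\text{s}} \approx 1.34,
\qquad
1 - \frac{12.040\,\text{s}}{16.099\,\text{s}} \approx 0.252
$$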
With `test_gzip.py` being:

```python
import gzip
import sys

total_bytes = 0
with gzip.open(sys.argv[1], 'rb') as f:
    for line in f:
        total_bytes += len(line)
print(total_bytes)
```
and `test_gzip_buf.py` being:

```python
import gzip
import io
import sys

total_bytes = 0
with gzip.open(sys.argv[1], 'rb') as f:
    with io.BufferedReader(f) as bf:
        for line in bf:
            total_bytes += len(line)
print(total_bytes)
```
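To reproduce the comparison without the original `ds_dump_US_1.jl.gz`, here is a minimal sketch that puts both loops into one script and times them on a synthetic gzip file. The sample data, the line count, and the helper names (`make_sample`, `count_plain`, `count_buffered`, `bench`) are hypothetical, chosen only for illustration. Newer CPython releases rewrote `GzipFile` to buffer internally, so the gap you measure may be smaller than the one reported here.

```python
import gzip
import io
import os
import tempfile
import time


def count_plain(path):
    """Sum line lengths by iterating the GzipFile directly (as in test_gzip.py)."""
    total = 0
    with gzip.open(path, "rb") as f:
        for line in f:
            total += len(line)
    return total


def count_buffered(path):
    """Same sum, but iterate through an io.BufferedReader wrapper (as in test_gzip_buf.py)."""
    total = 0
    with gzip.open(path, "rb") as f:
        with io.BufferedReader(f) as bf:
            for line in bf:
                total += len(line)
    return total


def make_sample(path, n_lines=200_000):
    """Write hypothetical JSON-lines-style test data; not the original dump."""
    with gzip.open(path, "wb") as f:
        for i in range(n_lines):
            f.write(b'{"id": %d, "name": "item-%d"}\n' % (i, i))


def bench(func, path):
    """Return (result, elapsed seconds) for a single call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".jl.gz")
    os.close(fd)
    try:
        make_sample(path)
        plain_total, plain_t = bench(count_plain, path)
        buf_total, buf_t = bench(count_buffered, path)
        # Both loops must see exactly the same bytes.
        assert plain_total == buf_total
        print(f"plain:    {plain_total} bytes in {plain_t:.3f}s")
        print(f"buffered: {buf_total} bytes in {buf_t:.3f}s")
    finally:
        os.remove(path)
```

Each loop runs only once here, so for numbers you would quote, repeat the measurement a few times (for example with `timeit`) and compare the best or median run.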
neat, thank you!
`GzipFile` does its buffering-related work, such as `GzipFile.readline`, in pure Python, and does it quite slowly, unlike `io.BufferedReader`, which does it in C. Check out