Open iii-i opened 1 year ago
On one hand, using the C code makes the code faster, and it may help to solve #89672.
On the other hand, see #112346: a zlib implementation can produce different results. We should solve that issue first, and ensure that similar errors will not happen here.
@serhiy-storchaka, @rhpvorderman suggested that we should use a simpler approach to achieve the same result: instead of using the new C functions, let zlib generate both gzip header and gzip trailer, and strip the gzip header. I have implemented it here: https://github.com/python/cpython/pull/112199. It passes the #114116 test. Could you please take a look?
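For readers following along, here is a minimal sketch of the "let zlib write the header and trailer, then strip the header" idea. It is not the code from the linked PR. The function name `gzip_with_custom_header` and the constant `DEFAULT_HEADER_LEN` are hypothetical, and it assumes that zlib's default gzip header is the plain 10-byte form with no optional fields.

```python
import struct
import zlib

# Assumption: without deflateSetHeader(), zlib emits the minimal 10-byte gzip header.
DEFAULT_HEADER_LEN = 10


def gzip_with_custom_header(data: bytes, mtime: int) -> bytes:
    """Compress in one pass with zlib, then swap in a header we control."""
    # wbits=31 asks zlib for a complete gzip stream: header, deflate data, trailer.
    compressor = zlib.compressobj(wbits=31)
    stream = compressor.compress(data) + compressor.flush()
    # RFC 1952 header: magic bytes, CM=8 (deflate), FLG=0, MTIME, XFL=0, OS=255 (unknown).
    custom_header = struct.pack("<BBBBLBB", 0x1F, 0x8B, 8, 0, mtime, 0, 255)
    # Keep zlib's deflate data and trailer (CRC-32 and size); replace only the header.
    return custom_header + stream[DEFAULT_HEADER_LEN:]
```

The trailer still comes from zlib, so the checksum is computed in the same pass as the compression.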
Feature or enhancement
Replace manual gzip format handling with zlib's inflateGetHeader() and deflateSetHeader().
Pitch
RHEL, SLES and Ubuntu for IBM zSystems (aka s390x) ship with a zlib optimization [1] that significantly improves deflate and inflate performance on this platform by using a specialized CPU instruction.
This instruction not only compresses the data, but also computes a checksum. At the moment Python's gzip support performs compression and checksum calculation separately, which creates unnecessary overhead on s390x.
The reason is that Python needs to write specific values into the gzip header, and when this support was introduced in 1997, there was indeed no better way to do this.
Since v1.2.2.1 (2004), zlib has provided the inflateGetHeader() and deflateSetHeader() functions for this purpose, so Python no longer has to deal with the exact header and trailer format itself.
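To illustrate the overhead described above, here is a rough sketch contrasting the two styles using only Python's existing zlib bindings. It is a simplified illustration, not the actual code in the gzip module. The function names `gzip_manual` and `gzip_via_zlib` are hypothetical. The C-level inflateGetHeader() and deflateSetHeader() calls are not exposed here, so the second function simply lets zlib write its default header.

```python
import struct
import zlib


def gzip_manual(data: bytes, mtime: int = 0) -> bytes:
    """Build a gzip member by hand: header, raw deflate stream, trailer."""
    # RFC 1952 header: magic bytes, CM=8 (deflate), FLG=0, MTIME, XFL=0, OS=255 (unknown).
    header = struct.pack("<BBBBLBB", 0x1F, 0x8B, 8, 0, mtime, 0, 255)
    # wbits=-15 requests a raw deflate stream with no header or trailer.
    compressor = zlib.compressobj(wbits=-15)
    body = compressor.compress(data) + compressor.flush()
    # Separate pass over the input: CRC-32 is computed apart from compression.
    trailer = struct.pack("<LL", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + body + trailer


def gzip_via_zlib(data: bytes) -> bytes:
    """Let zlib produce header, deflate stream and trailer in a single pass."""
    # wbits=31 selects the gzip container, so zlib computes the CRC-32 itself.
    compressor = zlib.compressobj(wbits=31)
    return compressor.compress(data) + compressor.flush()
```

Both functions yield valid gzip data, but only the second gives zlib the chance to compute the checksum together with the compression, which is where a hardware-accelerated build could avoid the extra pass.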
Previous discussion
https://discuss.python.org/t/read-and-write-gzip-header-and-trailer-with-zlib/25703/2
[1] https://github.com/madler/zlib/pull/410