python / cpython

The Python programming language
https://www.python.org

bz2.BZ2File / gzip.GzipFile / lzma.LZMAFile expose misleading `fileno` method. #100066

Open Yhg1s opened 1 year ago

Yhg1s commented 1 year ago

The various compressing/decompressing file wrappers (bz2.BZ2File, gzip.GzipFile, lzma.LZMAFile) currently have `fileno()` methods that return the file descriptor of the underlying (compressed) file: https://github.com/python/cpython/blob/0a4c82ddd34a3578684b45b76f49cd289a08740b/Lib/bz2.py#L126-L129
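A quick way to confirm this, as a minimal sketch assuming CPython 3 with the standard `bz2`, `gzip` and `lzma` modules (the helper name `wrapper_fds` is just for illustration): wrap one open raw file in each of the three classes and compare what `fileno()` reports with the raw file's own descriptor.

```python
import bz2
import gzip
import lzma
import os
import tempfile


def wrapper_fds(path):
    """Return the raw file's descriptor and the descriptor each wrapper reports."""
    with open(path, "rb") as raw:
        wrappers = [
            bz2.BZ2File(raw, "rb"),
            gzip.GzipFile(fileobj=raw, mode="rb"),
            lzma.LZMAFile(raw, "rb"),
        ]
        reported = [w.fileno() for w in wrappers]
        for w in wrappers:
            w.close()  # closing a wrapper leaves the caller-supplied `raw` open
        return raw.fileno(), reported


fd, path = tempfile.mkstemp()
os.close(fd)
raw_fd, fds = wrapper_fds(path)
os.unlink(path)
print(fds == [raw_fd] * 3)  # True
```

All three simply forward to the wrapped file object, so the descriptor carries compressed bytes, not the data you read or write through the wrapper.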

I imagine this was done because it seemed useful, but I'm not sure what use it is. You can't safely use things like `select`, since the compression/decompression layer may buffer data, and passing the descriptor to anything that uses it directly will produce garbage (when reading) or corrupt the file (when writing).
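To illustrate the "garbage when reading" case, here is a small sketch (the payload and temporary file are made up for the example) that reads through the descriptor returned by `BZ2File.fileno()` with `os.read()`. What comes back is the compressed bz2 stream, not the decompressed payload.

```python
import bz2
import os
import tempfile

payload = b"hello, world\n" * 4

fd, path = tempfile.mkstemp(suffix=".bz2")
os.close(fd)
with bz2.open(path, "wb") as f:
    f.write(payload)

with bz2.open(path, "rb") as f:
    # os.read() on the wrapper's descriptor goes straight to the file on disk,
    # bypassing the decompressor (and moving the shared file offset).
    raw_bytes = os.read(f.fileno(), 1 << 16)

os.unlink(path)
print(raw_bytes[:3])                         # b'BZh'
print(raw_bytes == payload)                  # False
print(bz2.decompress(raw_bytes) == payload)  # True
```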

An example how misleading this can be, courtesy of @ericfrederich:

```pycon
>>> import bz2
>>> import subprocess
>>> with bz2.open('/tmp/out.bz2', 'w') as f:
...   subprocess.check_call(['echo', '-n', "Why doesn't this work?"], stdout=f)
...
0
>>> bz2.open('/tmp/out.bz2', 'r').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/bz2.py", line 178, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.7/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream
>>> open('/tmp/out.bz2', 'rb').read()
b"Why doesn't this work?BZh9\x17rE8P\x90\x00\x00\x00\x00"
```

Note the (empty) bz2 stream that follows the uncompressed bytes the subprocess wrote straight to the file descriptor.
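For comparison, a sketch of a pattern that does work: give the child a pipe and copy its output through the wrapper, so the bytes actually pass through the compressor. It swaps `echo -n` for `sys.executable -c ...` only so the sketch runs without a POSIX `echo`; the temporary path is illustrative.

```python
import bz2
import os
import shutil
import subprocess
import sys
import tempfile

fd, path = tempfile.mkstemp(suffix=".bz2")
os.close(fd)

cmd = [sys.executable, "-c", "import sys; sys.stdout.write(\"Why doesn't this work?\")"]
with bz2.open(path, "wb") as f:
    # The child writes to a pipe we own, not to the compressed file's descriptor.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        # Copy the child's output through the wrapper so it actually gets compressed.
        shutil.copyfileobj(proc.stdout, f)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)

with bz2.open(path, "rb") as f:
    data = f.read()
os.unlink(path)
print(data)  # b"Why doesn't this work?"
```

Because `shutil.copyfileobj()` streams in chunks, this also works for large outputs; for small ones, `subprocess.run(cmd, stdout=subprocess.PIPE)` followed by `f.write(result.stdout)` is simpler.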

Am I missing a situation where this is actually useful? If there isn't one, can we consider adding a warning for the confusing behaviour?
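As a rough illustration of what such a warning could look like, here is a hypothetical subclass that keeps `fileno()` working but warns whenever it is called. `WarningBZ2File`, the message and the `RuntimeWarning` category are illustrative choices, not an existing CPython API.

```python
import bz2
import os
import tempfile
import warnings


class WarningBZ2File(bz2.BZ2File):
    """Hypothetical subclass: fileno() still works but emits a warning."""

    def fileno(self):
        warnings.warn(
            "fileno() returns the descriptor of the underlying compressed file; "
            "data read or written through it bypasses (de)compression",
            RuntimeWarning,
            stacklevel=2,
        )
        return super().fileno()


fd, path = tempfile.mkstemp(suffix=".bz2")
os.close(fd)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with WarningBZ2File(path, "wb") as f:
        f.fileno()  # e.g. what subprocess does internally with stdout=f
os.unlink(path)

runtime = [w for w in caught if issubclass(w.category, RuntimeWarning)]
print(len(runtime))  # 1
```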

fungs commented 8 months ago

Yes, this is confusing! When a file consumer (a C++-based package, in my case) works with `fileno()` directly and you want to add a wrapper like LZMAFile for transparent decompression, it fails because the data it sees through the descriptor is still compressed. `fileno()` should only be provided if it emulates a descriptor for the decompressed data (as you could do with an OS pipe).
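A minimal sketch of that pipe-based emulation, assuming the consumer only needs a readable, non-seekable descriptor: a background thread decompresses with `lzma.open()` and writes into one end of `os.pipe()`, and the other end is handed to the consumer. `decompressed_fd` is a hypothetical helper name, and the payload is made up for the example.

```python
import lzma
import os
import shutil
import tempfile
import threading


def decompressed_fd(path):
    """Return a readable fd that yields the *decompressed* contents of `path`."""
    read_fd, write_fd = os.pipe()

    def pump():
        # Open the write end first so it is closed (reader sees EOF) even if
        # lzma.open() or decompression raises.
        with open(write_fd, "wb") as dst, lzma.open(path, "rb") as src:
            shutil.copyfileobj(src, dst)

    threading.Thread(target=pump, daemon=True).start()
    return read_fd


payload = b"decompressed bytes for a C++ consumer\n" * 100
fd, path = tempfile.mkstemp(suffix=".xz")
os.close(fd)
with lzma.open(path, "wb") as f:
    f.write(payload)

rfd = decompressed_fd(path)
with open(rfd, "rb") as r:  # stands in for a consumer reading the raw descriptor
    data = r.read()
os.unlink(path)
print(data == payload)  # True
```

The returned descriptor cannot be seeked, and `os.pipe()` descriptors are non-inheritable by default, so handing it to a child process needs `pass_fds` (POSIX). If the consumer closes the read end early, the background thread's write will typically fail with `BrokenPipeError` on POSIX.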

fungs commented 7 months ago

Duplicate of #68546