python / cpython

The Python programming language
https://www.python.org
Other
63.5k stars 30.41k forks source link

File mode wb+ appears as rb+ #69528

Open 554cef86-861b-46db-b1bb-e50b2576ab0f opened 9 years ago

554cef86-861b-46db-b1bb-e50b2576ab0f commented 9 years ago
BPO 25341
Nosy @warsaw, @pitrou, @benjaminp, @serhiy-storchaka, @markrwilliams, @zhangyangyu, @iritkatriel
Files
  • file_mode.patch: treat wb+ and rb+ differently
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['interpreter-core', 'type-bug', 'library', 'expert-IO', '3.11'] title = 'File mode wb+ appears as rb+' updated_at = user = 'https://github.com/markrwilliams' ``` bugs.python.org fields: ```python activity = actor = 'iritkatriel' assignee = 'none' closed = False closed_date = None closer = None components = ['Interpreter Core', 'Library (Lib)', 'IO'] creation = creator = 'Mark.Williams' dependencies = [] files = ['40739'] hgrepos = [] issue_num = 25341 keywords = ['patch'] message_count = 6.0 messages = ['252518', '252696', '252697', '252698', '264051', '407081'] nosy_count = 10.0 nosy_names = ['barry', 'pitrou', 'benjamin.peterson', 'stutzbach', 'Arfrever', 'mahmoud', 'serhiy.storchaka', 'Mark.Williams', 'xiang.zhang', 'iritkatriel'] pr_nums = [] priority = 'normal' resolution = None stage = None status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue25341' versions = ['Python 3.11'] ```

    554cef86-861b-46db-b1bb-e50b2576ab0f commented 9 years ago

    There is at least one mode in which a file can be opened that cannot be represented in its mode attribute: wb+. This mode instead appears as 'rb+' in the mode attribute:

    Python 3.5.0 (default, Oct  3 2015, 10:40:38)
    [GCC 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final 208032)] on freebsd10
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> if os.path.exists('some_file'): os.unlink('some_file')
    ...
    >>> with open('some_file', 'r+b') as f: print(f.mode)
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    FileNotFoundError: [Errno 2] No such file or directory: 'some_file'
    >>> with open('some_file', 'w+b') as f: print(f.mode)
    ...
    rb+
    >>> with open('some_file', 'r+b') as f: print(f.mode)
    rb+

    This means code that interacts with file objects cannot trust the mode of binary files. For example, you can't use tempfile.TemporaryFile (the mode argument of which defaults to 'wb+') and GzipFile:

    >>> import gzip
    >>> from tempfile import TemporaryFile
    >>> with TemporaryFile() as f:
    ...     gzip.GzipFile(fileobj=f).write(b'test')
    ...
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/usr/local/lib/python3.5/gzip.py", line 249, in write
        raise OSError(errno.EBADF, "write() on read-only GzipFile object")
    OSError: [Errno 9] write() on read-only GzipFile object

    This occurs because without a mode argument passed to its initializer, GzipFile checks that the fp object's mode starts with 'w', 'a', or 'x'.

    For the sake of completeness/searchability: w+ and r+ are different modes, so rb+ and wb+ must be different modes. Per https://docs.python.org/3/library/functions.html#open :

    """ For binary read-write access, the mode 'w+b' opens and truncates the file to 0 bytes. 'r+b' opens the file without truncation. """

    I haven't been able to test this on Windows, but I expect precisely the same behavior given my understanding of the relevant source.

    _io_FileIOinitimpl in _io/fileio.c does the right thing and includes O_CREAT and O_TRUNC in the open(2) flags upon seeing 'w' in the mode:

    https://hg.python.org/cpython/file/3.5/Modules/_io/fileio.c#l324

    this ensures correct interaction with the file system. But it also sets self->readable and self->writable upon seeing '+' in the mode:

    https://hg.python.org/cpython/file/3.5/Modules/_io/fileio.c#l341

    The open flags are not retained. Consequently, when the mode attribute is accessed and the get_mode calls the mode_string function, the instance has insufficient information to differentiate between 'rb+' and 'wb+':

    https://hg.python.org/cpython/file/3.5/Modules/_io/fileio.c#l1043

    If the FileIO instance did retain the 'flags' variable that's declared and set in its initializer, then mode_string could use it to determine the difference between wb+ and rb+.

    I would be happy to write a patch for this.

    zhangyangyu commented 9 years ago

    I think Mark is right. Since wb+ and rb+ have different behaviours they should be treat separately.

    But this behaviour treating wb+ and rb+ as the same is well tested and seems to intended to do so.

    554cef86-861b-46db-b1bb-e50b2576ab0f commented 9 years ago

    Python's test suite may test the current behavior but that does not lessen the problem.

    I gave an example of apparently correct code that fails (that was actually encountered by a Python user) in my original description. Another such example: you cannot duplicate a file object -- same path, same mode --- and be sure that the duplicate is a true duplicate. Data corruption could occur in application code if the duplicated file were opened "rb+" instead of "wb+", as the duplicate would not truncate existing data.

    Another way to think about the problem is accuracy of intent. The mode attribute on file objects can be incorrect, and by "incorrect" I mean "not describe the mode under which the file was opened." Why have a mode attribute at all, then? I, for one, would prefer *no* mode attribute to one that's sometimes incorrect. But a correct one is even better!

    On Sat, Oct 10, 2015 at 1:27 AM, Xiang Zhang \report@bugs.python.org\ wrote:

    Xiang Zhang added the comment:

    I think Mark is right. Since wb+ and rb+ have different behaviours they should be treat separately.

    But this behaviour treating wb+ and rb+ as the same is well tested and seems to intended to do so.

    ---------- nosy: +xiang.zhang


    Python tracker \report@bugs.python.org\ \http://bugs.python.org/issue25341\


    zhangyangyu commented 9 years ago

    I make a patch which now identifies the difference between wb+ and rb+, and modifies the corresponding tests. Though I don't know whether this need to be fixed.

    serhiy-storchaka commented 8 years ago

    But this behaviour treating wb+ and rb+ as the same is well tested and seems to intended to do so.

    I think this is not intended behavior. Tests just test that the current behavior is not changed accidentally. If I'm right, the patch LGTM. But since third-party code can depend on this behavior, I would fix it only in 3.6.

    Tests were added in bpo-4362 and Barry asked the same question about "w+" (msg76134).

    Barry, Benjamin, what are you think about this now?

    iritkatriel commented 2 years ago

    Reproduced on 3.11.

    iritkatriel commented 1 year ago
    >>> import gzip
    >>> from tempfile import TemporaryFile
    >>> with TemporaryFile() as f:
    ...     gzip.GzipFile(fileobj=f).write(b'test')
    ...
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/usr/local/lib/python3.5/gzip.py", line 249, in write
        raise OSError(errno.EBADF, "write() on read-only GzipFile object")
    OSError: [Errno 9] write() on read-only GzipFile object

    This occurs because without a mode argument passed to its initializer, GzipFile checks that the fp object's mode starts with 'w', 'a', or 'x'.

    I think this is a problem in gzip. It has only READ and WRITE modes, and it ignores the '+'.