python / cpython

The Python programming language
https://www.python.org
Other
63.39k stars 30.36k forks source link

open() of read-write non-seekable streams broken #64273

Open 4370a425-28d3-42cf-bb92-ba0db7ff422b opened 10 years ago

4370a425-28d3-42cf-bb92-ba0db7ff422b commented 10 years ago
BPO 20074
Nosy @pitrou, @benjaminp, @bitdancer, @hynek, @vadmium, @serhiy-storchaka, @cagney

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-bug', 'expert-IO'] title = 'open() of read-write non-seekable streams broken' updated_at = user = 'https://bugs.python.org/Dolda2000' ``` bugs.python.org fields: ```python activity = actor = 'ken.teh' assignee = 'none' closed = False closed_date = None closer = None components = ['IO'] creation = creator = 'Dolda2000' dependencies = [] files = [] hgrepos = [] issue_num = 20074 keywords = [] message_count = 19.0 messages = ['206957', '206960', '206961', '206980', '206981', '206982', '206983', '206991', '206998', '207001', '207002', '207003', '207010', '207012', '273258', '339809', '366115', '366117', '387929'] nosy_count = 12.0 nosy_names = ['pitrou', 'benjamin.peterson', 'stutzbach', 'r.david.murray', 'python-dev', 'hynek', 'martin.panter', 'pavlix', 'serhiy.storchaka', 'Dolda2000', 'cagney', 'ken.teh'] pr_nums = [] priority = 'normal' resolution = None stage = None status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue20074' versions = ['Python 3.3', 'Python 3.4'] ```

4370a425-28d3-42cf-bb92-ba0db7ff422b commented 10 years ago

It seems open() is slightly broken in Python 3, in that one cannot open non-seekable files in read-write mode. One such common use is open("/dev/tty", "r+") for interacting directly with the controlling TTY regardless of standard stream redirections. Note that this is a regression for Python 2, where this worked as expected.

What happens is the following:
>>> open("/dev/tty", "r+")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: File or stream is not seekable.

Just for the record, the same thing happens with "w+" and "rb+".

This also means that the getpass module is slightly broken, since it will always fail whenever stdin is redirected.

bitdancer commented 10 years ago

I don't think getpass will fail, since it uses os.open, not open.

Presumably you can work around the problem by opening the stream twice: once for reading and once for writing. I'll leave it to Antoine to say whether or not that is in fact the expected solution, or if this is a design bug.

serhiy-storchaka commented 10 years ago

Use unbuffered binary file.

>>> open("/dev/tty", "r+b", buffering=0)
<_io.FileIO name='/dev/tty' mode='rb+'>
4370a425-28d3-42cf-bb92-ba0db7ff422b commented 10 years ago

I don't think getpass will fail, since it uses os.open, not open.

It also uses fdopen() on the resulting file descriptor, however, which has the same problem.

Use unbuffered binary file.

It's nice that that's at least possible, but I, for one, would still consider it a bug that it isn't possible to open it in text mode.

4370a425-28d3-42cf-bb92-ba0db7ff422b commented 10 years ago

Just to demonstrate failure of getpass, by the way:

$ cat >/tmp/pwtest.py
import getpass
print(getpass.getpass())
$ python3 /tmp/pwtest.py </dev/null
/usr/lib/python3.3/getpass.py:83: GetPassWarning: Can not control echo on the terminal.
  passwd = fallback_getpass(prompt, stream)
Warning: Password input may be echoed.
Password: Traceback (most recent call last):
  File "/usr/lib/python3.3/getpass.py", line 63, in unix_getpass
    old = termios.tcgetattr(fd)     # a copy to save
termios.error: (25, 'Inappropriate ioctl for device')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/pwtest.py", line 2, in <module>
    print(getpass.getpass())
  File "/usr/lib/python3.3/getpass.py", line 83, in unix_getpass
    passwd = fallback_getpass(prompt, stream)
  File "/usr/lib/python3.3/getpass.py", line 118, in fallback_getpass
    return _raw_input(prompt, stream)
  File "/usr/lib/python3.3/getpass.py", line 134, in _raw_input
    raise EOFError
EOFError
serhiy-storchaka commented 10 years ago

It's nice that that's at least possible, but I, for one, would still consider it a bug that it isn't possible to open it in text mode.

>>> io.TextIOWrapper(open("/dev/tty", "r+b", buffering=0))
<_io.TextIOWrapper name='/dev/tty' encoding='UTF-8'>

Just to demonstrate failure of getpass, by the way:

Looks as this was fixed in bpo-18116 for 3.4. David, perhaps bpo-18116 should be backported to older Python versions.

4370a425-28d3-42cf-bb92-ba0db7ff422b commented 10 years ago
Python 3.3.3 (default, Dec  8 2013, 14:51:59) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import io
>>> io.TextIOWrapper(open("/dev/tty", "rb+"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: File or stream is not seekable.

Was this also fixed in 3.4?

serhiy-storchaka commented 10 years ago
> >>> import io
> >>> io.TextIOWrapper(open("/dev/tty", "rb+"))
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> io.UnsupportedOperation: File or stream is not seekable.
buffering=0
4370a425-28d3-42cf-bb92-ba0db7ff422b commented 10 years ago

Oh sorry, my bad. I messed up. :)

Given that that works, though, why can't open() handle opening /dev/tty directly in text mode? Clearly, TextIOWrapper can handle the necessary buffering without the stream having to be seekable.

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

New changeset 100f632d4306 by R David Murray in branch '3.3': bpo-18116: backport fix to 3.3 since real-world failure mode demonstrated. http://hg.python.org/cpython/rev/100f632d4306

bitdancer commented 10 years ago

Having buffering doesn't make the stream seekable. So the question is, is the *design* of the IO module that '+' requires a seekable stream the best behavior, or can that constraint be relaxed? You have to keep in mind that the IO module is a bunch of building blocks, which are plugged together automatically for the most common scenarios. The goal is a portable, consistent IO system, not one that completely mimics unix/C IO primitives.

pitrou commented 10 years ago

Having buffering doesn't make the stream seekable. So the question is, is the *design* of the IO module that '+' requires a seekable stream the best behavior, or can that constraint be relaxed?

A non-seekable read/write stream doesn't really make sense (think about it). What you may be thinking about, instead, is a pair of non-seekable streams, one readable and one writable. There is BufferedRWPair for that: http://docs.python.org/dev/library/io.html#io.BufferedRWPair

(granted, BufferedRWPair isn't wired in open(), so you have to do all the wrapping yourself)

4370a425-28d3-42cf-bb92-ba0db7ff422b commented 10 years ago

So the question is, is the *design* of the IO module that '+' requires a seekable stream the best behavior, or can that constraint be relaxed?

What purpose does that constraint serve? Is there any reason it shouldn't be relaxed?

It seems to work quite well without it in Python 2.

bitdancer commented 10 years ago

Antoine already answered that question: it does not make sense to have a single stream that is open for *update* if it is not seekable. The fact C conflates "update" with "both read and write" can be seen as a design bug in C :)

The remaining question might be: is there a sensible way (that fits with the design of the IO system) to hook BufferedRWPair up to open?

vadmium commented 8 years ago

Currently, the documentation for TextIOWrapper says it is “a buffered text stream over a BufferedIOBase binary stream.” If passing a RawIOBase (buffering=0) file works, that seems like an undocumented accident. This is also explicitly disallowed with open(). “Pass 0 to switch buffering off (only allowed in binary mode)”:

>>> open("/dev/tty", "r+t", buffering=0)
ValueError: can't have unbuffered text I/O

One benefit of requiring "r+b" mode to be seekable is that if you open a pipe for random access, it reports the error immediately, rather than after you have half written your data and then try to seek(). But I admit this difference between buffered=0 mode and random-access mode is obscure.

Given that seeking text streams is rather obscure (bpo-25849), and there are sometimes problems in conjunction with writing (bpo-12215) and with truncating (bpo-26158), it could make some sense to use BufferedRWPair in text mode. But IMO open() is already too complicated, so I am not excited about such a change.

5ce72146-384d-40c0-9c1f-3ebc748d5463 commented 5 years ago

Another example is PTY:

Python 2.7.15 (default, Oct 15 2018, 15:24:06) 
[GCC 8.1.1 20180712 (Red Hat 8.1.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> pty, tty = os.openpty()
>>> print(pty, tty)
(3, 4)
>>> import io
>>> rw = io.open(pty, "r+")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: File or stream is not seekable.
>>> rw = open("/dev/tty", "r+")
>>> rw.write("hi")
>>> rw.flush()
hi>>> 

Python 3.7.3+ (heads/3.7-dirty:0a16bb15af, Apr  9 2019, 13:45:22) 
[GCC 8.3.1 20190223 (Red Hat 8.3.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> pty, tty = os.openpty()
>>> import io
>>> rw = io.open(pty, "r+")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: File or stream is not seekable.
50f5ac5d-8ffb-4bd6-be06-d8fa5eb48399 commented 4 years ago

A non-seekable read/write stream doesn't really make sense (think about it).

How does it help the issue to ask the reporter to "think" when the have already provided an easily reproducable use case?

What purpose does that constraint serve? Is there any reason it shouldn't be relaxed?

Can we *please* get an answer to this question?

Antoine already answered that question: it does not make sense to have a single stream that is open for *update* if it is not seekable.

How does this statement, already refuted by the reporter, bring us any closer to the answer to the question above?

Was this an arbitrary decision of someone who didn't think about character devices? Or is there any particular reason to prevent the use of "r+", "w+", "rb+" and "wb+" with readable-writable character devices?

50f5ac5d-8ffb-4bd6-be06-d8fa5eb48399 commented 4 years ago

Okay, let me help the situation a bit. The “workaround” of using buffering=0 works perfect with rb+ and wb+ and I'm pretty sure I used this in the past. I don't know how well this is documented.

I would usually need to disable buffering for any real-time binary communication anyway, so for me the workaround is reasonable. As for buffered I/O and r+/b+ where you cannot disable buffering, I don't know.

It would be useful to have a good reason behind the limitation of Python and have it documented, or have the limitation lifted if there's none, so this issue can be closed.

Could anyone please explain why binary streams default to buffered I/O in the first place? Does it make any sense or it's just historical cruft?

b8d375e3-4fac-4e42-95ad-11b1260e9f46 commented 3 years ago

It works on macos but not on linux, ie, I have a usb-serial dongle connected to a serial device (a CAMAC controller), and I do an os.open() with 'wb+', followed by a io.fdopen(). I can talk to my device on macos BigSur 11.2, buffered, but not on Linux which throws the non-seekable error when I do the io.fdopen().