python / cpython

readline() causes output to be written at eof unless seek() is used #113439

Open rss81 opened 10 months ago

rss81 commented 10 months ago

Bug report

Bug description:

In Python 3 there seems to be a bug with the readline() method.

I have a file txt.txt that contains two lines:

1234567890
abcdefghij

I then run the following code:

g = open("txt.txt","r+")
g.write("xxx")
g.flush()
g.close()

It modifies the file as expected:

xxx4567890
abcdefghij

I then run the following code:

g = open("txt.txt","r+")
g.readline()    
Out[99]: 'xxx4567890\n'
g.tell()
Out[100]: 12
g.write("XXX")
g.flush()
g.close()

I get the following:

xxx4567890
abcdefghij
XXX

Why is "XXX" being written to the end of the file instead of just after the first line?

If I run the following:

g = open("txt.txt","r+")
g.readline()    
Out[99]: 'xxx4567890\n'
g.tell()
Out[100]: 12
g.seek(12)
g.tell()
g.write("XXX")
g.flush()
g.close()

I get:

xxx4567890
XXXdefghij
XXX

This seems like a bug in readline(): tell() reports the cursor at 12, but the write lands at EOF unless I call seek() first.

CPython versions tested on:

3.11

Operating systems tested on:

Windows

benjaminJohnson2204 commented 10 months ago

I was able to reproduce this issue, and I'd like to work on fixing it.

I looked into the bug. The cause appears to be that the TextIOWrapper class (in both the C io and Python _pyio modules) reads an entire chunk at a time and then does not rewind the underlying stream position before performing the write.

Reproducing this example with the file opened in binary mode (as opposed to text mode) works as expected. The TextIOWrapper class, which is used for text file I/O, has its own buffer that it reads an entire chunk into for every read or readline call, and it doesn't seek to the correct position when writing after reading. So I think the TextIOWrapper class should be changed.
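As a rough illustration of the binary-mode comparison, here is a minimal sketch. The helper name write_after_readline_binary and the use of a temporary file are assumptions made for the example, not part of the original report.

```python
import os
import tempfile


def write_after_readline_binary(initial: bytes, payload: bytes) -> bytes:
    """Hypothetical helper: read one line in binary mode, write payload,
    and return the resulting file contents."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Create the file with known contents.
        with open(path, "wb") as f:
            f.write(initial)
        # Binary "r+b" gives a buffered reader/writer with no TextIOWrapper on top.
        with open(path, "r+b") as f:
            f.readline()      # consume the first line
            f.write(payload)  # write at the position reported after the read
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


result = write_after_readline_binary(b"1234567890\nabcdefghij\n", b"XXX")
print(result)
```

In this sketch the payload should overwrite the start of the second line rather than being appended at the end of the file, which matches the observation that binary mode is not affected.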

The simplest fix would be to add the equivalent of self.seek(self.tell()) to both the C and Python implementations of TextIOWrapper, executed whenever we write to a stream that is readable and has a non-empty read buffer. For non-seekable streams, we may just want to leave the implementation as is (i.e. call self.seek(self.tell()) only if the stream is seekable). The only way I see to avoid this bug for non-seekable streams would be to change TextIOWrapper so that it does not buffer its read calls when the stream is both readable and writable, but not seekable.
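Until something like that lands, the same idea can be applied from user code. The following is only a sketch of the workaround, not the proposed patch: write_at_current_position and demo_text_mode are hypothetical names, and the file is opened with newline="" and an explicit encoding so the result does not depend on platform newline translation.

```python
import os
import tempfile


def write_at_current_position(f, text: str) -> int:
    """Hypothetical helper: resynchronise a text stream before writing.

    seek(tell()) discards the read-ahead buffer and moves the underlying
    stream to the logical position, so the write lands where tell() says.
    """
    if f.seekable():
        f.seek(f.tell())
    return f.write(text)


def demo_text_mode() -> bytes:
    """Read one line in text mode, write "XXX", return the raw file bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write("1234567890\nabcdefghij\n")
        with open(path, "r+", encoding="utf-8", newline="") as f:
            f.readline()
            write_at_current_position(f, "XXX")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


text_result = demo_text_mode()
print(text_result)
```

With the explicit resynchronisation, the "XXX" should replace the first three characters of the second line, which is the behaviour the original report expected from a plain write() after readline().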

I'd be interested to hear other opinions on this.

vadmium commented 3 months ago

Probably the same as Issue #82891 (and #56424, closed as not worth fixing)