Open MatthiasWiesmann opened 1 year ago
Note: If anything is changed here, the relevant doc should also be changed. https://docs.python.org/3/library/io.html#io.IOBase.seek
@MatthiasWiesmann would you have time to write a PR with suggestied revisions of the help text?
To note, Python 3.12 will include the constants in os
(see #104418), though there is quite a bit of inconsistency between the various seek()
methods' docstrings:
cc @erlend-aasland as the author of #104418 if you have any thoughts.
A
I agree that it is confusing that the parameter names for the low-level API seek()
aren't used consistently throughout the docs/docstrings. Luckily, these are all positional-only, so we can easily normalise them without backwards compatibility concerns.
Suggested improvements:
seek()
paramsio.SEEK_*
constants instead of os.SEEK_*
io.SEEK_
constantsTextIOWrapper.seek()
has no docstring, so pydoc shows a docstring from a parent class, which is inadequate due to significant difference in semantic.TextIOWrapper.seek()
is not a byte offset, and not an offset at all. It is a "cookie", i.e. zero or an opaque number returned by tell()
. Passing anything other than zero and the value returned by tell()
when whence=SEEK_SET
has undefined behavior.Neither forward seeking nor backward seeking are supported. The only supported are rewinding to the start and restoring the previous position saved by tell()
. Confusion could be avoided if TextIOWrapper.tell()
returned a special type instead of integer, but for now it would be a large breaking change (because the cookie can be saved in a file as an integer and used in other process).
As a data point: on macOS, Linux, and FreeBSD, the lseek
and fseek
manpages uses offset and whence as parameter names.
Because they only work with binary files. They do not work with TextIOWrapper which support multibyte stateful codecs.
There are a lot of tasks here. Suggesting to proceed in this order:
os.lseek
. Rationale: the SEEK_*
constants are also used for this function. If the SEEK_*
constants are to talk about the whence parameter, we should use it for os.lseek
as well.os.SEEK_SET
, os.SEEK_CUR
and os.SEEK_END
; explain that they are used for the whence parameter of os.lseek
and of the seek()
method of file objects; add a very short description to each constant.os.SEEK_HOLE
and os.SEEK_DATA
(exposed in os
though posix
)_io.*.seek()
params where it makes sense.seek()
methods (Doc/library/io.rst)seek()
methodsBecause they only work with binary files. They do not work with TextIOWrapper which support multibyte stateful codecs.
Yes, that is fine; TextIOWrapper
is a special case. For the other cases, it may make sense to use a consistent naming.
__text_signature__
for _io
types with seek
:
Even in C, fseek
/ftell
and lseek
and not interoperable. You will have issues when pass the result of lseek
to fseek
on Windows (because newline translation and buffering). There were such issues in the CPython interpreter. So "offset" in fseek
/ftell
is not just an offset.
fgetpos
and fsetpos
are more modern analogs of fseek
/ftell
, and they work with position which is not a number.
Even in C,
fseek
/ftell
andlseek
and not interoperable. You will have issues when pass the result oflseek
tofseek
on Windows (because newline translation and buffering). There were such issues in the CPython interpreter. So "offset" infseek
/ftell
is not just an offset.
fgetpos
andfsetpos
are more modern analogs offseek
/ftell
, and they work with position which is not a number.
Are you responding to a particular suggested change of https://github.com/python/cpython/issues/107801#issuecomment-1676940817? Are you suggesting another change? I'm not sure how to interpret your post.
No, it was response to https://github.com/python/cpython/issues/107801#issuecomment-1676868944.
As for your proposition, it all looks reasonable. The main entries are 6 and 5. Perhaps there should be only two separate docstrings: for binary and text streams, all other classes perhaps can inherit them from parent. The module documentation can be more verbose and provide more specific details. Other entries can go to other issues.
No, it was response to #107801 (comment).
Thanks for clarifying.
As for your proposition, it all looks reasonable. The main entries are 6 and 5. Perhaps there should be only two separate docstrings: for binary and text streams, all other classes perhaps can inherit them from parent. The module documentation can be more verbose and provide more specific details. Other entries can go to other issues.
That sounds reasonable. And of course, the TextIOWrapper docstring is a special case.
Lots of merged PRs here. 🚀
Anything still to do, or can the issue be closed?
Issue #82969 is open about IOBase.seek/tell vs TextIOWrapper and the offset/cookie parameter Issue #122299 is about the byte vs character seek offsets
Documentation
The documentation on text handles is misleading, in particular for the
seek
method.Steps to reproduce
This produces the following documentation:
Issues
os
module, likeos.SEEK_END
are not mentioned.Generally, the behaviour is very inconsistent, seek relative to the end fail, seek relative to the start works, but might yield a situation when
read
fails.Full investigation of the issue
Linked PRs