Open 23dad19e-9ca4-43ba-8483-d783d18dcb4c opened 13 years ago
A prefetch() method for Buffered IO may greatly assist 3rd party buffering among other gains. If nothing else, it is worth experimenting with.
Discussion on the topic is here: http://mail.python.org/pipermail/python-ideas/2010-September/008180.html
A summary of the method proposed (by Antoine Pitrou):
prefetch(self, buffer, skip, minread)

Skip `skip` bytes from the stream. Then, try to read at least `minread` bytes and write them into `buffer`. The file pointer is advanced by at most `skip + minread`, or less if the end of file was reached. The total number of bytes written in `buffer` is returned, which can be more than `minread` if additional bytes could be prefetched (but, of course, cannot be more than `len(buffer)`).
Arguments:
- `buffer`: a writable buffer (e.g. bytearray)
- `skip`: number of bytes to skip (must be >= 0)
- `minread`: number of bytes to read (must be >= 0 and <= len(buffer))

I started a draft in python. I am attaching the _pyio version along with tests. I will continue work on the C implementation and eventually documentation if this is well received. It seems straightforward, I am interested to see what you guys think.
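To make the proposed semantics concrete, here is a minimal model of prefetch() over an in-memory bytes object. It is only a sketch and not the attached _pyio draft: `prefetch_model` and its explicit `pos` argument are hypothetical, and it assumes the file pointer advances by exactly `skip + minread` whenever that much data is available.

```python
def prefetch_model(data, pos, buffer, skip, minread):
    """Model the proposed prefetch() on in-memory bytes (hypothetical helper).

    Returns (bytes_written, new_position).
    """
    if skip < 0:
        raise ValueError("skip must be >= 0")
    if not 0 <= minread <= len(buffer):
        raise ValueError("minread must be >= 0 and <= len(buffer)")
    # Skip ahead, but never past the end of the data.
    start = min(pos + skip, len(data))
    # Fill as much of the buffer as the remaining data allows.
    chunk = data[start:start + len(buffer)]
    buffer[:len(chunk)] = chunk
    # Assumption: only minread bytes are consumed; the rest is "prefetched".
    new_pos = min(start + minread, len(data))
    return len(chunk), new_pos


data = b"HEADpayload-bytes"
buf = bytearray(8)
written, pos = prefetch_model(data, 0, buf, skip=4, minread=4)
print(written, pos, bytes(buf))
```

In this run the whole 8-byte buffer is filled although only 4 bytes past the skipped header count as consumed, which is the "more than minread" case described above.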
Also, there are now 2 places which use hasattr(self, "peek"). I was wondering if it would make sense to add peek() to BufferedIOBase and raise UnsupportedOperation or return b"".
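For illustration, the two options could look roughly like this. `PeekDefaultBase` and `peek_or_empty` are hypothetical names used only for this sketch; nothing here is part of the io module or the attached patch.

```python
import io


class PeekDefaultBase(io.BufferedIOBase):
    """Hypothetical base class with a default peek() that refuses."""

    def peek(self, size=0):
        # First option: signal clearly that peeking is not supported.
        raise io.UnsupportedOperation("peek")


def peek_or_empty(stream, size=0):
    """Second option, as a helper: fall back to b"" when peek() is unavailable."""
    try:
        return stream.peek(size)
    except (AttributeError, io.UnsupportedOperation):
        return b""
```

With either default in place, callers would no longer need the hasattr(self, "peek") check, at the cost of having to handle the exception or an empty result.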
Some benchmarks:
$ ./python -m timeit -s "from _pyio import open;f = open('LICENSE', 'rb'); b=bytearray(128)" 'while f.prefetch(b, 4, 4): pass'
_pyio.BufferedIOBase.prefetch:
100000 loops, best of 3: 10.6 usec per loop
_pyio.BufferedReader.prefetch:
100000 loops, best of 3: 6 usec per loop
$ ./python -m timeit -s "from _pyio import open;f = open('LICENSE', 'rb');b=bytearray(4);" 'while f.read(4): f.readinto(b)'
100000 loops, best of 3: 5.07 usec per loop
> I started a draft in python. I am attaching the _pyio version along with tests. I will continue work on the C implementation and eventually documentation if this is well received. It seems straightforward, I am interested to see what you guys think.
Thank you. I think performance measurements are premature until we have an optimized C implementation anyway.
I think ultimately we also want default implementations of read(), peek() and read1() which use prefetch(), so that BufferedReader implementations only have to implement prefetch(). Care must be taken to avoid infinite loops, i.e. prefetch() itself must not fall back on those methods.
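A rough sketch of that idea, assuming the prefetch() semantics summarized in the first message. `PrefetchMixin` and `BytesSource` are hypothetical names for this illustration only; read1() is left out but would presumably follow the same pattern.

```python
class PrefetchMixin:
    """Hypothetical mixin: read() and peek() derived from prefetch() alone."""

    def prefetch(self, buffer, skip, minread):
        # Subclasses must implement this WITHOUT calling read() or peek(),
        # otherwise the defaults below would recurse forever.
        raise NotImplementedError

    def read(self, size):
        result = bytearray()
        while len(result) < size:
            buf = bytearray(size - len(result))
            # minread == len(buf), so every byte returned is also consumed.
            n = self.prefetch(buf, 0, len(buf))
            if n == 0:  # end of stream
                break
            result += buf[:n]
        return bytes(result)

    def peek(self, size):
        buf = bytearray(size)
        # skip == 0 and minread == 0: look ahead without consuming anything.
        n = self.prefetch(buf, 0, 0)
        return bytes(buf[:n])


class BytesSource(PrefetchMixin):
    """Toy in-memory stream that implements only prefetch()."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def prefetch(self, buffer, skip, minread):
        if skip < 0 or not 0 <= minread <= len(buffer):
            raise ValueError("invalid skip or minread")
        start = min(self._pos + skip, len(self._data))
        chunk = self._data[start:start + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos = min(start + minread, len(self._data))
        return len(chunk)


src = BytesSource(b"hello world")
first = src.read(5)
ahead = src.peek(3)
rest = src.read(100)
```

The toy source never blocks, so the loop in read() terminates at end of stream; a real implementation would also have to decide what a short, non-EOF result means.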
That said, I think the python-dev mailing-list needs to be convinced of the usefulness of prefetch() (if it was only me, there wouldn't be any problem :-)). Perhaps you want to run another discussion there.
Here is an update with the C implementation. I think a working prototype will be helpful before another round on python-dev.
I'm not sure how to handle unseekable, non-blocking streams where the read returns before `skip` bytes are exhausted. If prefetch() returns 0, the caller would have to use tell() to ensure subsequent reads are sane. In other words, it seems prefetch() would leave the stream in an unpredictable state. Antoine, what are your thoughts?
Sounds like this might be more appropriate for the BufferedReader and related classes, and less so for the writer and abstract base class.
The proposed API seems strange to me. Is there an illustration of how it might be used? I suspect it wouldn’t be all that useful, and could more or less be implemented with the existing methods:
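```python
def prefetch(buffered_reader, buffer, skip, minread):
    # Discard the skipped bytes.
    buffered_reader.read(skip)
    # Slice a memoryview rather than the bytearray itself: slicing a
    # bytearray copies, so readinto() would fill a temporary object.
    consumed = buffered_reader.readinto(memoryview(buffer)[:minread])
    if consumed < minread:
        return consumed
    # Top up with bytes that are already buffered, without consuming them.
    spare = len(buffer) - consumed
    extra = buffered_reader.peek(spare)[:spare]
    total = consumed + len(extra)
    buffer[consumed:total] = extra
    return total
```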
Maybe it would be better to focus on clarifying or redefining the existing peek() method (bpo-5811), rather than making a brand new do-everything method which only seems to do what the other methods already do.
> The proposed API seems strange to me. Is there an illustration of how it might be used? I suspect it wouldn’t be all that useful, and could more or less be implemented with the existing methods:
True, but having it as a Buffered method would allow it to optimize buffer usage and avoid some memory copies (the first read() call and the peek() call, for example).
In any case, Guido was rather against this proposal, so I'm not sure there's much sense in continuing to discuss it.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature', 'expert-IO']
title = 'Add prefetch() for Buffered IO (experiment)'
updated_at =
user = 'https://bugs.python.org/jcon'
```
bugs.python.org fields:
```python
activity =
actor = 'pitrou'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['IO']
creation =
creator = 'jcon'
dependencies = []
files = ['22168', '22169', '23308']
hgrepos = []
issue_num = 12053
keywords = ['patch']
message_count = 6.0
messages = ['135731', '137143', '138118', '144848', '240247', '240672']
nosy_count = 8.0
nosy_names = ['pitrou', 'vstinner', 'nadeem.vawda', 'benjamin.peterson', 'stutzbach', 'jcon', 'martin.panter', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue12053'
versions = ['Python 3.3', 'Python 3.4']
```