Weird behaviour when reading windowed ranges

lmmx / range-streams

Streaming range requests in Python

https://range-streams.readthedocs.io/en/latest/

MIT License


Weird behaviour when reading windowed ranges #30

Closed lmmx closed 3 years ago

lmmx commented 3 years ago

```python
from range_streams import RangeStream, _EXAMPLE_URL

s = RangeStream(url=_EXAMPLE_URL, single_request=True)
s.add((0, 3))  # add the 3-byte range [0, 3)
s.read(3)      # read the whole range in one call
```

gives the expected 3 bytes

but

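```python
from range_streams import RangeStream, _EXAMPLE_URL

s = RangeStream(url=_EXAMPLE_URL, single_request=True)
s.add((0, 3))  # add the 3-byte range [0, 3)
# Read one byte at a time, six times: reads 4 to 6 fall outside
# the range [0, 3) but still return bytes from the stream
for _ in range(6):
    print(s.read(1))
```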

shows that you can keep reading bytes past the end of the range: the stream is not gated at all.

The active range response is not ‘decommissioned’ in the normal mode: like any stream, you can keep reading from it, but once it is exhausted you just get the empty bytestring b''. Gating could be implemented by checking tell() on the response iterator against the byte range of the active range response (?)
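To make the contrast concrete, here is a minimal sketch that uses io.BytesIO as a stand-in for the HTTP response body (an assumption: range_streams wraps real HTTP responses, not in-memory buffers). In the normal mode the partial content response only contains the requested bytes, so it exhausts itself; in single request mode one response body covers the whole file, so nothing stops reads at the end of the range.

```python
import io

FILE = b"0123456789"  # hypothetical file contents
START, END = 0, 3     # half-open byte range [0, 3)

# Normal mode (assumed): the partial content body holds only the range,
# so once it is exhausted every further read returns b''
normal = io.BytesIO(FILE[START:END])
normal_reads = [normal.read(1) for _ in range(6)]

# Single request mode (assumed): one body holds the whole file,
# so reads carry on past END with no gating
single = io.BytesIO(FILE)
single.seek(START)
single_reads = [single.read(1) for _ in range(6)]

print(normal_reads)  # [b'0', b'1', b'2', b'', b'', b'']
print(single_reads)  # [b'0', b'1', b'2', b'3', b'4', b'5']
```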

lmmx commented 3 years ago

It may be necessary to ‘fake out’ the active range response after it’s been exhausted.

To act like an exhausted partial content request (when in single request mode), you could intercept the active_range_response.read() and active_range_response.tell() calls via the RangeStream's own read/tell methods. Where the active_range_response property would normally report its RangeResponse object with ordinary stream capabilities, in single request mode it could instead return a faked RangeResponse with a dummy read method, but only once the range has been consumed.
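A rough sketch of that interception idea follows. ExhaustedResponse and SingleRequestStream are hypothetical names, not part of range_streams, and io.BytesIO stands in for the single full-file response. The active_range_response property hands back the real body until the range [start, end) has been consumed, then swaps in a dummy whose read() always returns b''.

```python
import io


class ExhaustedResponse:
    """Hypothetical stand-in for a consumed RangeResponse: read() always
    returns b'' and tell() stays fixed at the end of the range."""

    def __init__(self, end: int):
        self._end = end

    def read(self, size: int = -1) -> bytes:
        return b""

    def tell(self) -> int:
        return self._end


class SingleRequestStream:
    """Hypothetical sketch: read() and tell() go via the
    active_range_response property, which returns a dummy response
    once the range [start, end) has been consumed."""

    def __init__(self, body: io.BytesIO, start: int, end: int):
        self._body = body  # stands in for the one full-file response
        self._body.seek(start)
        self._end = end

    @property
    def active_range_response(self):
        if self._body.tell() >= self._end:
            return ExhaustedResponse(self._end)
        return self._body

    def read(self, size: int = -1) -> bytes:
        return self.active_range_response.read(size)

    def tell(self) -> int:
        return self.active_range_response.tell()
```

Note that the swap only happens after the range is consumed: a single read(5) from position 0 would still overshoot and return b'01234'.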

There remains the problem of how to stop it reading too many bytes from the stream (solving that may in fact make the previous suggestion unnecessary or overbaked).

lmmx commented 3 years ago

Maybe:

- set the buffer size
- set a default value on read for when it's called without an argument

But I don't think either of these would stop the actual stream from returning too much of the underlying data.

Maybe seek to the start and call socket.recv_into, but this returns the number of bytes received rather than the bytes themselves (they are written into a buffer you supply).
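The return-value behaviour of socket.recv_into can be seen with a pair of connected local sockets standing in for the HTTP connection (an assumption for illustration; reaching the raw socket underneath an HTTP client's response is not shown here and may not be practical). The call fills a caller-supplied buffer and returns a count, so the bytes have to be sliced back out of the buffer.

```python
import socket

# A connected pair of local sockets stands in for the HTTP connection
sender, receiver = socket.socketpair()
sender.sendall(b"0123456789")

buf = bytearray(3)              # caller-supplied buffer, sized to the range
n = receiver.recv_into(buf, 3)  # returns the number of bytes received
data = bytes(buf[:n])           # the bytes themselves live in buf

print(n, data)

sender.close()
receiver.close()
```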

The simplest approach would be to add a window_size (or window_end) to the RangeResponse. If set, it would cap the size passed to read() at window_size - tell(), which becomes 0 once the window size is reached; if not set, the stream is allowed to self-limit naturally.
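Here is a minimal sketch of that window_size clamp. WindowedResponse is a hypothetical wrapper, not the range_streams RangeResponse, and it assumes tell() counts bytes from the start of the window (here an io.BytesIO standing in for the response body).

```python
import io
from typing import BinaryIO, Optional


class WindowedResponse:
    """Hypothetical sketch: wraps a readable stream and, if window_size
    is set, caps every read() at window_size - tell()."""

    def __init__(self, body: BinaryIO, window_size: Optional[int] = None):
        self._body = body
        self.window_size = window_size

    def tell(self) -> int:
        return self._body.tell()

    def read(self, size: Optional[int] = None) -> bytes:
        if self.window_size is None:
            # No window set: let the stream self-limit naturally
            return self._body.read(size)
        # Bytes left in the window; this becomes 0 once the window is reached
        remaining = max(self.window_size - self.tell(), 0)
        if size is None or size < 0:
            size = remaining  # a bare read() returns the rest of the window
        return self._body.read(min(size, remaining))


r = WindowedResponse(io.BytesIO(b"0123456789"), window_size=3)
print(r.read(5))  # b'012'
print(r.read(1))  # b''
```

Unlike swapping in a dummy response after the range is consumed, the clamp also stops a single oversized read from overshooting the window.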

lmmx commented 3 years ago

Resolved, see #32