Closed lmmx closed 3 years ago
Oops… never mind, I did already consider it
As the motivation docs page says, I did already consider this and had forgotten the reason I decided against it:
If you use a single stream, then you’re required to load it linearly. You can’t skip on a single stream, seeking later than the cursor loads all bytes up to it.
The current design permits non-linear reading (which is costly for e.g. ZIP central directories but not for linear PNGs).
This is helpful to clarify and can be documented in the motivation page.
I will need to add this as a reading mode then, so all the previous code can stay as it is as long as I leave the current mode as the default.
This will entail some ‘hackyness’ (essentially having to fake the interface to add
and active_range_response
) but it’s worth not having to rewrite code that already supports this, and leave it up to the user to decide which way they want to use the library (which may depend on aspects as simple as file size)
Rather than simply overwrite the add
method wholesale when single_request
is True in the params of RangeStream
, I suggest adding a new method divide
, which can be used in this case, and also be available for less simple use cases (such as "single request for the first N bytes of a file but new requests for any range after that")
- `add` is a fitting name for a method that supplies a particular range on a stream, but `divide` does not accurately suggest the provision of a sub-range from a pre-existing range.
- Better names for this method:
- `partition` (but to me this suggests producing multiple sub-ranges rather than one)
- `split` (same problem as above)
- `subset`
- "Subset" is too specific though (and more of a noun than a verb)
- "Cut" may be neater, but sounds like it mutates its target (as does "trim").
- "Excise" has a clearer meaning than "cut" and does not suggest it mutates its target
- From looking it up, apparently "excise" does sometimes mean mutating its target, but it's better than "extract" (as in 'take/draw out') which is a bit too generic of a term
- I would choose either:
- `excise`,
- `snip`,
- `clip`,
- `snippet`,
- `withdraw`
window
(as in sliding window)The matter of whether the target is mutated is relevant here: the underlying RangeResponse
will have a different range but the same httpx.Response
(in RangeResponse.request.response
, i.e. :attr:RangeRequest.response
)
The RangeResponse
gets its range (solely in the is_consumed
method) through its RangeRequest.range
This could therefore be ‘gated’…
RangeResponse
itself, which is to do with keeping ranges distinct in the ranges
RangeDict)?The idea of a "window" on the pre-stream makes clearer that it is not the same type of add
, and can be facilitated with either a flag add(stream_window=False)
or a new method add_stream_window
stream_window
is equally concise as RangeStream
and clear about the abstraction (a single streamed GET request) and the unit of computation (a mutable window onto an immutable underlying response)
monostream_window
would be clearer, but would also limit the extensibility of this method: in future I may remove the requirement for a window to be on a single stream over the entire file range.Rather than storing the single stream separately, it should just be made clear in the documentation that a single stream should be used deliberately to avoid sending further requests, and therefore add
should be called with stream_window=True
(I may abbreviate it to simply window=True
in fact, the class itself is a stream so it doesn't need repeating)
Implemented with use_window=True
, it all seems to work perfectly.
The existing interface for
RangeStream
is fine, but what it does internally should change. There should be only one streaming GET request, and all rangesadd
ed to this stream will not have their ownRangeResponse
but a simulated version derived from modifying the original, total range.This is motivated by the discussion for PngStream in #28 in which I noticed the slow performance of the chunk populating method which relies on sending range requests for the next chunk after determining their location:
This was essentially a misunderstanding in the design phase of how to use these streams, and should be fairly simple to change over since it’s tested pretty thoroughly.
The only weirdness may be from having a single response… so maybe that should be ‘faked’ by changing the values and making them as copies…? Unclear if worth the memory toll though
Expect will need some mechanism to ‘gate’ ranges so reading past the end behaves as expected
For testing, perhaps switch over gradually with a flag in the init (
singlestream=False
) that gets toggled toTrue
once stable and eventually removed in future versions?