lmmx / range-streams

Streaming range requests in Python
https://range-streams.readthedocs.io/en/latest/
MIT License

PNG chunk enumeration is slow due to repeated range requests #28

Closed lmmx closed 3 years ago

lmmx commented 3 years ago

I know this isn’t in the spirit of this library, but making one range request per chunk is prohibitively slow in practice, so the approach should be revisited.

For chunks in sequence like: IHDR, IDAT, FFOO, BBAR, IEND

Rather than sending one request per chunk (so that the time spent awaiting each request adds up), it would be better to use a single range, starting from the end of the IHDR chunk if that has already been scanned [surely it should always be scanned first]. Enumeration could also be non-exhaustive: e.g. when only the IDAT chunk is desired, simply ignore any other chunks and finish at the first non-IDAT chunk found after it.
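As a rough sketch of the idea (not the library's code), the following walks the chunk headers of a PNG held in one in-memory buffer, which stands in for the body of a single range request. The names `make_chunk`, `iter_chunks`, `idat_only` and `ChunkInfo` are hypothetical, and the test image is synthetic, with dummy chunk payloads.

```python
import struct
import zlib
from typing import Iterator, List, NamedTuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte PNG file signature


class ChunkInfo(NamedTuple):
    name: str        # 4-character chunk type, e.g. "IDAT"
    data_start: int  # absolute offset of the chunk's data field
    length: int      # length of the data field in bytes


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC (over type + data)."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(buf: bytes, start: int = len(PNG_SIGNATURE)) -> Iterator[ChunkInfo]:
    """Enumerate chunks from a single buffer, with no further requests."""
    pos = start
    while pos + 8 <= len(buf):
        length, chunk_type = struct.unpack_from(">I4s", buf, pos)
        yield ChunkInfo(chunk_type.decode("ascii"), pos + 8, length)
        pos += 8 + length + 4  # skip header (8), data, and CRC (4)


def idat_only(buf: bytes) -> List[ChunkInfo]:
    """Non-exhaustive enumeration: stop at the first non-IDAT after an IDAT."""
    found: List[ChunkInfo] = []
    for chunk in iter_chunks(buf):
        if chunk.name == "IDAT":
            found.append(chunk)
        elif found:
            break  # IDAT run has ended, so ignore everything that follows
    return found


# Synthetic file with the chunk sequence from the example above.
png = PNG_SIGNATURE + b"".join(
    [
        make_chunk(b"IHDR", bytes(13)),
        make_chunk(b"IDAT", b"abc"),
        make_chunk(b"FFOO", b""),
        make_chunk(b"BBAR", b"xy"),
        make_chunk(b"IEND", b""),
    ]
)
```

Here `idat_only(png)` never looks at BBAR or IEND, which is the speed-over-exhaustiveness trade described above. With a real stream the same loop would read from the one open response rather than from a complete `bytes` object.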

This more ‘ruthless’ approach to enumeration would not be in keeping with the library’s original design, but would make a practically useful alternative for when speed is prioritised over exhaustiveness.

I suggest adding this as an option within the existing codec.

lmmx commented 3 years ago

I think it’s best to distinguish this approach completely from the PngStream, unfortunately, and call it a PngMonoStream to emphasise that it does not take the expected approach of one request per range.

Alternatively, the existing PngStream could be rewritten to avoid sending new requests after an initial one. In fact, this could be a wise thing to add to the RangeStream class itself: a MonoStream that can then be ‘added’ to, where each added sub-stream (within the monostream’s total range) is made by copying and splitting the original range, with no need to wait for a new range request to be set up.

On thinking about it, there’s no real reason the entire library couldn’t have been set up like this. You can imagine rewriting it by simply changing what the add method does (split off a subdivision of the total range) and what happens upon initialisation if no range is passed in (a streaming request for the total range instead of a head request).
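A toy illustration of that design, assuming half-open byte ranges and an in-memory stand-in for the streamed response. `ToyMonoStream`, `fetch_total` and `request_count` are made-up names for this sketch and are not part of the RangeStream API.

```python
from typing import Callable, Dict, Tuple


class ToyMonoStream:
    """One request for the total range; every later `add` only splits it."""

    def __init__(self, fetch_total: Callable[[], bytes]) -> None:
        self.request_count = 0
        # The only "request": fetch the total range once, on initialisation.
        self.request_count += 1
        self._total = memoryview(fetch_total())
        self.windows: Dict[Tuple[int, int], memoryview] = {}

    def add(self, start: int, end: int) -> bytes:
        """Split off the sub-range [start, end) without a new request."""
        if not 0 <= start <= end <= len(self._total):
            raise ValueError(f"range {start}-{end} is outside the total range")
        window = self._total[start:end]  # a view, so no bytes are copied here
        self.windows[(start, end)] = window
        return window.tobytes()


# Stand-in for a server response: 100 bytes with values 0..99.
stream = ToyMonoStream(lambda: bytes(range(100)))
head = stream.add(10, 15)
tail = stream.add(90, 100)
```

After both calls `stream.request_count` is still 1, since each window is cut from the range that was fetched on initialisation.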

lmmx commented 3 years ago

Implemented the above idea, but the actual range handling is not yet implemented (each "windowed" request just has a copy of the original stream), so it does not yet work when the PngStream single_request option is switched from False (the default) to True in the codecs/png_test.py test.

lmmx commented 3 years ago

Closing, as the single request feature was implemented and it is faster. Further speedups will come from async (#26).