Open d-v-b opened 2 hours ago
This would be a good opportunity to think if we should also consider a type that allows doing things like "last N bytes" (without knowing the size beforehand). Most (all?) object stores, for example, allow that type of query.
This would be a good opportunity to think if we should also consider a type that allows doing things like "last N bytes" (without knowing the size beforehand). Most (all?) object stores, for example, allow that type of query.
Today the elements of ByteRangeRequest
can be negative, so I think (-1, -100)
would express "read the the last 100 bytes". Maybe @normanrz can correct me if this is wrong.
@paraseba Suffix requests are already used (https://github.com/zarr-developers/zarr-python/pull/1661#issuecomment-2435452928), we just want to ensure the type is well defined so that semantics are clear.
Most (all?) object stores, for example, allow that type of query.
All except Azure.
Today the elements of
ByteRangeRequest
can be negative, so I think(-1, -100)
would express "read the the last 100 bytes".
From here it's (-100, None)
that would express the last 100 bytes:
https://github.com/zarr-developers/zarr-python/blob/bc588a760a804f783c4242d4435863a43a5f3f9f/src/zarr/codecs/sharding.py#L700-L702
Yes, it is (-100, None)
That is great, then making this type more clear is even more important.
So, the dataclass would be like this?
@dataclass
class ByteRangeRequest:
start: int # can be negative for indexing from the back
length: int | None # None = read to the end
Continuing a conversation from #1661
We have IO operations that can fetch ranges of bytes. That byte range parameter is parameterized by an (optional) 2-tuple of nullable ints, i.e.
tuple[int | None, int | None] | None
. The tuple part is ambiguous between two similar specifications of a range of integers --(start, step)
and(start, stop)
; this ambiguity is bad.We should rework this type so that it's harder to be confused about these two possibilities. One option suggested by @kylebarron would be a dataclass, e.g. :
It would also be nice if there was an instance of this class (e.g., the default) that conveyed "fetch the whole thing", and then
byte_range
wouldn't need to be optional in all the store methods. I'm also not sure we get much fromNone
in this type, since there are integers that can express "the end" of a range, e.g.-1
. Curious to hear other ideas here.