Open ilan-gold opened 4 months ago
Interesting question. I hadn't considered it, are the offsets you're indexing beyond the largest you can specify with Number
? Or is this just a convenience to coerce a bigint into a number?
I can imagine a case where you are indexing a subset of a massive array, and in that case I wonder if we should prefer bigint for slicing and indexing. You can't index into a TypedArray or Array with bigint, so somewhere it needs to be converted into a number (with the offset removed), but I'd rather avoid this complexity unless there is a use case that requires it.
So we do have a specific use-case here, which is indexing into sparse arrays on disk (see https://github.com/scverse/anndata/pull/1348 https://github.com/scverse/anndata/issues/1261 and https://github.com/scverse/anndata/pull/1493).
You can't index into a TypedArray or Array with bigint
I was surprised to discover that actually Array(10).fill(15)[5n]
works and returns 15 unless I am missing something, and same for TypedArrays. Maybe I'm misinterpreting this though. My understanding is that there is an upper bound, though, on the size of the array, which is 2^31 - 1 (although this can be fudged, which I am using to hopefully construct tests for this).
In any case, I actually don't really care about the indexing use-case for in-memory. I am more interested in pulling ranges from the on-disk data, which I think is actually something that is a general issue i.e., constructing a slice for a zarr array longer than 2^31 - 1
elements seems like it's a good use-case. Would this require indexing into an array with a bigint? I think maybe not, but I'm just starting to peer into the internals here.
Hmmm, there is quite a bit of arithmetic that gets much more challenging with bigint
. This is quite complex. I will need to think about this a bit, spend more time with the code base. My use-case is simple, but the potential use-cases in this package are manifold so implementing this is not straightforward.
Right, I was assuming that would be a problem. I can see the value of being able slice the absolute space of the array using bigint, but we need to turn any of that indexing into relative offsets internally and there is quite a bit of arithmetic to handle that case. I'm not sure I'll have time to work on this, but I'd be open to turning all indexing logic internally into bigint-based, and then expose a more flexible top level zarr.slice
to allow either number/bigint (and convert to bigint).
Oh that'd be interesting and is definitely more approachable. If you're open to it, I can take a whack at it soon.
Hello!
Would it be possible to change the
{S,s}lice
type to acceptbigint
? I will make a PR to demo/check this out myself. Just wanted to make an issue. Maybe I will discover this is not an issue but ATM:gives me a type error: