manzt / zarrita.js

A JavaScript toolkit for working with chunked, compressed, n-dimensional arrays
https://zarrita.dev
MIT License

BoolArray fails to get #118

Open keller-mark opened 11 months ago

keller-mark commented 11 months ago

Using get() on a BoolArray does not seem to work as expected: the returned value is { data: {}, shape: [34406], stride: [1] }

Notebook: https://observablehq.com/d/7152024fe1caf825

keller-mark commented 11 months ago

Ah, I see now that the empty-looking object is an iterable, so I can do

Array.from((await zarr.get(arr)).data);

so perhaps not a bug after all, but now I always have to check on my end whether .data is an iterable

// Wrapped in a function so the `return` statements are valid.
async function getData(arr) {
  const data = await zarr.get(arr);
  if (data.data?.[Symbol.iterator]) {
    return Array.from(data.data);
  }
  return data.data;
}

keller-mark commented 11 months ago

Maybe the user should opt in to getting an iterator

zarr.getIterator(arr)

manzt commented 11 months ago

Yeah, the idea was to keep the data in a TypedArray-like object for as long as possible (i.e., a strided view of the underlying bytes). But maybe this is more trouble than it's worth if there aren't use cases for keeping the underlying bytes.
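To illustrate why such a view logs as an empty object, here is a minimal sketch of a TypedArray-like boolean view over raw bytes. The `BoolView` class is a hypothetical stand-in written for this example and is not zarrita's actual implementation; it assumes one byte per element.

```javascript
// Hypothetical stand-in for a bool array view; not zarrita's actual class.
class BoolView {
  #bytes; // private Uint8Array, assumed to hold one byte per element

  constructor(bytes) {
    this.#bytes = bytes;
  }

  get length() {
    return this.#bytes.length;
  }

  // Decode a single element on access instead of copying the whole buffer.
  get(i) {
    return this.#bytes[i] !== 0;
  }

  *[Symbol.iterator]() {
    for (let i = 0; i < this.length; i++) {
      yield this.get(i);
    }
  }
}

const view = new BoolView(new Uint8Array([1, 0, 1]));

// The instance has no own enumerable properties, so it serializes as "{}".
const asJson = JSON.stringify(view);

// Iterating the view decodes the bytes into plain booleans.
const asArray = Array.from(view);
```

The bytes stay untouched until something iterates the view, which is the trade-off being discussed: cheap to keep around, but less convenient than a plain array.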

This case happens with the string/bool array types, which is why I introduced the zarr.Array.is type guard. You could do something like:


let data;
if (arr.is("string") || arr.is("bool")) {
   // string/bool arrays come back as iterable views, so copy them into a plain array
   data = Array.from((await get(arr)).data);
} else {
   data = (await get(arr)).data;
}

but maybe that's still not very ergonomic. We could probably wrap this in a separate API as you suggested, which will coerce the typed arrays into object arrays.
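One possible shape for such a wrapper, as a rough sketch: the helper name `toPlainChunk` is hypothetical, and the assumption is that a chunk is an object of the form `{ data, shape, stride }`, as in the output shown earlier in this thread. It leaves real TypedArrays alone (they are iterable too, so checking for `Symbol.iterator` alone would also convert them) and only copies other iterables into plain arrays.

```javascript
// Hypothetical helper; not part of zarrita's API.
function toPlainChunk(chunk) {
  const { data, shape, stride } = chunk;
  // Real TypedArrays and plain arrays are returned unchanged.
  if (ArrayBuffer.isView(data) || Array.isArray(data)) {
    return chunk;
  }
  // Anything else is assumed to be an iterable view (e.g. bool/string data).
  return { data: Array.from(data), shape, stride };
}

// Stand-in for an iterable view, used here instead of a real zarr array.
const lazy = {
  *[Symbol.iterator]() {
    yield true;
    yield false;
  },
};

const plain = toPlainChunk({ data: lazy, shape: [2], stride: [1] });
const typed = toPlainChunk({ data: new Float32Array([1, 2]), shape: [2], stride: [1] });
```

With zarrita this would presumably be called as `toPlainChunk(await get(arr))`, so callers no longer need to branch on the data type themselves.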

manzt commented 10 months ago

A thought I had yesterday. Maybe we could have a type-aware helper for coercing the data types:

let { data } = zarr.refineChunk(await get(array), {
   string: ({ data, shape, stride }) => ({ data: Array.from(data), shape, stride }),
   boolean: ({ data, shape, stride }) => ({ data: Array.from(data), shape, stride }),
})

Will need to think about it more, but curious to hear your thoughts.
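As a way to make the proposal concrete, here is a rough sketch of how a `refineChunk`-style helper could dispatch to handlers. This is only an illustration of the idea above, not an existing zarrita function. In particular, guessing the element kind by peeking at the first value with `typeof` is an assumption made to keep the example self-contained; a real implementation would more likely rely on the array's data type.

```javascript
// Hypothetical sketch of the proposed helper; not an existing zarrita API.
function refineChunk(chunk, handlers) {
  // Assumption: guess the element kind from the first value.
  const first = chunk.data[Symbol.iterator]().next();
  if (first.done) {
    return chunk; // empty data: nothing to inspect, leave as-is
  }
  const handler = handlers[typeof first.value];
  // Kinds without a handler (e.g. "number") pass through unchanged.
  return handler ? handler(chunk) : chunk;
}

const toArray = ({ data, shape, stride }) => ({ data: Array.from(data), shape, stride });
const handlers = { string: toArray, boolean: toArray };

// Stand-in for an iterable bool view, used here instead of a real zarr array.
const bools = {
  *[Symbol.iterator]() {
    yield true;
    yield false;
  },
};

const refined = refineChunk({ data: bools, shape: [2], stride: [1] }, handlers);
const untouched = refineChunk({ data: new Float32Array([1, 2]), shape: [2], stride: [1] }, handlers);
```

Keying the handlers by kind keeps the default path (numeric TypedArrays) free of copies, while letting callers opt in to plain arrays for only the types they care about.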