gzuidhof / zarr.js

Javascript implementation of Zarr
https://guido.io/zarr.js
Apache License 2.0
132 stars 23 forks

Remote dataset issue #142

Closed xiaor2 closed 1 year ago

xiaor2 commented 1 year ago

Hi, I tried to load an array of shape (3, 52411, 52411) from AWS S3, passing a selection (filter) to the get function: data = SOA.get([0, id, slice(null, 52411)]).then(async data => await data.data) Because of the selection, I should get a one-dimensional array of length 52411. I expected this to save time, since it loads a much smaller array. However, it takes the same time as loading the whole array. Is there any way to speed this up?

manzt commented 1 year ago

What is the chunk shape of your array? (You can find this in the .zarray metadata under chunks.) Zarr operates at the "chunk" level, so the most granular requests it can make are for individual chunks. It will load whatever chunks are necessary to complete your desired selection (and cannot load partial data from within chunks). You can get more fine-grained control over how requests are made in the client by chunking your data differently, to optimize for the type of access you intend to make.
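To make the chunk-level behaviour concrete, here is a minimal sketch that lists which chunks a selection touches. The helper `chunkKeysForSelection` is hypothetical (it is not part of zarr.js), the row index 1000 stands in for `id`, and the chunk shape is the one reported later in this thread. Keys are joined with ".", which is the default chunk key layout in Zarr v2.

```javascript
// Hypothetical helper (not a zarr.js API): list the chunk keys a selection touches.
// Each selection entry is either an integer index or a [start, stop) pair.
function chunkKeysForSelection(chunks, selection) {
  // For each dimension, collect the chunk indices overlapped by the selection.
  const perDim = selection.map((sel, dim) => {
    const [start, stop] = typeof sel === "number" ? [sel, sel + 1] : sel;
    const first = Math.floor(start / chunks[dim]);
    const last = Math.floor((stop - 1) / chunks[dim]);
    const indices = [];
    for (let i = first; i <= last; i++) indices.push(i);
    return indices;
  });
  // Cartesian product of the per-dimension indices, joined with "." (Zarr v2 default).
  return perDim
    .reduce(
      (acc, indices) => acc.flatMap((prefix) => indices.map((i) => [...prefix, i])),
      [[]]
    )
    .map((index) => index.join("."));
}

// Chunk shape reported later in the thread; row 1000 is an assumed value for `id`.
const keys = chunkKeysForSelection([3, 3344, 3344], [0, 1000, [0, 52411]]);
console.log(keys.length);                      // 16
console.log(keys[0], keys[keys.length - 1]);   // 0.0.0 0.0.15
```

Under these assumptions a single row of 52411 values still requires 16 full chunks, because every chunk that overlaps the row has to be fetched and decoded in its entirety.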

xiaor2 commented 1 year ago

What is the chunk shape of your array? (You can find this in the .zarray metadata under chunks.) Zarr operates at the "chunk" level, so the most granular requests that it can make are for individual chunks.

The chunk shape is (3, 3344, 3344). In my case, what chunk shape would you suggest?

manzt commented 1 year ago

This is really hard to suggest without knowing your data or doing any benchmarking. From the single query above, it seems like you want to be able to access an individual (1, 1, 52411) view of this cube. Will you make repeated accesses of this shape, or do you intend to load differently shaped views? When deciding on a chunk shape, think of the queries you are likely to make most often, and then chunk in a way that benefits that type of request.

xiaor2 commented 1 year ago

For now, I will only use this shape (1, 1, 52411). Is (1, 1, 52411) the best chunk shape?

manzt commented 1 year ago

That would make sense, especially if the (1, 1, 52411) views are accessed somewhat uniformly at random (i.e., two nearby chunks aren't likely to get accessed together). It's probably worth reading the zarr documentation on chunk size and shape to learn more.
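As a rough way to compare candidate chunk shapes for this access pattern, the sketch below counts how many chunks, and how many uncompressed elements, one (1, 1, 52411) row would pull in. The helper `elementsFetched` is hypothetical, the row index 1000 is an assumed value for `id`, and the estimate ignores compression and per-request latency, so it is only a first approximation and not a substitute for benchmarking.

```javascript
// Hypothetical helper: estimate the cost of one selection under a given chunk shape.
// Each selection entry is either an integer index or a [start, stop) pair.
function elementsFetched(chunks, selection) {
  // Number of chunks overlapped = product of per-dimension overlapped chunk counts.
  const chunkCount = selection.reduce((count, sel, dim) => {
    const [start, stop] = typeof sel === "number" ? [sel, sel + 1] : sel;
    const first = Math.floor(start / chunks[dim]);
    const last = Math.floor((stop - 1) / chunks[dim]);
    return count * (last - first + 1);
  }, 1);
  // Every overlapped chunk is fetched whole, so count a full chunk's elements each time.
  const chunkSize = chunks.reduce((a, b) => a * b, 1);
  return { chunkCount, elements: chunkCount * chunkSize };
}

const row = [0, 1000, [0, 52411]]; // one (1, 1, 52411) view; 1000 is an assumed `id`
const candidates = [
  [3, 3344, 3344], // original chunk shape from this thread
  [1, 1, 52411],   // one chunk per row
  [1, 100, 52411], // 100 rows per chunk
];
for (const chunks of candidates) {
  const { chunkCount, elements } = elementsFetched(chunks, row);
  console.log(`(${chunks.join(", ")}): ${chunkCount} chunk(s), ${elements} elements`);
}
// (3, 3344, 3344): 16 chunk(s), 536752128 elements
// (1, 1, 52411): 1 chunk(s), 52411 elements
// (1, 100, 52411): 1 chunk(s), 5241100 elements
```

Very small chunks minimise the data transferred per row but multiply the number of objects in the store, so a shape such as (1, 100, 52411) can be a reasonable middle ground when nearby rows are sometimes read together.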

xiaor2 commented 1 year ago

Thank you! I used (1, 100, 52411) and it works well. But I have encountered another problem. I have five arrays of shape (3, 52411, 52411), all with the same chunk shape of (1, 100, 52411), saved on AWS. When I use openArray to open them, four of them report the correct chunk shape, but one of them reports (3, 3344, 3344). The .zarray metadata of all five arrays is identical, so a chunk shape of (3, 3344, 3344) for one of them doesn't make sense. Is there a potential mistake that could cause this?

manzt commented 1 year ago

Perhaps try clearing your browser cache. Sometimes there can be issues with updating data on S3, but this shouldn't be a zarr.js issue.
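One way to check whether a stale cached copy is involved is to fetch the array's .zarray directly while bypassing the browser's HTTP cache, then compare its chunks with what the opened array reports. This is only a sketch: `zarrayRequest` and `fetchChunkShape` are hypothetical helpers, the bucket URL in the usage comment is made up, and it relies on the standard fetch option `cache: "no-store"`.

```javascript
// Hypothetical helper: build a request for an array's .zarray that skips the HTTP cache.
function zarrayRequest(arrayUrl) {
  // Strip trailing slashes so the metadata key is appended exactly once.
  const url = `${arrayUrl.replace(/\/+$/, "")}/.zarray`;
  // "no-store" asks fetch to go to the network and not read or update the cache.
  return { url, init: { cache: "no-store" } };
}

// Hypothetical helper: read the chunk shape straight from the remote metadata.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function fetchChunkShape(arrayUrl, fetchImpl = globalThis.fetch) {
  const { url, init } = zarrayRequest(arrayUrl);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status}`);
  }
  const metadata = await response.json();
  return metadata.chunks;
}

// Usage (assumed URL, not from this thread):
// const chunks = await fetchChunkShape("https://example-bucket.s3.amazonaws.com/SOA.zarr");
// console.log(chunks); // compare with the chunk shape reported after openArray
```

If the freshly fetched chunks differ from what the opened array reports, the mismatch most likely comes from a cached response and not from the stored metadata.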

xiaor2 commented 1 year ago

Yes, that solves my problem! Appreciate your suggestion!!