gzuidhof / zarr.js

JavaScript implementation of Zarr
https://guido.io/zarr.js
Apache License 2.0

Partial chunk reads #109

Closed (guigrpa closed this issue 2 years ago)

guigrpa commented 2 years ago

Are partial chunk reads supported, as is the case in zarr-python for datasets using Blosc compression? (see this issue and this merged PR).

We're interested in accessing extremely large public datasets (tens to hundreds of TB), with chunks as large as 100 MB, from a web application. Given their size, it's unlikely that we can create new copies with a more web-manageable chunk size (say, 1-2 MB). Any ideas?

cc @manzt

gzuidhof commented 2 years ago

Hi @guigrpa,

Currently it doesn't have any special support for these queries. It is technically possible, I presume, with an HTTP Range request header that specifies which part you want to read. I had a look at the merged PR; from what I understand, it actually "reads" the entire file and then decompresses only part of it. "Reading" of course has a different meaning for a local file (one can "open" it and then only actually access a part of it), whereas on the web we have to do this through range requests.
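
To make the range request idea concrete, here is a minimal sketch in TypeScript. It is not part of zarr.js: `FetchLike`, `rangeHeader` and `fetchByteRange` are hypothetical names made up for illustration, and the sketch assumes the server honors the `Range` header (answering with `206 Partial Content`). It only fetches raw bytes of a stored chunk; working out which byte range of a Blosc-compressed chunk is needed, and decompressing it, is a separate problem that is not shown here.

```typescript
// Minimal fetch-like signature (an assumption of this sketch) so the helper
// can be used with the global `fetch` or with a stub.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<{ status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Build the value of an HTTP Range header for `length` bytes starting at
// byte `offset`. HTTP byte ranges are inclusive on both ends.
function rangeHeader(offset: number, length: number): string {
  if (!Number.isInteger(offset) || !Number.isInteger(length) || offset < 0 || length <= 0) {
    throw new RangeError("offset must be an integer >= 0 and length an integer > 0");
  }
  return `bytes=${offset}-${offset + length - 1}`;
}

// Fetch only part of a remote object (hypothetical helper).
async function fetchByteRange(
  fetchFn: FetchLike,
  url: string,
  offset: number,
  length: number
): Promise<Uint8Array> {
  const response = await fetchFn(url, {
    headers: { Range: rangeHeader(offset, length) },
  });
  if (response.status === 206) {
    // The server honored the range and sent only the requested bytes.
    return new Uint8Array(await response.arrayBuffer());
  }
  if (response.status === 200) {
    // The server ignored the Range header and sent the whole object,
    // so cut out the requested part locally.
    const full = new Uint8Array(await response.arrayBuffer());
    return full.subarray(offset, offset + length);
  }
  throw new Error(`Unexpected HTTP status ${response.status}`);
}
```

In a browser this could be called as `fetchByteRange(fetch, chunkUrl, 0, 1024)`, where `chunkUrl` is a placeholder for the URL of a stored chunk. Note that the `200` fallback still downloads the full chunk, which is exactly what one wants to avoid with 100 MB chunks, so it is worth checking that the storage backend supports range requests.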

I'm of course happy to accept a PR for this behavior. Otherwise, perhaps the best way to solve this is an intermediate service on a server that accepts requests for a smaller chunk size, translates them to the larger chunks, and serves those partially (it should probably also cache the parts that you access often). I hope that makes sense!
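
As a rough illustration of the translation and caching such an intermediate service would do, here is a sketch in TypeScript. Everything in it is hypothetical (`locateSmallChunk`, `LargeChunkSlice`, `LruCache`), and it makes simplifying assumptions: a 1-D array, a large chunk length that is an exact multiple of the small chunk length, and offsets counted in elements of the decompressed chunk. With compressed chunks these offsets do not map directly onto byte offsets in the stored file, so the service would still have to decompress (part of) the large chunk.

```typescript
// Where a small "virtual" chunk lives inside the stored large chunks.
interface LargeChunkSlice {
  largeChunkIndex: number; // index of the stored (large) chunk
  elementOffset: number;   // first element within the decompressed large chunk
  elementCount: number;    // number of elements to serve
}

// Map a small chunk index to a slice of a large chunk (1-D case only).
function locateSmallChunk(
  smallIndex: number,
  smallChunkLen: number,
  largeChunkLen: number
): LargeChunkSlice {
  if (smallChunkLen <= 0 || largeChunkLen % smallChunkLen !== 0) {
    throw new RangeError("largeChunkLen must be a positive multiple of smallChunkLen");
  }
  const smallPerLarge = largeChunkLen / smallChunkLen;
  return {
    largeChunkIndex: Math.floor(smallIndex / smallPerLarge),
    elementOffset: (smallIndex % smallPerLarge) * smallChunkLen,
    elementCount: smallChunkLen,
  };
}

// Tiny least-recently-used cache for decompressed large chunks.
// It relies on Map preserving insertion order.
class LruCache<V> {
  private entries = new Map<string, V>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert the entry to mark it as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

For example, with large chunks of 4000 elements served as small chunks of 1000 elements, small chunk 5 maps to large chunk 1 starting at element 1000. A service built this way would look up the large chunk in the cache first and only go to storage on a miss.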