silx-kit / h5web

React components for data visualization and exploration
https://h5web.panosc.eu/
MIT License

Partial read of large HDF files stored publicly on S3 (access via HTTPS) #1566

Closed grzanka closed 5 months ago

grzanka commented 5 months ago

Is your feature request related to a problem?

I very much like the https://myhdf5.hdfgroup.org online browser for HDF files. It works well even with files stored publicly on S3 and accessible via HTTPS. For example, I am able to load a ~600 MB file (https://s3p.cloud.cyfronet.pl/datarawlv2v3/20231204m4.hdf). The only thing that puzzles me is that the whole file is fetched over the web into my web browser's local storage.

Requested solution or feature

Would it be possible to implement partial reads of remote files? I once saw an interesting solution to this problem for Python: https://gist.github.com/ajelenak/db0d9bf14b7ea4c48acf20249e189c80

Another online reader, for a different binary format (ROOT files), has a similar feature: https://github.com/root-project/jsroot/issues/284#issuecomment-1932041133

Ideally, I would like to point the https://myhdf5.hdfgroup.org online browser to a URL of a fairly large file (for example https://s3p.cloud.cyfronet.pl/datarawlv2v3/20231204m4.hdf). I would then expect it to request a range of bytes from the URL and read the file structure. Then, when I select something for plotting, such as a given dataset, it would download only the data needed to plot that dataset.
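To make the idea concrete, here is a minimal sketch of the first step (fetching only a byte range over HTTPS) using Python's standard library. It is not how h5web or myHDF5 work today. The helper names `range_header`, `fetch_range` and `looks_like_hdf5` are hypothetical, and the sketch assumes that the server honours HTTP `Range` requests and that the HDF5 superblock sits at offset 0 of the file.

```python
import urllib.request

# The 8-byte signature that opens the HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def range_header(start: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header for `length` bytes starting at `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # Range end offsets are inclusive, hence the -1.
    return {"Range": f"bytes={start}-{start + length - 1}"}


def fetch_range(url: str, start: int, length: int, timeout: float = 30.0) -> bytes:
    """Download only the requested bytes (hypothetical helper)."""
    request = urllib.request.Request(url, headers=range_header(start, length))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # 206 Partial Content means the server honoured the Range header;
        # a plain 200 would mean it is sending the whole file instead.
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()


def looks_like_hdf5(first_bytes: bytes) -> bool:
    """Check for the HDF5 signature, assuming the superblock is at offset 0."""
    return first_bytes.startswith(HDF5_SIGNATURE)


# Example usage (performs a network request, so it is left commented out):
# head = fetch_range("https://s3p.cloud.cyfronet.pl/datarawlv2v3/20231204m4.hdf", 0, 8)
# print(looks_like_hdf5(head))
```

A real reader would repeat this kind of request for the metadata blocks describing the file structure, and later for the chunks of the selected dataset only. A browser-based viewer would additionally need the server to allow cross-origin requests carrying the `Range` header.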

axelboc commented 5 months ago

Hi @grzanka, totally agree! In fact, this feature is already on the road map: https://github.com/silx-kit/h5web/issues/1264 (but I can't give you a timeline, sorry). Closing this issue as duplicate but thanks for providing file samples; they'll be very useful when the time comes to test the feature.