silx-kit / h5web

React components for data visualization and exploration
https://h5web.panosc.eu/
MIT License

Add range-based partial fetch for h5wasm provider #1264

Open bmaranville opened 1 year ago

bmaranville commented 1 year ago

Is your feature request related to a problem?

Reading very large files with the h5wasm provider is not possible, for several reasons:

  1. The maximum size of an ArrayBuffer, and of a "file" in Emscripten, is often below about 2 GB.
  2. Browser memory limits the size of an in-memory file representation.
  3. Downloading entire huge files places unreasonable demands on the network and infrastructure.

Requested solution or feature

For web file servers that host HDF5/NeXus files and support range requests, on-demand loading with Emscripten's lazyFile functionality could enable access to very large NeXus files that would be infeasible to read as a whole.
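
To make the idea of range-based partial fetching a bit more concrete, here is a minimal TypeScript sketch of the bookkeeping involved: mapping a read at a given byte offset to fixed-size chunks, with the HTTP `Range` header value for each chunk. The `chunksForRead` helper, the `ChunkRequest` type and the 1 MiB chunk size are assumptions made for illustration; this is not code from h5web, h5wasm or Emscripten.

```typescript
// Assumed chunk size (1 MiB); a real provider might choose differently.
const CHUNK_SIZE = 1024 * 1024;

interface ChunkRequest {
  index: number; // chunk number within the remote file
  rangeHeader: string; // value for the HTTP `Range` request header
}

/** List the chunks (and their Range headers) covering `length` bytes at `offset`. */
function chunksForRead(
  offset: number,
  length: number,
  fileSize: number,
  chunkSize: number = CHUNK_SIZE,
): ChunkRequest[] {
  if (length <= 0 || offset < 0 || offset >= fileSize) {
    return []; // nothing to fetch
  }

  // Clamp the read to the end of the file.
  const lastByte = Math.min(offset + length, fileSize) - 1;
  const firstChunk = Math.floor(offset / chunkSize);
  const lastChunk = Math.floor(lastByte / chunkSize);

  const requests: ChunkRequest[] = [];
  for (let index = firstChunk; index <= lastChunk; index++) {
    const start = index * chunkSize;
    // The end position of a byte range is inclusive.
    const end = Math.min(start + chunkSize, fileSize) - 1;
    requests.push({ index, rangeHeader: `bytes=${start}-${end}` });
  }
  return requests;
}
```

For instance, a 100-byte read at offset 1500000 in a multi-gigabyte file touches only the second chunk, so a single request with `Range: bytes=1048576-2097151` would be enough. A server that honours range requests answers with `206 Partial Content`.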

Alternatives you've considered

HSDS and grove providers already allow this type of random access to parts of a NeXus file.

Additional context

Because sync file access is required, this might require refactoring the h5wasm provider to operate from a worker. Note that it could potentially be refactored to a service worker that uses the same API as a grove server, if that simplifies things.

bmaranville commented 1 year ago

Note that for local files, the emscripten WORKERFS interface could be used to get random access to huge local files from a worker without copying the whole file into memory, which is another benefit of moving the provider to a worker.
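
As a rough sketch of what that could look like, the helper below mounts user-selected files through an Emscripten-style `FS` object. `mountLocalFiles`, `FSLike`, `NamedFile` and the `/work` mount point are hypothetical names chosen for illustration; only `FS.mkdir` and `FS.mount` are Emscripten file system calls, and the h5wasm usage shown in the trailing comment is an assumption rather than tested code.

```typescript
// Minimal structural types for the parts of Emscripten's FS used here.
interface NamedFile {
  name: string; // browser `File` objects satisfy this
}

interface FSLike {
  mkdir(path: string): void;
  mount(type: unknown, opts: { files: NamedFile[] }, mountPoint: string): void;
}

/**
 * Mount `files` read-only under `mountPoint` via WORKERFS and return
 * the virtual paths at which they become visible.
 * Must run inside a web worker: WORKERFS reads files synchronously.
 */
function mountLocalFiles(
  fs: FSLike,
  workerfs: unknown,
  files: NamedFile[],
  mountPoint: string = "/work",
): string[] {
  fs.mkdir(mountPoint); // throws if the directory already exists
  fs.mount(workerfs, { files }, mountPoint);
  return files.map((file) => `${mountPoint}/${file.name}`);
}

// Assumed usage inside a worker (not verified against a specific h5wasm build):
//   const { FS } = await h5wasm.ready;
//   const [path] = mountLocalFiles(FS, FS.filesystems.WORKERFS, [file]);
//   const h5file = new h5wasm.File(path, "r");
```

Since the files are only referenced and not copied, memory use should stay roughly proportional to what is actually read from them.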

axelboc commented 1 year ago

> Note that it could potentially be refactored to a service worker that uses the same API as a grove server, if that simplifies things.

This would be brilliant! However, it seems that synchronous XHR requests inside Service Workers are currently not supported in Chrome and Safari—only in Firefox.

imathews commented 3 days ago

Does the recent work on the H5wasmLocalFileProvider (#1604) perhaps provide a pathway for implementing something similar for remote URLs, using HTTP Range request headers?

This would be of huge benefit to our use case, where multi-gigabyte files are stored remotely and loading the entire file is prohibitive in terms of both memory and network usage.

axelboc commented 2 days ago

It's definitely going to help. @bmaranville also developed a lazyFileLRU demo to show feasibility. However, the amount of code required and its complexity has me worried a bit; it's not going to be trivial to make a production service out of this. I need to look into it more to better understand what's going on.
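
For readers unfamiliar with the caching side of this, a least-recently-used (LRU) chunk cache can be sketched in a few lines of TypeScript. This is a generic illustration going only by the name of the demo, not the code of lazyFileLRU itself; `ChunkLRU` and its methods are hypothetical.

```typescript
/** Keeps at most `maxChunks` fetched chunks, evicting the least recently used. */
class ChunkLRU {
  // A Map iterates in insertion order, so the first key is the oldest entry.
  private readonly chunks = new Map<number, Uint8Array>();
  private readonly maxChunks: number;

  constructor(maxChunks: number) {
    this.maxChunks = maxChunks;
  }

  get size(): number {
    return this.chunks.size;
  }

  /** Return the cached chunk (marking it as most recently used), if present. */
  get(index: number): Uint8Array | undefined {
    const chunk = this.chunks.get(index);
    if (chunk === undefined) {
      return undefined;
    }
    // Re-insert so that this entry moves to the "newest" end of the Map.
    this.chunks.delete(index);
    this.chunks.set(index, chunk);
    return chunk;
  }

  /** Store a chunk, evicting the least recently used one if over capacity. */
  set(index: number, chunk: Uint8Array): void {
    this.chunks.delete(index);
    this.chunks.set(index, chunk);
    if (this.chunks.size > this.maxChunks) {
      const oldest = this.chunks.keys().next().value;
      if (oldest !== undefined) {
        this.chunks.delete(oldest);
      }
    }
  }
}
```

A cache like this bounds memory use to roughly `maxChunks` times the chunk size, at the cost of re-fetching chunks that were evicted.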