dstansby closed this issue 3 weeks ago.
I actually have similar questions myself! Since we are now using TensorStore to read and write the data (instead of Dask), I'm not so familiar with the data loading behaviour.
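For anyone else poking at this, here is a minimal sketch of how a TensorStore-based copy *could* avoid reading everything into memory. This is not what the package actually does internally; the paths, the sub-array layout, and the slab size are all made up for illustration:

```python
import tensorstore as ts

# Open the source Zarr v2 array ("input.zarr/0" is a hypothetical path).
src = ts.open({
    "driver": "zarr",
    "kvstore": {"driver": "file", "path": "input.zarr/0"},
}, open=True).result()

# Create a Zarr v3 destination with the same shape and dtype.
dst = ts.open({
    "driver": "zarr3",
    "kvstore": {"driver": "file", "path": "output.zarr/0"},
}, create=True, dtype=src.dtype, shape=src.shape).result()

# Copy in slabs along the first axis so memory use stays bounded,
# rather than materialising the whole array at once.
step = 64  # arbitrary slab size
for start in range(0, src.shape[0], step):
    stop = min(start + step, src.shape[0])
    dst[start:stop].write(src[start:stop].read().result()).result()
```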
I am starting to test on larger images, e.g. https://github.com/ome/ome2024-ngff-challenge/pull/23, to see how well they are handled by the conversion...
Currently the default sharding behaviour (creating a single shard that contains the whole array) isn't ideal for bigger images.
There is provision for providing shard shapes in a user-edited parameters.json file, but it would be nicer to define some logic for automatically picking a shard shape based on the chunk shape etc.
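As a straw man for that logic: Zarr v3 sharding requires the shard shape to be a whole multiple of the chunk shape along every axis, so one option is to start from one chunk per shard and double axes until a shard holds some target number of chunks. The function name and the target of 1024 chunks per shard below are just placeholders, not anything the codebase currently defines:

```python
import math

def guess_shard_shape(chunk_shape, array_shape, target_chunks_per_shard=1024):
    """Placeholder heuristic for picking a shard shape.

    Keeps the shard shape an integer multiple of the chunk shape
    (required by Zarr v3 sharding) and grows it until the shard holds
    roughly ``target_chunks_per_shard`` chunks or spans the array.
    """
    shard = list(chunk_shape)
    while math.prod(s // c for s, c in zip(shard, chunk_shape)) < target_chunks_per_shard:
        # Double the axis that currently spans the fewest chunks, so
        # shards stay roughly balanced in units of chunks.
        growable = [ax for ax in range(len(shard))
                    if shard[ax] * 2 <= array_shape[ax]]
        if not growable:
            break  # shard already covers as much of the array as it can
        ax = min(growable, key=lambda a: shard[a] // chunk_shape[a])
        shard[ax] *= 2
    return tuple(shard)

# e.g. (1, 256, 256) chunks in a (400, 8192, 8192) array
print(guess_shard_shape((1, 256, 256), (400, 8192, 8192)))  # -> (16, 2048, 2048)
```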
Any suggestions, feedback, help etc appreciated!
The README says to run `ome2024-ngff-challenge input.zarr output.zarr`, but I am reluctant to run this on my multi-TB datasets in case it reads the whole lot into memory 😆.
It would be nice to add a bit of clarification as to what `ome2024-ngff-challenge` does. Does it create a copy of the data? Is it parallelised somehow? Does it modify data or metadata in place?