[FEA] Support Zarr-based image format (such as NGFF)

gigony commented 3 years ago

Is your feature request related to a problem? Please describe.

We need to look at the Next-generation file formats (NGFF) specification (https://ngff.openmicroscopy.org/latest/), which uses the Zarr format to store general microscopy images in a way that suits distributed computing.

We want to support Zarr/NGFF after supporting major digital pathology formats (including .svs format).

Describe the solution you'd like

We can support the Zarr or OME-Zarr format by 1) reusing an existing library (such as ome-zarr-py or z5) or 2) implementing it from scratch.

Reusing a C++ library such as z5 may be a preferred option.
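To give a rough sense of what option 2 would involve, here is a minimal sketch that reads one chunk of a Zarr v2 directory store using only the Python standard library. It assumes a 2-D, uncompressed, C-ordered, little-endian uint16 array; `read_raw_chunk` and `write_demo_store` are hypothetical helper names used for illustration and are not part of cuCIM, ome-zarr-py, or z5. A real implementation would also need compressors, filters, other dtypes, fill values, and partial edge chunks.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_demo_store(store: Path) -> None:
    """Create a tiny 2x4 uint16 array split into two 2x2 chunks (demo data only)."""
    meta = {
        "zarr_format": 2,
        "shape": [2, 4],
        "chunks": [2, 2],
        "dtype": "<u2",
        "compressor": None,
        "fill_value": 0,
        "filters": None,
        "order": "C",
    }
    (store / ".zarray").write_text(json.dumps(meta))
    # Each chunk is stored as its own file, named by its chunk indices.
    (store / "0.0").write_bytes(struct.pack("<4H", 1, 2, 5, 6))
    (store / "0.1").write_bytes(struct.pack("<4H", 3, 4, 7, 8))


def read_raw_chunk(store: Path, chunk_index: tuple) -> list:
    """Read one uncompressed 2-D chunk and return it as a list of rows."""
    meta = json.loads((store / ".zarray").read_text())
    if meta["compressor"] is not None or meta.get("filters"):
        raise NotImplementedError("this sketch handles uncompressed chunks only")
    if meta["order"] != "C" or meta["dtype"] != "<u2":
        raise NotImplementedError("this sketch handles C-ordered '<u2' data only")
    rows, cols = meta["chunks"]
    separator = meta.get("dimension_separator", ".")
    key = separator.join(str(i) for i in chunk_index)
    flat = struct.unpack(f"<{rows * cols}H", (store / key).read_bytes())
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    write_demo_store(demo)
    print(read_raw_chunk(demo, (0, 1)))
```

Even this toy version shows why reusing a library is attractive: most of the work is in the cases the sketch refuses to handle.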

With GDS (GPUDirect Storage, https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html), the performance of loading the chunks (files) of a Zarr file (folder) could be greatly improved.
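The reason chunked storage fits this kind of I/O is that a region read breaks down into a set of independent chunk files. The sketch below only computes which chunk keys overlap a requested region; it does not call GDS or any cuCIM API, and `chunk_keys_for_region` is a hypothetical helper name. The default `"."` separator follows the Zarr v2 directory-store convention; NGFF stores typically use `"/"`.

```python
from itertools import product


def chunk_keys_for_region(start, size, chunks, separator="."):
    """Return the keys of every chunk that overlaps the requested region.

    start, size, and chunks are per-dimension sequences in array axis order.
    """
    ranges = []
    for s, n, c in zip(start, size, chunks):
        if n <= 0:
            return []  # empty region touches no chunks
        first = s // c            # chunk holding the first requested element
        last = (s + n - 1) // c   # chunk holding the last requested element
        ranges.append(range(first, last + 1))
    return [separator.join(map(str, idx)) for idx in product(*ranges)]


# A 600x100 region starting at (500, 1000) in an array with 512x512 chunks.
keys = chunk_keys_for_region(start=(500, 1000), size=(600, 100), chunks=(512, 512))
print(keys)
```

Each returned key maps to one file, so the reads can be issued concurrently, which is where a direct storage-to-GPU path could help.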

Other useful libraries that cuCIM can exploit may be available (we need to collect the information).

Additional context

Do you plan to support images with greater depth than 8 bits? More than 3 channels? (For multiple stains)

The CuImage class holds DLPack structures that can represent various data types, strides, and shapes (including the channel dimension), so supporting them is possible. The CuImage class already has a public API for that (dtype, dims, shape, and channel_names; see this).

We also plan to support microscopy-related image formats, which usually store images with data types such as float and int16/32 and with multiple channels (such as NGFF, which is based on the Zarr format), once we have addressed the need to support digital pathology image formats.

scikit-image API functions often support more than 3 channels (with exceptions for things like rgb2gray, where the input must be a 3-channel image). Many operations involve internal conversion to floating point. We made an effort to preserve single-precision computation when the input is single-precision and have started pushing those same modifications back upstream to scikit-image itself (which traditionally did most operations in double precision).

References

Articles

Zarr: Scalable Storage of Tensor Data for Use in Parallel and Distributed Computing | SciPy 2019

Data

Public OME-Zarr Data

Python Implementation

C/C++ Implementation

grlee77 commented 2 years ago

For Zarr support from C++ there is also https://github.com/xtensor-stack/xtensor-zarr

joshmoore commented 2 years ago

As well as https://github.com/google/tensorstore and in C, https://github.com/Unidata/netcdf-c . Thanks to @jakirkham and @grlee77 for the heads up about this issue. I'll try to follow along in case there are any issues. One that comes to mind is if there is any API alignment that could/should occur between CuImage & NGFF. For example, there have been a number of ideas around having a base multi-resolution image spec somewhere between Zarr & the microscopy-specific OME types, most recently with the xarray community. All early days but thoughts welcome.
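As a starting point for the alignment question, here is a rough sketch of how NGFF `multiscales` metadata could be summarized as a per-level description. The embedded JSON is a hand-written example in the shape of the NGFF 0.4 `multiscales` block, not data from a real image, and the keys of the returned dictionary (`dims`, `level_count`, `level_paths`, `level_downsamples`) are hypothetical names chosen for illustration rather than an existing CuImage or NGFF API.

```python
import json

# Hand-written example in the style of an NGFF 0.4 ".zattrs" file.
EXAMPLE_ZATTRS = """
{
  "multiscales": [{
    "version": "0.4",
    "axes": [
      {"name": "c", "type": "channel"},
      {"name": "y", "type": "space", "unit": "micrometer"},
      {"name": "x", "type": "space", "unit": "micrometer"}
    ],
    "datasets": [
      {"path": "0", "coordinateTransformations": [{"type": "scale", "scale": [1.0, 0.25, 0.25]}]},
      {"path": "1", "coordinateTransformations": [{"type": "scale", "scale": [1.0, 0.5, 0.5]}]},
      {"path": "2", "coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.0, 1.0]}]}
    ]
  }]
}
"""


def summarize_multiscales(zattrs_text):
    """Summarize the first multiscale image as dims, level paths, and downsample factors."""
    multiscale = json.loads(zattrs_text)["multiscales"][0]
    axes = multiscale["axes"]
    spatial = [i for i, axis in enumerate(axes) if axis["type"] == "space"]
    scales = []
    for dataset in multiscale["datasets"]:
        # Take the "scale" entry among this level's coordinate transformations.
        scale = next(
            t["scale"] for t in dataset["coordinateTransformations"] if t["type"] == "scale"
        )
        scales.append(scale)
    base = scales[0]  # the first dataset is the highest-resolution level
    return {
        "dims": "".join(axis["name"] for axis in axes).upper(),
        "level_count": len(scales),
        "level_paths": [d["path"] for d in multiscale["datasets"]],
        # Downsample factor relative to level 0, taken over the spatial axes only.
        "level_downsamples": [max(s[i] / base[i] for i in spatial) for s in scales],
    }


print(summarize_multiscales(EXAMPLE_ZATTRS))
```

A shared base spec could standardize something like this summary, with the microscopy-specific metadata layered on top.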

rapidsai / cucim

[FEA] Support Zarr-based image format (such as NGFF) #94