ome / ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
https://ngff.openmicroscopy.org

consolidate Plate to preview image #141

Open will-moore opened 2 years ago

will-moore commented 2 years ago

Currently, when viewing an NGFF Plate in vizarr or napari, the pyramid is created by using 1 down-sampled tile from the first Image of every Well.

In vizarr (https://github.com/hms-dbmi/vizarr/pull/119) we check the .zattrs for every Well to find the path/to/first/Image for each. The multiscales metadata of the first Image in the plate is used to find the lowest resolution, and then we load that same resolution for all Wells. This makes it quite slow to load a Plate, since for each Well we have to load the .zattrs, the .zarray and the chunks (1 per channel). E.g. for a 384-Well plate, this is at least 1920 requests for the lowest resolution. https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plates/5966.zarr

A similar strategy of checking the path/to/first/Image for every Well is also in an open PR for ome-zarr-py (for viewing in napari): https://github.com/ome/ome-zarr-py/pull/207

In the case of vizarr, we only load the smallest (lowest resolution) level of the pyramid, and don't load higher resolutions when we zoom in.

I wonder whether we could specify a "preview" image of a Plate that clients could use to view the Plate as an NGFF Image. This would enable viewing the down-sampled Plate as simply as we currently view a single Image: load only the .zarray and the chunks of optimum size. Instead of 1 chunk per Well, we would stitch Wells together.

The lowest resolution of the Plate could be loaded from the full resolution Images as it is now.

This plate "preview" would also allow vizarr to view multi-resolution levels of the plate (but not the full resolution).

I could imagine an ome_zarr command like $ ome_zarr consolidate_plate path/to/plate.zarr to generate a plate preview Image from an existing NGFF plate. This could take into account a different path/to/first/Image for each Well.
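
A rough sketch of what such a consolidate_plate step might do, assuming a plain zarr reader and v0.4-style plate/well metadata (the function name, the "plate_preview" output group and the axis layout are all assumptions, and writing proper multiscales metadata for the preview is omitted):

```python
import numpy as np
import zarr

def consolidate_plate(plate_path: str) -> None:
    root = zarr.open_group(plate_path, mode="a")
    plate_meta = root.attrs["plate"]
    n_rows, n_cols = len(plate_meta["rows"]), len(plate_meta["columns"])

    # Read the lowest-resolution level of the first Image of every Well.
    # rowIndex/columnIndex exist from spec v0.4; older plates would need the
    # row/column parsed from well["path"] instead.
    tiles = {}
    for well in plate_meta["wells"]:
        well_grp = zarr.open_group(plate_path, path=well["path"], mode="r")
        first_image = well_grp.attrs["well"]["images"][0]["path"]
        img = zarr.open_group(plate_path, path=f"{well['path']}/{first_image}", mode="r")
        lowest = img.attrs["multiscales"][0]["datasets"][-1]["path"]
        tiles[(well["rowIndex"], well["columnIndex"])] = img[lowest][:]

    # Stitch the Well tiles into one array, assuming the last two axes are y, x
    # and that all tiles share the same shape and dtype.
    first_tile = next(iter(tiles.values()))
    h, w = first_tile.shape[-2:]
    preview = np.zeros(first_tile.shape[:-2] + (n_rows * h, n_cols * w),
                       dtype=first_tile.dtype)
    for (r, c), tile in tiles.items():
        preview[..., r * h:(r + 1) * h, c * w:(c + 1) * w] = tile

    # Write the stitched array as the (hypothetical) preview Image; further
    # pyramid levels and multiscales metadata would be written here too.
    grp = root.require_group("plate_preview")
    grp.create_dataset("0", data=preview, overwrite=True)
```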

The top-level .zattrs "plate":{} object could specify the existence of a preview, e.g. `"plate_preview": "path/to/multiscales_image.zarr"`.
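
For example (again, the "plate_preview" key is only the suggestion above, not part of the spec), the plate metadata could be updated along these lines:

```python
import zarr

root = zarr.open_group("path/to/plate.zarr", mode="a")
plate_meta = dict(root.attrs["plate"])          # copy, since attrs return plain dicts
plate_meta["plate_preview"] = "path/to/multiscales_image.zarr"  # hypothetical key
root.attrs["plate"] = plate_meta                # write the updated "plate" object back
```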

cc @jluethi @sbesson

jluethi commented 2 years ago

> In the case of vizarr, we only load the smallest (lowest resolution) level of the pyramid, and don't load higher resolutions when we zoom in.

I think it's awesome that the ome-zarr-py library and its napari plugin give access to the whole pyramid. While a plate takes a while to load initially, browsing it afterwards is super cool, with the relevant resolutions loaded whenever I zoom in. Here's an example video with the napari async PR & the PRs of ome-zarr-py & napari-ome-zarr (from the discussion in https://github.com/ome/ome-zarr-py/pull/207): https://www.dropbox.com/s/sasxqw60q7r7440/20220929_napari_async_omeZarr_PR.mov?dl=0

It takes 45s for initial loading to finish (though that varies a lot with connection speed), but then it's a super-interactive plate to browse afterwards!

Now, if we can use something like a "preview" image for a plate that makes the initial loading a few seconds instead of 45s in my example (or longer in examples with e.g. 384 wells), that's a super interesting thought! But I wouldn't want to give up the pyramidal loading for it.

My 2 main concerns would be:

1) How much of the time is spent loading the initial lowest resolution vs. preparing all the pyramids? If we really spend most of the time loading some low-resolution image of the plate, then this approach sounds great. But can we still dynamically load the intermediate resolutions as needed, without having to load all the .zattrs and .zarray files for each well anyway? Because I really wouldn't want to give up this dynamic loading.

> The lowest resolution of the Plate could be loaded from the full resolution Images as it is now.

=> Is your proposal that we only load the preview & the full resolution, or could we still dynamically load the intermediate pyramid levels?

2) Could we run the computation for consolidate_plate in parallel for each well? We currently do a lot of processing per well and send out these jobs for each well to a slurm cluster (see https://github.com/fractal-analytics-platform/fractal). Could we build up such a top-level preview in parallel, or would we have to run a single job that consolidates the plate after processing has finished for all wells? I assume it's more the latter. If that brings big visualization benefits, that's alright of course, and we'd just build in a "collection" step after each parallelized step that calculates the plate preview.
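
To make the question concrete, a sketch of the split I have in mind, assuming the per-well reads from the consolidate_plate sketch above (the function names and the local process pool are illustrative only, not an existing API):

```python
from concurrent.futures import ProcessPoolExecutor

import zarr

def well_preview_tile(plate_path, well_path):
    # Per-well job: lowest-resolution level of the first Image of one Well.
    well = zarr.open_group(plate_path, path=well_path, mode="r")
    image = well.attrs["well"]["images"][0]["path"]
    img = zarr.open_group(plate_path, path=f"{well_path}/{image}", mode="r")
    return img[img.attrs["multiscales"][0]["datasets"][-1]["path"]][:]

def build_plate_preview(plate_path, well_paths):
    # The per-well work runs in parallel (or as separate cluster jobs); only
    # this collection step has to wait for every Well before stitching and
    # writing the preview Image.
    with ProcessPoolExecutor() as pool:
        tiles = list(pool.map(well_preview_tile,
                              [plate_path] * len(well_paths), well_paths))
    return tiles  # stitching + metadata writing as in the earlier sketch
```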

PS: sorry for the long delay in response time. Conference & covid interfered with my usual rhythm.

jluethi commented 2 years ago

One additional thought that just came up: Have you profiled the loading pattern once to measure which part slows down the initial loading most?

If a lot of time is spent on the .zattrs & .zarray loading, maybe an alternative option would be to just use a consolidation step to generate centralized (but duplicated) metadata, i.e. keep all the metadata about pyramid levels, resolutions etc. in the .zattrs & .zarray per image, but also in a centralized .zattrs & .zarray for the whole plate.

In that way, only a single (larger) .zattrs & .zarray would need to be loaded for the plate to get all the metadata for all the pyramid levels, while the image data could still be stored in the same way as usual. Plus, if we accept the duplicate metadata, then all the wells & individual images could still be loaded as usual.
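
This is essentially what zarr's consolidated-metadata feature already provides at the store level; a minimal sketch (whether NGFF should standardize on it is a separate question):

```python
import zarr

store = zarr.DirectoryStore("path/to/plate.zarr")

# Copy every .zgroup/.zattrs/.zarray under the plate into a single .zmetadata
# document at the root; the chunk data itself is untouched.
zarr.consolidate_metadata(store)

# A reader can then discover all wells, images and pyramid levels from one
# request, and still fetch chunks lazily as before.
plate = zarr.open_consolidated(store, mode="r")
```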