HCS view in clients, napari / vizarr

will-moore commented 3 years ago

Since we have decided on an OME-zarr spec for HCS data (see https://github.com/ome/omero-ms-zarr/issues/73), we can start to think how we want clients to consume this data and display it.

Some initial thoughts...

napari

Imagine the entry-point with napari, using https://github.com/ome/ome-zarr-py, is something like:

$ napari 'https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plate/1234.zarr/'
# or maybe?
$ napari 'https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/1234/plate.zarr/'

where the Plate ID is 1234.

I think a napari plugin only allows us to open a single image, and not to open any other windows etc. So a potential solution would be to return a dask-based image pyramid, where the lowest resolution would be the plate overview (grid of thumbnails) and, as you zoom into each thumbnail, you get the full resolution of the image. If the images are Z-stacks or T-stacks, this would add extra dimensions to the pyramid. If we have multiple Plate Acquisitions ("runs"), this would add another dimension to the pyramid. Multiple Fields in a Well could be another dimension, or you could show all the fields adjacent to one another in a Well. The plugin would also have to handle labels.

Pros:

napari should be able to open a plate, without any UI changes.
raw pixel data downsampled from the whole plate could allow you to adjust rendering settings for the whole plate in real time, and play through time or Z for the whole plate.
could also add a layer of Well labels 'A1' etc to overlay on the image.

Cons:

No metadata / separation of images. E.g. want to know what image ID you're looking at, or want to analyse separate images in napari.
Need to handle sparse data. E.g. if one image has lots of Z/T-sections then the whole plate needs to have the same Z/T shape. Also, if one image has many channels or labels layers, all the images need to have the same.
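
One way to picture the sparse-data point above is to pad every well to a common shape before stitching. This is only a minimal sketch, assuming each well image is described by a 5D shape in (T, C, Z, Y, X) order; the helper names `common_shape` and `pad_widths` are hypothetical and not part of ome-zarr-py.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def common_shape(shapes: Sequence[Shape]) -> Shape:
    """Smallest shape that can hold every well image (per-axis maximum)."""
    return tuple(max(axis_sizes) for axis_sizes in zip(*shapes))


def pad_widths(shape: Shape, target: Shape) -> Tuple[Tuple[int, int], ...]:
    """Per-axis (before, after) padding that grows `shape` to `target`."""
    return tuple((0, t - s) for s, t in zip(shape, target))


# Assumed example: three wells in (T, C, Z, Y, X) order with differing T, C, Z.
well_shapes = [(1, 2, 1, 256, 256), (10, 2, 5, 256, 256), (1, 3, 1, 256, 256)]
target = common_shape(well_shapes)
paddings = [pad_widths(shape, target) for shape in well_shapes]
print(target)
```

The tuples returned by `pad_widths` follow the `((before, after), ...)` layout that `numpy.pad` accepts, so the missing planes could be filled with zeros. The cost is the one named in the Cons: every well then carries the Z/T/channel extent of the largest one.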

Alternatively, we build something similar to https://github.com/tlambert03/napari-omero that includes napari UI widgets for browsing a Plate and showing thumbnails in a grid. Clicking a thumbnail loads the image. The UI widget for browsing a Plate is already needed by the napari-omero client, so we could either use napari-omero to support

$ napari_omero 'https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plate/1234.zarr/'

or we allow ome-zarr-py to use the UI widget somehow (put it in yet another repo?) or we have a different entry-point?! e.g.:

$ napari_zarr 'https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plate/1234.zarr/'

Pros:

We can build a custom Plate UI, similar to the existing OMERO clients, with dedicated controls for changing Field, showing layout of Fields in a Well, etc.
We load a single image at a time into napari, so we can analyse an image at a time, load / save labels for an image at a time etc.

Cons:

No expertise on the OME team for Qt UI building.
Just loading regular rgb thumbnails for a Plate - don't get all the benefits described above (rendering settings or play movie for plate etc)
Needs another repo (or need to share UI widgets between napari-omero and ome-zarr-py) and a different CLI entry-point.

Possibly a combination of the options above:

We use a napari viewer to load a single canvas of plate overview of down-sampled raw data (not rgb thumbnails)
Can apply rendering settings, play through Z/T etc.
But clicking on an Image (Well) opens the image in a new napari viewer.

Pros:

Best experience for the user

Cons:

Most work to build?

vizarr

Probably all the same arguments apply to a web-based client using https://github.com/hms-dbmi/vizarr, although at least we have a slightly better understanding of how to build a custom web UI.

cc @tlambert @joshmoore @manzt

joshmoore commented 3 years ago

A short summary from ongoing implementation discussions: the loading of a large number of individual images before showing anything to the user is certainly a suboptimal experience. A few ideas for possibly improving this:

generate a static overview image, server-side
generate an overview from smallest resolutions with delayed calls, client-side
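
The client-side option could look roughly like the sketch below: each well's smallest resolution is wrapped in a delayed call and the results are concatenated into one lazy array, so nothing is read until a region is requested. The loader `load_thumbnail`, the tile size and the plate layout are assumptions for illustration; a real loader would read the lowest pyramid level of each well from the Zarr store.

```python
import dask
import dask.array as da
import numpy as np

TILE = (64, 64)  # assumed (Y, X) size of the smallest resolution level


def load_thumbnail(row: int, col: int) -> np.ndarray:
    """Stand-in for reading the smallest pyramid level of one well."""
    return np.full(TILE, row * 10 + col, dtype=np.uint16)


def lazy_plate(n_rows: int, n_cols: int) -> da.Array:
    """Stitch lazily loaded thumbnails into one (n_rows*Y, n_cols*X) array."""
    lazy_load = dask.delayed(load_thumbnail)
    rows = []
    for r in range(n_rows):
        tiles = [
            da.from_delayed(lazy_load(r, c), shape=TILE, dtype=np.uint16)
            for c in range(n_cols)
        ]
        rows.append(da.concatenate(tiles, axis=1))
    return da.concatenate(rows, axis=0)


plate = lazy_plate(n_rows=2, n_cols=3)
print(plate.shape)  # building the array does not call load_thumbnail
corner = plate[:TILE[0], :TILE[1]].compute()  # loading is triggered here
```

Because the array is lazy, the viewer can be shown straight away and tiles fetched as they are needed, rather than waiting for every image to load first.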

cc: @sofroniewn @jni

sofroniewn commented 3 years ago

This is definitely an important and interesting use case! The detailed write-up above helps me understand, but I'm not sure I have any additional recommendations at this point. If @joshmoore and @will-moore want to come by the napari euro dev sync next wednesday 8:30am PT (in USA, afternoon in Europe) we could discuss more then!

jni commented 3 years ago

Personally, I would very much like to have the option to browse through an entire screen in napari. At least one mode I would like to be able to have is for plate, row, and column to be individual dimensions of a lazy array, and then I have sliders for each.

We can then do things like splitting the row/column dimensions into layers and using grid mode to see a full plate. There are a lot of issues with this, including (1) we currently don't have the UI machinery for the splitting, though the model code exists, and (2) interacting with a large number of layers in napari is currently painful. But I'd be very happy to work with the OME team to make all of this easier.

Things like individual metadata on each image is important, and something we should support — again, happy to work towards this. Currently we don't really have much metadata even on single image layers, so there's work to do for sure. 😂 All I'm saying is that I wouldn't take napari's current metadata support to be the limiting factor for design on your end — we should explore the space of options available to us, even those that assume radical overhauls of napari. I can for example imagine a sliceable "metadata" object that gets sliced together with the image and displayed alongside it.

jni commented 3 years ago

(Thank you @joshmoore for the ping, would have missed this otherwise!)

will-moore commented 3 years ago

See https://github.com/ome/ome-zarr-py/pull/57 for current state of plate view in napari. ome-zarr-py is returning a dask-delayed, stitched sub-resolution of a plate, along with shapes + text labels to show the wells. There is another script where I'm adding click handling, so that clicking a Well loads the Image. It would be nice to package this behaviour somehow (since I don't think it's possible to include it in a napari plugin) - similar to napari-omero, but kinda reluctant to create another package like napari-ome since it's already kinda confusing.
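
For readers following along, the well outlines, text labels and click lookup described above can be sketched in plain Python. This is not the code from the PR; `well_label`, `well_outlines` and `well_at` are hypothetical helpers, and passing the results to napari layers or mouse callbacks is left out.

```python
from string import ascii_uppercase
from typing import List, Tuple

Rect = List[List[int]]  # four (y, x) corners of one well outline


def well_label(row: int, col: int) -> str:
    """Zero-based (row, col) to a label such as 'A1' (assumes <= 26 rows)."""
    return f"{ascii_uppercase[row]}{col + 1}"


def well_outlines(
    n_rows: int, n_cols: int, tile_h: int, tile_w: int
) -> Tuple[List[Rect], List[str]]:
    """Rectangle corners and labels for every well of a stitched plate."""
    rects, labels = [], []
    for r in range(n_rows):
        for c in range(n_cols):
            y0, x0 = r * tile_h, c * tile_w
            y1, x1 = y0 + tile_h, x0 + tile_w
            rects.append([[y0, x0], [y0, x1], [y1, x1], [y1, x0]])
            labels.append(well_label(r, c))
    return rects, labels


def well_at(y: float, x: float, tile_h: int, tile_w: int) -> Tuple[int, int]:
    """Map a clicked (y, x) position in the overview to a (row, col) well."""
    return int(y) // tile_h, int(x) // tile_w


rects, labels = well_outlines(n_rows=2, n_cols=3, tile_h=64, tile_w=64)
clicked = well_at(70.5, 130.2, tile_h=64, tile_w=64)
print(labels, well_label(*clicked))
```

The click handler then only needs the (row, col) pair to work out which Image to open.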

jni commented 3 years ago

@will-moore so cool! 😍

dask-delayed, stitched sub-resolution of a plate

Is this multiscale? That should be doable within the same framework, right?

I don't think it's possible to include it in a napari plugin

I think you're right here, but this is definitely something we want to improve on and keep in mind. @sofroniewn do we have a central place yet where we keep potential plugins so that they can guide our design goals? I've linked napari/napari#939 for now.

will-moore commented 3 years ago

@jni I coded it up to return a multi-scale pyramid (would have been really cool to zoom in to full resolution) but unfortunately this caused seg faults, so I've just updated to return a single zoom level (see https://github.com/ome/ome-zarr-py/pull/57/files#diff-b50d9715cc6e4017cfc055fd0ed73ecb5d9158e17f4d58ca5b3ba08b89c46657R447). I might have got something wrong with the dask.delayed stuff, but it seemed to be trying to load all the data up front. Haven't had time to go back and dig deeper yet.

jni commented 3 years ago

Dang!

This might be relevant, in case you're not aware: https://github.com/dask/dask-image/issues/161

Even though it is labeled as a performance issue, for me, if I used dask stack in the rootomics example linked above, it was sluggish and eventually crashed without warning. Once I moved to map_blocks, scrolling was smooth and I never experienced any crashes.

The fast way to check is to call dask.optimize() on the array before returning it. @m-albert discovered here that the performance then becomes identical — it might also help here.
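
A minimal sketch of that check, using a small stacked array as an assumed stand-in for the stitched plate. One detail that is easy to miss is that `dask.optimize` returns a tuple of collections, even when given a single array.

```python
import dask
import dask.array as da
import numpy as np

# Assumed stand-in for a stitched plate: a stack of small tiles.
tiles = [da.from_array(np.full((4, 4), i), chunks=(4, 4)) for i in range(6)]
plate = da.stack(tiles)

# dask.optimize takes any number of collections and returns a tuple of
# equivalent collections whose task graphs have been optimized.
(optimized,) = dask.optimize(plate)

print(optimized.shape)
```

The optimized array computes the same values as the original, so it can be returned in its place to see whether the viewer behaves any differently.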

Anyway, thanks for explaining that. Might be worth leaving it as single zoom for now and creating an issue to investigate later.

will-moore commented 3 years ago

@jni Thanks for the tip. When I'm working with local data, the performance is not a blocker (few seconds to show a plate) and dask.optimize() doesn't seem to help with the seg fault when I'm trying with multiple zoom levels.

Unfortunately, when loading data remotely, the performance becomes very bad (about 2 minutes to show a plate) even though I can load the same tiles in a web viewer (vizarr) in a few seconds (temp build at https://mystifying-lalande-e12142.netlify.app/?source=https://minio-dev.openmicroscopy.org/idr/idr0002-heriche-condensation/plate1_1_013/422.zarr). This is certainly due to my low home bandwidth (others have reported loading in a few secs with) but still seems worse than I would expect.

I couldn't quite understand how to use map_blocks in my code to stack 5d arrays, but maybe it's not needed if you're applying dask.optimize()?

@sofroniewn Happy to join the napari euro dev sync on Wednesday - Zoom? You'll also be at the OME call on Thursday? https://forum.image.sc/t/upcoming-calls-on-next-gen-bioimaging-data-tools-starting-oct-29/43489

jni commented 3 years ago

I couldn't quite understand how to use map_blocks in my code to stack 5d arrays, but maybe it's not needed if you're applying dask.optimize()?

Yeah it's unclear to me right now whether dask.optimize gets you as far as map_blocks. And yes it was tricky to grok! You need to know two non-obvious things:

if the function you're mapping takes in a block_id keyword argument, map_blocks will populate it with the block coordinates. This is handy to figure out which file you want to load. See this line.
for map_blocks to know how many blocks it's making, you have to use the "long" chunks format: a tuple of tuples, containing the exact sizes of every block along each axis. See this line.
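
The two points above could be sketched as follows. This is a minimal example and not the code behind the linked lines: `load_tile`, the plate layout and the tile size are assumptions, and each tile is synthesised rather than read from a file.

```python
import dask.array as da
import numpy as np

N_ROWS, N_COLS = 2, 3    # assumed plate layout
TILE_Y, TILE_X = 64, 64  # assumed tile size


def load_tile(block_id=None):
    """Called once per block; block_id is (row, col) in the block grid."""
    row, col = block_id
    # Stand-in for reading the image of well (row, col) from storage.
    return np.full((TILE_Y, TILE_X), row * 10 + col, dtype=np.uint16)


# "Long" chunks format: one tuple per axis listing every block size,
# which tells map_blocks how many blocks to create along each axis.
chunks = ((TILE_Y,) * N_ROWS, (TILE_X,) * N_COLS)

plate = da.map_blocks(
    load_tile,
    chunks=chunks,
    dtype=np.uint16,
    meta=np.array((), dtype=np.uint16),  # declares the block type up front
)
print(plate.shape, plate.numblocks)
```

For 5D data the same pattern should apply with five entries in `chunks`; the leading axes would use blocks of size 1 if each file holds a single plane.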

btw how big is this dataset total and can I do ome_zarr download on a URL to download it? Or not yet?

I'll be at the OME call. Looking forward to it! :blush:

sofroniewn commented 3 years ago

@sofroniewn Happy to join the napari euro dev sync on Wednesday - Zoom? You'll also be at the OME call on Thursday?

Yup, I'll be at the second leg of the OME call on Thursday! @will-moore if you send me your email I can add you to napari euro dev sync calendar invite. not sure if you have mine, but @joshmoore does. You can also use the napari zulip if you want https://napari.zulipchat.com/login/

manzt commented 3 years ago

Sorry I'm late to the party, and not sure I have much to comment on... @will-moore has been experimenting with loading a plate in vizarr and linking to individual multi-res views (in a new tab) on click. Ideally we could stitch together a multi-res grid of individual multi-res images (we are limited in that regard due to some deck.gl specifics), but the deep-linking UI is really interesting IMO and not something I had fully considered.

Perhaps getting (well) ahead of myself, but since napari (via ome-zarr-py plugin) and vizarr follow similar heuristics for loading an OME-Zarr node (multiscale / plate group), it's exciting to think about adding some type of open in napari button to vizarr if folks are interested. I don't know if there is a plan to have a custom url scheme for napari, but certainly something to consider once you can launch napari as an app.

I might be able to join for the sync tomorrow, but still need to record a lightning talk for thursday :)

jni commented 3 years ago

I don't know if there is a plan to have a custom url scheme for napari, but certainly something to consider once you can launch napari as an app.

there's not a specific plan, but there's definitely a plan!

joshmoore commented 3 years ago

btw how big is this dataset total and can I do ome_zarr download on a URL to download it? Or not yet?

I'd propose using aws s3 directly. (ome_zarr download ...someplate... will be important for S3 servers without directory listings, but it's not done yet)

I've uploaded two datasets to EBI's S3:

aws --no-verify-ssl --no-sign-request --endpoint-url https://s3.embassy.ebi.ac.uk/ s3 ls s3://idr/share/community-call-2020-10-29/
                           PRE idr0002-heriche-condensation/   # 125 GB (done)
                           PRE idr0033-rohban-pathways/        # 35 GB (done)

in the current format that we've been testing with.

will-moore commented 3 years ago

@jni Josh also uploaded the idr0002 plate with no Time data, which I've used for most of my testing and is much smaller to download:

s3://idr/share/community-call-2020-10-29/idr0002-heriche-condensation/plate1_1_013/422_no_T.zarr

will-moore commented 3 years ago

@jni @GenevieveBuckley I finally got around to looking at map_blocks, using https://github.com/dask/dask-image/pull/165/files as an example. I was testing this using an example where I'm generating a ton of tiles and stitching them together into a pyramid of 5D levels. Then viewing the pyramid in napari, or timing the compute() on each level of the pyramid. See https://gist.github.com/will-moore/819ade3c4e46864d9405555a1bf4933c

Unfortunately, I'm seeing that both are slower when I'm using map_blocks() :( So I'm definitely doing something wrong!? Any ideas how to improve this?

Also I'm still seeing the problem of napari re-requesting a ton of tiles, even when you're just panning a few pixels and not panning onto any new tiles.

Thanks!

sofroniewn commented 3 years ago

Also I'm still seeing the problem of napari re-requesting a ton of tiles, even when you're just panning a few pixels and not panning onto any new tiles.

eek! we're getting pretty close to having an experimental version of our asynchronous multiscale / tiled rendering working, see the gifs in this PR for the latest https://github.com/napari/napari/pull/1837#issue-515682786. It still can't be hooked up to externally provided chunked data sources (like zarr/ dask), but I think that will be coming in the next two weeks and will be transformative for your use case.

ome / omero-ms-zarr

HCS view in clients, napari / vizarr #74
