google / neuroglancer

WebGL-based viewer for volumetric data
Apache License 2.0

ome-zarr / ngff support #360

Closed · satra closed this 2 years ago

satra commented 2 years ago

@jbms - i know that the spec is not quite there yet, but are there things currently being discussed that would prevent supporting ngff viewing in neuroglancer, including multiscale?

and is there any way presently of displaying a single level of an ome zarr file? i tried to hack around it by pointing at a single level: zarr://https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.3/9836842.zarr/0 but that didn't work. it did not error, but the display wasn't quite right. i suspect that's because it's missing some metadata.

jbms commented 2 years ago

Zarr is fully supported already. As for that dataset, you just need to configure the "display dimensions" and the brightness/contrast in Neuroglancer.

Example

By default Neuroglancer uses the first three dimensions as display dimensions, in the order given. You can rename dimensions and re-order them, or specify display dimensions explicitly via the "x y z" control on the left side.
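For reference, here is a minimal sketch of doing the same from the Python API rather than the UI; the layer name and the shader gain are placeholders, not values taken from the dataset:

import neuroglancer

viewer = neuroglancer.Viewer()
with viewer.txn() as s:
    # Point an image layer at one scale level of the OME-Zarr store.
    s.layers['idr-9836842'] = neuroglancer.ImageLayer(
        source='zarr://https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.3/9836842.zarr/0',
        # Brightness/contrast: scale the normalized data value before display.
        # The 20.0 gain is a guess; adjust it to the actual data range.
        shader="""
void main() {
  emitGrayscale(toNormalized(getDataValue()) * 20.0);
}
""",
    )
print(viewer)  # prints a viewer URL; display dimensions can then be set in the UI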

As far as OME-Zarr (is OME-NGFF just another name for OME-Zarr?), from looking at this spec: https://ngff.openmicroscopy.org/latest/#why-ngff

I see there are the following types of metadata:

I am happy to comment on other proposals, but please point me to specific ones.

satra commented 2 years ago

thank you @jbms for the neuroglancer link.

As far as OME-Zarr (is OME-NGFF just another name for OME-Zarr?), from looking at this spec: https://ngff.openmicroscopy.org/latest/#why-ngff

i think so at this point. the library that provides the python integration is called ome-zarr.

thank you for the pointers to the various metadata elements in the spec. at least for the 3d datasets we are generating, downsampling is by 2 in voxel space in all dimensions (x, y, and z), not just x and y. i think the axes transformation is the next PR to be merged. probably the other metadata will be visited in a separate PR.

i'll close this for now, and will reopen after 0.4 is out.

jbms commented 2 years ago

I might suggest that OME-Zarr be used to refer to the metadata format, that OME-Zarr-Py refer to the specific implementation on top of zarr-python, and that OME-NGFF not be used at all, as it seems more ambiguous than OME-Zarr.

satra commented 2 years ago

reopening this issue as it looks like the ngff 0.4 spec is closer to being done.

it does seem that some abstraction of the precomputed/zarr formats could be used to read the ome-zarr data. if it's single scale it's essentially analogous to zarr. if it's multiscale it's similar to precomputed (without sharding at the moment).

are there pointers to where PRs could be done? or do you see the basic ngff pyramidal support as something that's easier for you to implement? and would it be helpful to have ngff zarr stores always have consolidated metadata?
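(For context, a minimal zarr-python sketch of what consolidating metadata does; the store path and array are hypothetical, used only for illustration:)

import zarr

store = zarr.DirectoryStore('example_multiscale.zarr')  # hypothetical path
root = zarr.group(store=store, overwrite=True)
root.create_dataset('0', shape=(64, 64, 64), chunks=(32, 32, 32), dtype='uint8')

# Copy every .zgroup/.zarray/.zattrs document into a single .zmetadata key
# so a reader can fetch all metadata in one request.
zarr.consolidate_metadata(store)

# Readers can then open the hierarchy without listing the store.
root = zarr.open_consolidated(store)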

jbms commented 2 years ago

I don't think the consolidated metadata would help. As far as I can see the metadata for a multiscale ome volume would in any case be in a single .zattrs file, right?
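To illustrate (a simplified sketch assuming OME-NGFF 0.4-style metadata and a hypothetical local path): the pyramid metadata lives entirely in the group-level .zattrs, while each level ("0", "1", ...) is an ordinary zarr array.

import zarr

root = zarr.open_group('example_multiscale.zarr', mode='a')  # hypothetical path
root.attrs['multiscales'] = [{
    'version': '0.4',
    'axes': [
        {'name': 'z', 'type': 'space', 'unit': 'micrometer'},
        {'name': 'y', 'type': 'space', 'unit': 'micrometer'},
        {'name': 'x', 'type': 'space', 'unit': 'micrometer'},
    ],
    # Each entry names a child array and its scale transform.
    'datasets': [
        {'path': '0', 'coordinateTransformations': [{'type': 'scale', 'scale': [1.0, 1.0, 1.0]}]},
        {'path': '1', 'coordinateTransformations': [{'type': 'scale', 'scale': [2.0, 2.0, 2.0]}]},
    ],
}]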

One question is whether to use a separate protocol name, like ome+zarr:// or just support it via zarr:// using autodetection.

I think one issue is that ome metadata does not currently provide a clear way to autodetect it, e.g. via a property that clearly indicates it is ome. Perhaps we can add that to the ome spec though.

The other reason we might want to use a separate protocol is if there are cases a user may want to disregard the ome metadata (e.g. transforms) and just access the zarr array directly, or if there are cases where there is both ome metadata and metadata under some other multiscale standard; in that case the user may want to be able to choose which multiscale representation to use.

Regarding the implementation you could refer to the n5 datasource, as it already supports a form of multiscale metadata.

normanrz commented 2 years ago

Is there a way to use multi-scale Zarr datasets in Neuroglancer as of today?

jbms commented 2 years ago

There is not.

normanrz commented 2 years ago

I think ome-zarr (=ome-ngff) would be a great option for that. As you may know, there have been updates to the spec since your reply in December. The version 0.4 now supports the axes property to detect time, channel, and space dimensions. It also supports explicit transforms for the multiscales properties.

One question is whether to use a separate protocol name, like ome+zarr:// or just support it via zarr:// using autodetection.

I'd personally lean towards auto-detection, but I understand the concerns you've raised.

I think one issue is that ome metadata does not currently provide a clear way to autodetect it, e.g. via a property that clearly indicates it is ome. Perhaps we can add that to the ome spec though.

I think that would be a useful addition to the spec.

The other reason we might want to use a separate protocol is if there are cases a user may want to disregard the ome metadata (e.g. transforms) and just access the zarr array directly

In that case, users could specify the URL to the zarr array instead of the group (examples linked).

or if there are cases where there is both ome metadata and metadata under some other multiscale standard; in that case the user may want to be able to choose which multiscale representation to use.

Imo for now the main use case for the ome-metadata is the specification of multiscales.

normanrz commented 2 years ago

Here is a larger OME-NGFF dataset (with 3 layers):

Raw data: https://s3.eu-west-1.amazonaws.com/static.webknossos.org/data/l4dense_motta_et_al_demo_zarr/color/
Segmentation: https://s3.eu-west-1.amazonaws.com/static.webknossos.org/data/l4dense_motta_et_al_demo_zarr/segmentation/
Affinity predictions: https://s3.eu-west-1.amazonaws.com/static.webknossos.org/data/l4dense_motta_et_al_demo_zarr/predictions/

normanrz commented 2 years ago

Hi @jbms, cool to see your implementation of ome-zarr! I was wondering if it would be possible to auto-detect channel or time axes from the OME metadata and designate them with ^ (e.g. in this layer c => c^)?

jbms commented 2 years ago

I am working on an additional change to order dimensions: space, time, channel, other, and possibly order x, y, z. That way the display dimensions will default to the spatial dimensions. That is probably what is desired in most cases, though I am somewhat wary of reordering dimensions. For channel dimensions it could make sense to mark them "channel" dimensions (c^), but as that is only supported for unchunked dimensions it may be better to just mark them as "local" dimensions (c'). For time dimensions it seems like it would normally make sense to have them as global dimensions, why would you want them as local dimensions?

satra commented 2 years ago

@jbms - thanks a lot. is this already deployed on appspot or do i need to build to check it out?

normanrz commented 2 years ago

That way the display dimensions will default to the spatial dimensions.

I think that would be great!

For channel dimensions it could make sense to mark them "channel" dimensions (c^), but as that is only supported for unchunked dimensions it may be better to just mark them as "local" dimensions (c'). For time dimensions it seems like it would normally make sense to have them as global dimensions, why would you want them as local dimensions?

I don't think I fully understand the difference between channel, global and local dimensions.

jbms commented 2 years ago

This is deployed on appspot.

I also implemented the fix for naming channel dimensions c'. I decided not to implement dimension reordering.

Global dimensions are shared by all layers --- they should be used in cases where a position within the dimension in one layer corresponds to the same position in every other layer with that dimension. Only global dimensions can be used as display dimensions (dimensions shown in the 2d or 3d views).

Local dimensions are local to a given layer --- there is no correspondence between local dimensions of the same name across different layers. A similar result can be achieved by just giving such dimensions unique global names, but the UI is also different, in that the position is shown within the layer tab rather than at the top with the global dimensions. Neuroglancer always just displays a cross section at a single position within a local dimension.

Channel dimensions are a type of local dimension supported by image layers, where instead of displaying a cross section at a single position within the dimension, the values at all positions along the dimension are available to the shader, and may be used to compute the RGBA value that will be rendered. For example if there is a channel dimension of size 3 specifying the red, green, and blue values, then by making it a channel dimension c^ you can define a shader:

void main() {
  emitRGB(vec3(getDataValue(0), getDataValue(1), getDataValue(2)));
}

to display it as RGB.
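As a quick usage sketch (the layer name and source URL below are placeholders; the channel dimension still needs to be named c^, e.g. via the dimension widget, for getDataValue(i) to index channels):

import neuroglancer

RGB_SHADER = """
void main() {
  emitRGB(vec3(getDataValue(0), getDataValue(1), getDataValue(2)));
}
"""

viewer = neuroglancer.Viewer()
with viewer.txn() as s:
    # Placeholder URL; any source with a 3-element channel dimension works.
    s.layers['rgb'] = neuroglancer.ImageLayer(
        source='zarr://https://example.com/data.zarr',
        shader=RGB_SHADER,
    )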

satra commented 2 years ago

@jbms - just an fyi.

i'm trying this on: https://dandiarchive.s3.amazonaws.com/zarr/3d538281-06d4-48ea-9e8b-72dfe08981a1/.zattrs (and it's unable to read the time axis property)

Error parsing "axes" property: Error parsing "unit" property: Unsupported unit: "second"

which should be ok from an ome-zarr spec perspective.
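(For reference, the relevant part of that metadata looks roughly like the sketch below; it is trimmed and not the exact .zattrs contents. "second" is a base SI unit with no prefix, which the spec allows.)

# Sketch of an OME-NGFF 0.4 "axes" list with a time axis in base SI seconds.
axes = [
    {'name': 't', 'type': 'time', 'unit': 'second'},
    {'name': 'z', 'type': 'space', 'unit': 'micrometer'},
    {'name': 'y', 'type': 'space', 'unit': 'micrometer'},
    {'name': 'x', 'type': 'space', 'unit': 'micrometer'},
]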

jbms commented 2 years ago

@satra Thanks for reporting that issue --- there is a trivial fix for the "second" unit issue (base SI units without a prefix were accidentally excluded), but I noticed another issue loading your dataset that I am looking into now.

normanrz commented 2 years ago

Cool! This now makes it very easy to stream datasets from a webKnossos server into Neuroglancer. See example. Thanks for your work on this!

jbms commented 2 years ago

@satra I have now fixed several issues identified from your dataset, and it works correctly now.

satra commented 2 years ago

awesome thanks @jbms - it also fixed some issues with this one

and i can start stitching chunks as layers

satra commented 2 years ago

@jbms - quick question - will there be a pypi update with these changes?

jbms commented 2 years ago

It is now available in 2.28.

joshmoore commented 2 years ago

Sorry for coming late to the party. A few late comments:

jbms commented 2 years ago

I haven't seen any 0.5 datasets, but I have seen 0.5-dev, and I figured I'd include 0.5 in the allowed version list so that neuroglancer continues to work if someone uses that version number in the future, since I don't guarantee that all functionality is supported anyway.

If you are concerned that will lead to non-conforming datasets I could remove it, though.