Open constantinpape opened 2 years ago
Cross-linking to https://github.com/ome/omero-ms-zarr/pull/71 and https://github.com/ome/ngff/pull/3, where the original `image-label` specification and the `properties` extension were proposed.
- `colors` (see https://github.com/ome/omero-ms-zarr/issues/62), but I cannot find any consensus on whether all label values MUST be defined in the dictionaries. Similarly to above, in the absence of a clear MUST in the specification, my assumption is that it's not a requirement in the current version, although I would tend to mark this as RECOMMENDED.
- idr0052: here each segmentation is stored as a separate Zarr group with its `image-label` and `multiscales` metadata, so there was no use case for multiple `image-label` entries. What does the data storage look like in your scenario?
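To make the reading above concrete (a sketch only, not normative: the key names follow the v0.4 `image-label` layout, but whether `colors` may be sparse is exactly the point under discussion), a label group's attributes could look like:

```python
import json

# Hypothetical .zattrs for a label image where only some label values
# get an explicit color; the rest would fall back to client defaults.
label_attrs = {
    "image-label": {
        "version": "0.4",
        "colors": [  # sparse: only two of many label values listed
            {"label-value": 1, "rgba": [255, 0, 0, 255]},
            {"label-value": 2, "rgba": [0, 255, 0, 255]},
        ],
        "source": {"image": "../../"},
    }
}

print(json.dumps(label_attrs, indent=2))
```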
- @DragaDoncila and/or @joshmoore might want to comment on this, but my understanding is that both keys are optional (maybe RECOMMENDED)
Ok, this would be good, but it should be clarified in the spec. (I would personally not recommend `colors`; this is not a good fit for representing instance segmentations, which can easily have tens to hundreds of thousands of label values.)
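To make the size concern concrete (a rough back-of-the-envelope sketch, not a measurement of any real dataset): serializing one `colors` entry per label for 100,000 label values already yields several megabytes of JSON.

```python
import json

n_labels = 100_000  # e.g. a dense instance segmentation

# One color entry per label value, as an exhaustive colors list
# would require.
colors = [
    {"label-value": i, "rgba": [i % 256, (i // 256) % 256, 0, 255]}
    for i in range(1, n_labels + 1)
]
size_mb = len(json.dumps({"colors": colors})) / 1e6
print(f"{size_mb:.1f} MB of JSON for {n_labels} labels")
```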
2. [...] Similarly to above, in the absence of a clear MUST in the specification, my assumption is that it's not a requirement in the current version although I would tend to mark this as RECOMMENDED
This would be fine by me, but again, I think the spec should clearly state this to avoid ambiguity.
3. [...] Here each segmentation is stored as a separate Zarr group with its `image-label` and `multiscales` metadata, so there was no use case for multiple `image-label` entries. What does the data storage look like in your scenario?
I had a look at idr0052 and this matches the data layout I need very well, so I will base my script on this example.
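For reference, the layout described above (separate label groups, each with its own `image-label` and `multiscales` metadata, alongside the intensity image) could be sketched as follows; the group names here are illustrative, not taken from idr0052:

```python
# Illustrative on-disk layout: one Zarr group per segmentation, each
# carrying its own image-label + multiscales metadata.
layout = {
    "image.zarr/.zattrs": ["multiscales"],
    "image.zarr/labels/.zattrs": ["labels"],
    "image.zarr/labels/nuclei/.zattrs": ["image-label", "multiscales"],
    "image.zarr/labels/cells/.zattrs": ["image-label", "multiscales"],
}
for path, keys in layout.items():
    print(path, "->", keys)
```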
There are two minor differences:

- [...] (`scale` in v0.4)
- we have multiple positions and an ome.tif for each position; these are however from a single coordinate space, so it could make sense to merge them into a single image in ome.zarr; but this would require some special treatment for the instance segmentation. I will discuss this with Shila.

Yes to all the above re clarifying the specification wherever needed.

You are right: for each embryo of this dataset, the different positions (fields of view) are part of the same coordinate space overall. Ideally a user would like to access the full image according to the relative positions of the acquisitions. My initial thought was to focus on generating an OME-NGFF representation (image + label) of each position, partly because I think we have a specification that covers these requirements and partly because I am not 100% sure of the best strategy for the merging. One possibility is that the ongoing transformation/spaces work in #101 #84 would allow specifying metadata to register different multiscale images relative to one another in the same coordinate space and allow clients to build a merged representation. An alternative approach would be to stitch the arrays at the multiscale level and create a single multi-resolution OME-NGFF image. Definitely happy to hear what Shila thinks of the above and/or have a follow-up discussion if needed.
One possibility is that the ongoing transformation/spaces work in #101 #84 would allow specifying metadata to register different multiscale images relative to one another in the same coordinate space and allow clients to build a merged representation
Yes, I think that this would be the best solution in principle. Note that we do have all the transformations required for this already (`scale` and `translation`), but we would need a way to specify this in a collection, which is not possible yet as far as I can see.
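As an illustration (a sketch assuming v0.4-style `coordinateTransformations`; the grid layout, pixel size, and field-of-view shape are made-up values, not from the dataset), placing each position in the shared space only needs a `scale` plus a `translation` per field of view:

```python
# Build v0.4-style coordinateTransformations for two hypothetical
# fields of view on a shared physical coordinate space (y, x).
pixel_size = [0.65, 0.65]      # micrometres per pixel (assumed)
fov_shape = [2048, 2048]       # pixels per field of view (assumed)
positions = [(0, 0), (0, 1)]   # (row, col) grid index of each FOV

transforms = []
for row, col in positions:
    transforms.append([
        {"type": "scale", "scale": pixel_size},
        {"type": "translation",
         "translation": [row * fov_shape[0] * pixel_size[0],
                         col * fov_shape[1] * pixel_size[1]]},
    ])

for t in transforms:
    print(t)
```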
An alternative approach would be to stitch the arrays at the multiscale level and create a single multi-resolution OME-NGFF image.
Yes, that would be the solution that's available now.
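A minimal sketch of that stitching approach (assuming the per-position arrays are already loaded and sit on a simple non-overlapping grid; with real data this would be repeated per resolution level):

```python
import numpy as np

# Two hypothetical fields of view on a 1x2 grid, no overlap.
fov_shape = (4, 4)
fovs = {(0, 0): np.ones(fov_shape, dtype=np.uint16),
        (0, 1): np.full(fov_shape, 2, dtype=np.uint16)}

n_rows = 1 + max(r for r, _ in fovs)
n_cols = 1 + max(c for _, c in fovs)
stitched = np.zeros((n_rows * fov_shape[0], n_cols * fov_shape[1]),
                    dtype=np.uint16)
for (r, c), tile in fovs.items():
    stitched[r * fov_shape[0]:(r + 1) * fov_shape[0],
             c * fov_shape[1]:(c + 1) * fov_shape[1]] = tile

# The merged plane would then feed into multiscale generation.
print(stitched.shape)
```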
Definitely happy to hear what Shila thinks of the above and/or have a follow-up discussion if needed.
I will ask her; I think we have two options: if we need to have everything in the same coordinate space now, we need to merge the positions into a single image; otherwise it would be better to wait, add this to the user stories, and then develop the collection spec to support this use case.
I am working on converting data with segmentations to NGFF using `image-label` metadata. There are a couple of questions about the description in https://ngff.openmicroscopy.org/latest/#label-md:
`colors: [{label-value: 1, rgba: [...]}]`
For my use case, the preferred answers would be:

1. optional (for a segmentation with many label values, specifying a color per label is not necessary and would result in huge JSONs)
2. sparse (one could give properties for selected labels only)
3. should be valid, because there is a nucleus and a cell segmentation per image
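For concreteness (a sketch only; whether sparse `properties` are allowed is exactly the open question, and the property keys here are hypothetical), answer 2 would permit metadata where only selected labels carry entries:

```python
# Hypothetical image-label block: properties for two labels out of many.
image_label = {
    "version": "0.4",
    "properties": [
        {"label-value": 1, "class": "nucleus"},   # annotated label
        {"label-value": 7, "class": "dividing"},  # annotated label
        # labels 2-6, 8, ... intentionally omitted (sparse)
    ],
}
annotated = {p["label-value"] for p in image_label["properties"]}
print(sorted(annotated))
```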