Support for multi-channel labels

tischi commented 3 years ago

@will-moore @joshmoore @constantinpape

What is the meaning of the channel dimension for the label images?

I could imagine:

1. It must be a singleton dimension, where only channel 0 exists.
2. If the intensity image has multiple channels, each channel could have its own segmentation (label image), and the channel dimension of the label image corresponds to the channel dimension of the intensity image.

Is there already a spec for this?

tischi commented 3 years ago

I think option (2) may not make much sense, because (i) segmentations could be obtained using all channels (e.g. in a machine learning setting), and (ii) if, e.g., only channels 0 and 5 have a segmentation, we would also have to store label images for all the in-between channels.

will-moore commented 3 years ago

Not sure I know what you mean by "channel dimension for the label images", but it sounds like it comes from OMERO, where each Shape can have an (optional) C index if you wish to indicate which channel of the original image it is associated with (e.g. segmented from). But I don't think it appears in the OME-Zarr spec (unless I've missed it)?

tischi commented 3 years ago

@will-moore According to the current spec, the label images are 5D (t,c,z,y,x), and thus have a channel dimension. I think I found the spec here: https://ngff.openmicroscopy.org/latest/#citing

    └── labels
        │
        ├── .zgroup           # The labels group is a container which holds a list of labels to make the objects easily discoverable
        │
        ├── .zattrs           # All labels will be listed in `.zattrs` e.g. `{ "labels": [ "original/0" ] }`
        │                     # Each dimension of the label `(t, c, z, y, x)` should be either the same as the
        │                     # corresponding dimension of the image, or `1` if that dimension of the label
        │                     # is irrelevant.
        │

One could interpret this as suggesting that the channel dimension should be a singleton, but I think the spec could be clearer. What do you think?
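Read literally, the quoted rule admits both options: the label's c dimension may be 1 or equal to the image's. A minimal sketch of that check, assuming both shapes are full 5-tuples in (t, c, z, y, x) order (the function name is hypothetical, not part of any OME-Zarr library):

```python
def labels_compatible(image_shape, label_shape):
    """True if each label dimension equals the image dimension or is 1.

    Both shapes are assumed to be 5-tuples ordered (t, c, z, y, x).
    """
    if len(image_shape) != len(label_shape):
        return False
    return all(lab == img or lab == 1 for img, lab in zip(image_shape, label_shape))


image = (10, 3, 50, 512, 512)
print(labels_compatible(image, (10, 1, 50, 512, 512)))  # True: option (1), singleton c
print(labels_compatible(image, (10, 3, 50, 512, 512)))  # True: option (2), one label channel per image channel
print(labels_compatible(image, (10, 2, 50, 512, 512)))  # False: only a subset of the channels
```

The last case is point (ii) from the earlier comment: under this rule, segmenting only some channels still forces the label array to carry every channel.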

will-moore commented 3 years ago

Ah, yes, sorry. I guess we 'lose' the channel dimension when we open the data in napari, since each image channel is split into a separate 4D layer, and the labels are another 4D layer. I don't think we have any examples of labels with a multi-C dimension. In napari, I don't think we'd have any way of 'linking' a labels layer (one channel of a label) with the corresponding channel of the image (another layer), except maybe by naming them in the same way.

tischi commented 3 years ago

> In napari, I don't think we'd have any way of 'linking' a labels layer (one channel of a label) with the corresponding channel of the image (another layer), except maybe by naming them in the same way.

In BDV it is the same.

thewtex commented 3 years ago

Can overlapping labels be specified through multiple "channels"?

CC @lassoan

joshmoore commented 3 years ago

I think this was largely an "implementation restriction", since napari was the only viewer currently handling OME-Zarr labels and it couldn't use the channel information. If everyone's on board, I think it makes sense to add support (or to specify that labels are single-channel only).

cc: @jni @tlambert03 @sofroniewn @manzt

Edit: I should clarify: napari was the only such viewer before @tischi started implementing support, which led to this issue.

jni commented 3 years ago

Sorry for the slow response. For napari it'll be some time before we handle overlapping labels, but it's been requested a couple of times before, so I don't want us to be the blocking implementation here! It would make sense for ome-zarr to allow channel support, and the napari plugin can simply return a list of 4D labels layers. We currently scale poorly with many layers, but it would "work", and we are always working on those scalability issues.
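Returning "a list of 4D labels layers" amounts to splitting the c axis of a (t, c, z, y, x) label array and wrapping each slice in a napari-style LayerData tuple (data, metadata, layer_type). A sketch with NumPy; the function name and the "-c<index>" naming (one way to do the 'linking by name' mentioned earlier) are hypothetical assumptions, not part of ome-zarr-py or napari:

```python
import numpy as np


def split_label_channels(labels, name="labels", channel_axis=1):
    """Split a (t, c, z, y, x) label array into one (t, z, y, x) array per channel.

    Returns a list of napari-style LayerData tuples (data, metadata, layer_type).
    The "<name>-c<index>" naming is a hypothetical convention for matching each
    labels layer to the image channel layer with the same index.
    """
    layers = []
    for c in range(labels.shape[channel_axis]):
        data = np.take(labels, c, axis=channel_axis)  # an integer index drops the axis
        layers.append((data, {"name": f"{name}-c{c}"}, "labels"))
    return layers


labels = np.zeros((2, 3, 4, 8, 8), dtype=np.uint16)
layers = split_label_channels(labels)
print(len(layers), layers[0][0].shape, layers[2][1]["name"])  # 3 (2, 4, 8, 8) labels-c2
```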

lassoan commented 3 years ago

In 3D Slicer, each non-overlapping group of segments is stored in a 3D volume (we call this a "layer"; I think it is referred to as "channel" above). If all segments are non-overlapping, then the segmentation is a 3D volume; otherwise it is a 4D volume. We rarely encounter the need for a 5th dimension, but it sometimes comes up. I don't remember anyone asking for a 6th dimension in the past 10 years. So specifying segmentation as up to 5D (t,c,z,y,x) sounds good.
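The layering described here (non-overlapping segments share one 3D volume, overlaps spill into extra volumes along a 4th axis) can be sketched as a greedy first-fit packing. This is a hypothetical illustration, not 3D Slicer's implementation; the function name and the first-fit strategy are assumptions, and first-fit does not guarantee the minimum number of layers:

```python
import numpy as np


def pack_segments(masks):
    """Pack boolean segment masks into non-overlapping layers using first-fit.

    masks: non-empty sequence of boolean arrays, all with the same (z, y, x) shape.
    Returns (labelmap, assignments): labelmap has shape (layers, z, y, x) with 0 as
    background, and assignments[k] = (layer index, label value) for segment k.
    """
    layers = []       # one integer labelmap per layer
    next_value = []   # next unused label value in each layer
    assignments = []
    for mask in masks:
        for i, layer in enumerate(layers):
            if not np.any(layer[mask]):  # none of this segment's voxels are taken yet
                break
        else:
            # Overlaps every existing layer (or there are none): open a new layer.
            layers.append(np.zeros(mask.shape, dtype=np.uint16))
            next_value.append(1)
            i = len(layers) - 1
        value = next_value[i]
        layers[i][mask] = value
        next_value[i] += 1
        assignments.append((i, value))
    return np.stack(layers), assignments


a = np.zeros((1, 4, 4), dtype=bool)
a[0, :2, :2] = True
b = np.zeros((1, 4, 4), dtype=bool)
b[0, 1:3, 1:3] = True  # overlaps a
c = np.zeros((1, 4, 4), dtype=bool)
c[0, 3, 3] = True      # overlaps neither
labelmap, assignments = pack_segments([a, b, c])
print(labelmap.shape, assignments)  # (2, 1, 4, 4) [(0, 1), (1, 1), (0, 2)]
```

The returned (layer index, label value) pairs are exactly the two per-segment fields needed to find a segment again in the 4D array.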

Currently, we store the following metadata per segment:

- channel index (index of the 3D volume within the 4D array, if it is a 4D array)
- label value (label value within the 3D volume)
- id (machine-readable identifier unique within the segmentation)
- name (human-readable name) + auto-generated flag (whether the name was set automatically from a preset or is a custom name entered by the user)
- rgb color + auto-generated flag (whether the color was auto-generated from a preset or specifically set by the user)
- extent (xmin, xmax, ymin, ymax, zmin, zmax in voxel coordinates; to be able to quickly extract small segments from a large volume)
- tags (key/value pairs): used, for example, to describe the content of the segment using standard terminologies (using 3 strings: coding scheme, code value, and code meaning, which allows lossless storage of segments imported from DICOM)

It would be great if we could standardize as many of the above fields as possible, but we should at least agree that we allow storing non-overlapping segments in one channel and allow storing multiple channels (and define metadata fields for specifying the channel index and label value of each segment).
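As a starting point for the per-segment fields listed above, here is a sketch of one record, with the extent computed from a (c, z, y, x) labelmap. The field names (`channel`, `labelValue`, ...) and the function are hypothetical; the auto-generated flags are left out, and none of this is an agreed schema:

```python
import json

import numpy as np


def segment_metadata(labelmap, channel, value, seg_id, name, color, tags=None):
    """Build one per-segment metadata record for a (c, z, y, x) labelmap.

    Field names are hypothetical and only mirror the list above. The extent is
    [xmin, xmax, ymin, ymax, zmin, zmax] in voxel coordinates (inclusive).
    """
    z, y, x = np.nonzero(labelmap[channel] == value)
    if z.size == 0:
        raise ValueError(f"label {value} not present in channel {channel}")
    return {
        "id": seg_id,              # machine-readable, unique within the segmentation
        "name": name,              # human-readable
        "channel": channel,        # index of the 3D volume in the 4D array
        "labelValue": value,       # value of the segment within that volume
        "color": list(color),      # RGB
        "extent": [int(x.min()), int(x.max()), int(y.min()), int(y.max()),
                   int(z.min()), int(z.max())],
        "tags": dict(tags or {}),  # free-form key/value pairs
    }


labelmap = np.zeros((2, 5, 6, 7), dtype=np.uint8)
labelmap[1, 2:4, 1:3, 4:7] = 3
record = segment_metadata(labelmap, 1, 3, "Segment_1", "tumor", (255, 0, 0))
print(json.dumps(record["extent"]))  # [4, 6, 1, 2, 2, 3]
```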

0x00b1 commented 3 years ago

I started work on napari/napari#269.

Labels should, in my opinion, use the representation that is ubiquitous in both computer vision research and machine learning libraries like PyTorch and TensorFlow: (n, r, c) of bool or uint.
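An (n, r, c) stack of masks and a single integer labelmap convert into each other only while objects don't overlap. A minimal NumPy sketch, assuming label 0 is background (function names are hypothetical):

```python
import numpy as np


def labelmap_to_masks(labelmap):
    """Split an (r, c) integer labelmap into (n, r, c) boolean masks, one per nonzero label."""
    values = np.unique(labelmap)
    values = values[values != 0]  # 0 is treated as background
    masks = labelmap[np.newaxis, ...] == values[:, np.newaxis, np.newaxis]
    return masks, values


def masks_to_labelmap(masks, values):
    """Inverse conversion; only possible when no two masks overlap."""
    if np.any(masks.sum(axis=0) > 1):
        raise ValueError("overlapping masks cannot be stored in a single labelmap")
    labelmap = np.zeros(masks.shape[1:], dtype=values.dtype)
    for mask, value in zip(masks, values):
        labelmap[mask] = value
    return labelmap


labelmap = np.array([[0, 1, 1],
                     [2, 2, 0]], dtype=np.uint16)
masks, values = labelmap_to_masks(labelmap)
print(masks.shape, values.tolist())  # (2, 2, 3) [1, 2]
print(np.array_equal(masks_to_labelmap(masks, values), labelmap))  # True
```

`masks_to_labelmap` raises as soon as two masks share a pixel, which is the case a channel (or n) axis is meant to absorb.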

lassoan commented 3 years ago

I cannot comment on what is common in computer vision, but in medical imaging a labelmap volume is the standard (a 3D volume with a char or short voxel value specifying which structure is there). Overlapping label support is not that common, but the typical solution is a 4D labelmap volume. Since you often have atlases with hundreds of labels, bool voxels are not generally usable.
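Some back-of-the-envelope arithmetic makes the point about bool voxels concrete. The volume size and label count are made-up but plausible assumptions:

```python
import math

# Hypothetical sizes: a 512^3 volume and an atlas with 300 structures.
shape = (512, 512, 512)
n_labels = 300
voxels = math.prod(shape)

labelmap_bytes = voxels * 2                  # one uint16 ("short") label per voxel
bool_masks_bytes = n_labels * voxels * 1     # one NumPy bool (1 byte) per voxel per label
packed_bits_bytes = n_labels * voxels // 8   # one bit per voxel per label

print(labelmap_bytes / 2**20, "MiB")     # 256.0 MiB
print(bool_masks_bytes / 2**30, "GiB")   # 37.5 GiB
print(packed_bits_bytes / 2**30, "GiB")  # 4.6875 GiB
```

Even bit-packed, 300 one-bit masks need 300/16 ≈ 19 times the storage of a single 16-bit labelmap.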

We obviously will not be able to find a single organization of label data that works for everybody, so if we want this file format to see wide adoption then it should allow specification of the meaning of each axis of the label array.

0x00b1 commented 3 years ago

@lassoan For sure. This was the common structure in computer vision too. But this changed, like everything else in the past decade, when learning-based methods became standard. Think about overlapping objects from a y_pred rather than a y_true perspective. Your ground truth, y_true, may have exactly one value per unit (pixel, voxel, or whatever) but your prediction certainly won't. Your data structure, in my opinion, should reflect the probabilistic nature of contemporary methods.
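To make the y_pred vs y_true contrast concrete: the same per-class scores can be collapsed into one labelmap with argmax, or thresholded per class, which can produce overlapping masks. A toy NumPy sketch; the scores and the 0.5 threshold are made up for illustration:

```python
import numpy as np

# Hypothetical independent (sigmoid-style) scores at 3 pixels, shape (classes, pixels).
# Rows: background, class 1, class 2. Columns need not sum to 1.
probs = np.array([
    [0.1, 0.2, 0.8],
    [0.7, 0.6, 0.1],
    [0.2, 0.55, 0.3],
])

labelmap = np.argmax(probs, axis=0)  # exactly one label per pixel (y_true-like)
masks = probs[1:] > 0.5              # one boolean mask per object class
overlap = masks.sum(axis=0) > 1      # pixels claimed by more than one class

print(labelmap.tolist())  # [1, 1, 0]
print(overlap.tolist())   # [False, True, False]
```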

0x00b1 commented 3 years ago

@lassoan Your comment is really interesting! I should confess that I know absolutely nothing about microscopy!

> I don't remember anyone asking for a 6th dimension in the past 10 years. So specifying segmentation as up to 5D (t,c,z,y,x) sounds good.

As far as I know, I too have not personally run into this issue in biological contexts but it has become increasingly common in non-biological contexts (e.g. robotics). Hell, my new iPhone 12 Pro Max, for whatever reason, has a LiDAR sensor. 🤷‍♂️

You can also imagine a situation where embeddings are packed alongside the pixel information, e.g.

(frames, planes, features, rows, columns, channels)

I believe Carolina Wählby experimented with this.

lassoan commented 3 years ago

> @lassoan For sure. This was the common structure in computer vision too. But this changed, like everything else in the past decade, when learning-based methods became standard. Think about overlapping objects from a y_pred rather than a y_true perspective. Your ground truth, y_true, may have exactly one value per unit (pixel, voxel, or whatever) but your prediction certainly won't. Your data structure, in my opinion, should reflect the probabilistic nature of contemporary methods.

In 3D Slicer, we implemented all the mentioned representations and some more (3D labelmap, 4D labelmap, 4D fractional labelmap; and - primarily for 3D display - closed surface, planar contours, and ribbons; see overview here) along with automatic conversion algorithms between them and visualization and editing in both 2D and 3D.

We thought that fractional labelmaps (4D volume, each voxel describes some kind of probability) would be very useful and worked a lot on implementing first-class support for them (interactive editing and visualization, GPU-accelerated supersampling conversion, etc.). Surprisingly, it is barely used. Even though most ML prediction results are kind of probabilistic, it seems that by the time it gets to be displayed to end users, the results are usually already converted to labelmap or binary image. Trends can change quickly though, so I agree that the file format should be able to handle fractional labelmaps well.

constantinpape commented 3 years ago

I think parts of the discussion here moved away slightly from the original question about multi-channel support for labels.

> Labels should, in my opinion, use the representation that is ubiquitous in both computer vision research and machine learning libraries like PyTorch and TensorFlow: (n, r, c) of bool or uint.

I think this is related to the general question of how to specify axes / dimensions in the NGFF format. I don't think that it would be a good idea to introduce a separate nomenclature for labels here. There is currently PR #46 in progress to introduce axis labels. Note that this is still fairly limited (only allowing x, y, z, c, t), but it can certainly be extended further; see the discussion in #35 and the related #28 (all extensions should be non-breaking with #46, though).
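With named axes, labels need no separate nomenclature: a label array declares its axes like any other NGFF array. A sketch of the kind of check a reader could apply, assuming axes are listed as names and restricted to the five mentioned here; the helper and its messages are hypothetical and do not reproduce the actual schema of PR #46:

```python
ALLOWED_AXES = {"t", "c", "z", "y", "x"}  # the restricted set mentioned for PR #46


def validate_axes(axes, ndim):
    """Return a list of problems with a list of axis names (empty if valid)."""
    errors = []
    if len(axes) != ndim:
        errors.append(f"{len(axes)} axis names for a {ndim}D array")
    if len(set(axes)) != len(axes):
        errors.append("duplicate axis names")
    unknown = [a for a in axes if a not in ALLOWED_AXES]
    if unknown:
        errors.append(f"unsupported axis names: {unknown}")
    return errors


print(validate_axes(["t", "c", "z", "y", "x"], 5))  # []
print(validate_axes(["z", "y", "x"], 3))            # []
# The 6D embedding layout suggested earlier in the thread is rejected:
print(validate_axes(["frames", "planes", "features", "rows", "columns", "channels"], 6))
```

The last call fails only on the names, which is why extending the allowed set (rather than inventing a label-specific scheme) is the natural route.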

> Think about overlapping objects from a y_pred rather than a y_true perspective. Your ground truth, y_true, may have exactly one value per unit (pixel, voxel, or whatever) but your prediction certainly won't. Your data structure, in my opinion, should reflect the probabilistic nature of contemporary methods.

I agree that being able to represent probabilistic predictions is important. But I would see this in a different category than the labels discussed here; for many downstream analysis tasks, having a "regular" label map will be a prerequisite. For now, probability maps can be stored following the "normal" NGFF data definition. We could think about some additional metadata for them, and maybe also allow "linking" them to the primary data.

> (3D labelmap, 4D labelmap, 4D fractional labelmap; and - primarily for 3D display - closed surface, planar contours, and ribbons; see overview here)

That's a very nice overview! I think 3D labelmaps are already covered by the current spec, and 4D could be achieved using the "c" dimension (which is the initial topic of this issue). I assume that "fractional" labelmaps would correspond to the probabilistic prediction case (see above). For surfaces and contours, the most relevant discussion is #33.

0x00b1 commented 3 years ago

@constantinpape I have not followed this (or any other ngff) discussion until yesterday! I apologize for missing some important context. 😄

> I agree that being able to represent probabilistic predictions is important. But I would see this in a different category than the labels discussed here; for many downstream analysis tasks, having a "regular" label map will be a prerequisite. For now, probability maps can be stored following the "normal" NGFF data definition. We could think about some additional metadata for them, and maybe also allow "linking" them to the primary data.

My probabilistic example was just one example of overlapping labels. Overlapping visible and occluded regions is another.

0x00b1 commented 3 years ago

> Trends can change quickly though, so I agree that the file format should be able to handle fractional labelmaps well.

@lassoan Exactly. argmax predictions are, and I assume will remain, extremely common! Hell, they are preferred in countless situations. As far as trends are concerned, every method on the Cityscapes and Common Objects in Context leaderboards outputs (objects, y, x) masks! Nevertheless, I realize that I may not be the target audience for ngff! 🤷

ome / ngff

Support for multi-channel labels #19