Open tischi opened 3 years ago
I think option (2) may not make too much sense, because (i) segmentations could be obtained using all channels (e.g. in a machine learning setting) and (ii) if, e.g., only channels 0 and 5 have a segmentation, we would have to store label images for all the other in-between channels as well.
Not sure I know what you mean by "channel dimension for the label images", but it sounds like it comes from OMERO, where each Shape can have an optional C index if you wish to indicate which channel in the original image it is associated with (e.g. segmented from). But I don't think it appears in the OME-Zarr spec (unless I've missed it)?
@will-moore
According to the current spec, the label images are 5D (t,c,z,y,x), and thus have a channel dimension.
I think I found the spec here: https://ngff.openmicroscopy.org/latest/#citing
└── labels
│
├── .zgroup # The labels group is a container which holds a list of labels to make the objects easily discoverable
│
├── .zattrs # All labels will be listed in `.zattrs` e.g. `{ "labels": [ "original/0" ] }`
│ # Each dimension of the label `(t, c, z, y, x)` should be either the same as the
│ # corresponding dimension of the image, or `1` if that dimension of the label
│ # is irrelevant.
│
I think one could interpret this as suggesting that the channel dimension should be a singleton, but I think it could be clearer. What do you think?
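One literal reading of the quoted spec comment can be sketched as a shape check: each label extent must equal the image extent or be 1. The function name `label_shape_is_compatible` is hypothetical and not part of any OME-Zarr library. Under this reading a multi-channel label is allowed only if it has exactly as many channels as the image.

```python
def label_shape_is_compatible(image_shape, label_shape):
    """Return True if every label dimension equals the image's or is 1."""
    if len(image_shape) != 5 or len(label_shape) != 5:
        raise ValueError("expected 5D (t, c, z, y, x) shapes")
    return all(l == i or l == 1 for i, l in zip(image_shape, label_shape))


image_shape = (1, 6, 10, 256, 256)  # a six-channel image

# Singleton channel: one label image for the whole multi-channel image.
print(label_shape_is_compatible(image_shape, (1, 1, 10, 256, 256)))
# Full channel dimension: one label image per image channel.
print(label_shape_is_compatible(image_shape, (1, 6, 10, 256, 256)))
# Two label channels for six image channels: neither "same" nor 1.
print(label_shape_is_compatible(image_shape, (1, 2, 10, 256, 256)))
```

The last case is the one the spec text leaves open: labels for only channels 0 and 5 would not fit without padding the channels in between.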
Ah, yes sorry. I guess we 'lose' the channel dimension when we open in napari, since each image channel is split into a separate 4D layer, and then the labels are another 4D layer.
I don't think we have any examples where we have labels with multi-C dimension. In napari, I don't think we'd have any way of 'linking' a labels layer (one channel of a label) with the corresponding channel of the image (another layer), except maybe by naming them in the same way.
In napari, I don't think we'd have any way of 'linking' a labels layer (one channel of a label) with the corresponding channel of the image (another layer), except maybe by naming them in the same way.
In BDV it is the same.
Can overlapping labels be specified through multiple "channels"?
CC @lassoan
I think this was largely an "implementation restriction" since napari was the only viewer currently handling OME-Zarr labels and it couldn't use the channel information. If everyone's on board, I think it makes sense to add support (or specify that labels are single channel only)
cc: @jni @tlambert03 @sofroniewn @manzt
Edit: I should clarify that napari was the only viewer before @tischi started implementing, which led to this issue.
Sorry for slow response. For napari it'll be some time before we handle overlapping labels, but it's been requested a couple of times before so I don't want us to be the blocking implementation here! It would make sense for ome-zarr to allow channels support, and the napari plugin can simply return a list of 4D labels layers. We currently scale poorly with many layers but it would "work", and we are always working on those scalability issues.
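The "list of 4D labels layers" idea can be sketched with plain NumPy. This is not the napari plugin API; the helper `split_label_channels` and its naming scheme are assumptions for illustration only.

```python
import numpy as np


def split_label_channels(labels, name="labels"):
    """Split a (t, c, z, y, x) label array into one (t, z, y, x) array per channel."""
    if labels.ndim != 5:
        raise ValueError("expected a 5D (t, c, z, y, x) array")
    # Indexing with [:, c] drops the channel axis and returns a view, not a copy.
    return [(f"{name}-c{c}", labels[:, c]) for c in range(labels.shape[1])]


labels = np.zeros((1, 2, 3, 4, 4), dtype=np.uint32)
labels[0, 0, :, :2, :2] = 1    # object stored in channel 0
labels[0, 1, :, 1:3, 1:3] = 2  # overlapping object stored in channel 1

layers = split_label_channels(labels)
for layer_name, data in layers:
    print(layer_name, data.shape)
```

Each returned array could then be handed to a viewer as its own labels layer, with the name carrying the only link back to the channel index.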
In 3D Slicer, each non-overlapping group of segments is stored in a 3D volume (we call this a "layer"; I think it is referred to as "channel" above). If all segments are non-overlapping then the segmentation is a 3D volume, otherwise it is a 4D volume. We rarely encounter the need for a 5th dimension, but sometimes it comes up. I don't remember anyone asking for a 6th dimension in the past 10 years. So, specifying segmentation as up to 5D (t,c,z,y,x) sounds good.
Currently, we store the following metadata per segment:
It would be great if we could standardize as many of the above fields as possible, but at least agree that we allow storing non-overlapping segments in one channel and allow storing multiple channels (and define metadata fields for specifying the channel index and label value for each segment).
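A minimal sketch of that layout, assuming a simple greedy strategy: each segment goes into the first channel where it overlaps nothing, and its channel index and label value are recorded. This is not 3D Slicer's implementation; `pack_segments` and the returned structure are hypothetical.

```python
import numpy as np


def pack_segments(masks):
    """Greedily pack boolean (z, y, x) masks into non-overlapping channels.

    Returns (layers, placement): layers is a (c, z, y, x) uint16 array and
    placement[i] is the (channel index, label value) assigned to masks[i].
    """
    if len(masks) == 0:
        raise ValueError("need at least one segment mask")
    layers = []      # one label volume per channel
    next_value = []  # next unused label value in each channel
    placement = []
    for mask in masks:
        for c, layer in enumerate(layers):
            if not np.any(layer[mask]):  # no existing segment under this mask
                break
        else:
            # Overlaps every existing channel (or none exist yet): open a new one.
            layers.append(np.zeros(mask.shape, dtype=np.uint16))
            next_value.append(1)
            c = len(layers) - 1
        layers[c][mask] = next_value[c]
        placement.append((c, next_value[c]))
        next_value[c] += 1
    return np.stack(layers), placement


a = np.zeros((1, 4, 4), dtype=bool); a[0, :2, :2] = True
b = np.zeros((1, 4, 4), dtype=bool); b[0, 2:, 2:] = True
c_ = np.zeros((1, 4, 4), dtype=bool); c_[0, 1:3, 1:3] = True  # overlaps a and b

layers, placement = pack_segments([a, b, c_])
print(layers.shape)  # (2, 1, 4, 4)
print(placement)     # [(0, 1), (0, 2), (1, 1)]
```

The `placement` list is the kind of per-segment metadata (channel index plus label value) that would need a standardized field in the format.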
I started work on napari/napari#269.
Labels should, in my opinion, use the representation that is both ubiquitous in computer vision research and machine learning libraries like PyTorch and TensorFlow: (n, r, c) of bool or uint.
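For comparison, a single-valued label map can be expanded into the (n, r, c) boolean form with NumPy broadcasting. This is a sketch; `labelmap_to_masks` is a hypothetical helper, and treating 0 as background is an assumption.

```python
import numpy as np


def labelmap_to_masks(labelmap):
    """Convert a 2D label map (0 = background) into an (n, r, c) boolean stack."""
    ids = np.unique(labelmap)
    ids = ids[ids != 0]  # drop the background value
    # (1, r, c) compared against (n, 1, 1) broadcasts to (n, r, c).
    masks = labelmap[np.newaxis] == ids[:, np.newaxis, np.newaxis]
    return ids, masks


labelmap = np.array([[0, 1, 1],
                     [2, 2, 0]])
ids, masks = labelmap_to_masks(labelmap)
print(ids)          # [1 2]
print(masks.shape)  # (2, 2, 3)
```

The conversion is lossless in this direction only: a stack of masks may overlap, while a single label map cannot express that.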
I cannot comment on what is common in computer vision, but in medical imaging a labelmap volume is the standard (a 3D volume with a char or short voxel value specifying which structure is there). Overlapping label support is not that common, but the typical solution is a 4D labelmap volume. Since you often have atlases with hundreds of labels, bool voxels are not generally usable.
We obviously will not be able to find a single organization of label data that works for everybody, so if we want this file format to see wide adoption then it should allow specification of the meaning of each axis of the label array.
@lassoan For sure. This was the common structure in computer vision too. But this changed, like everything else in the past decade, when learning-based methods became standard. Think about overlapping objects from a y_pred rather than a y_true perspective. Your ground truth, y_true, may have exactly one value per unit (pixel, voxel, or whatever) but your prediction certainly won't. Your data structure, in my opinion, should reflect the probabilistic nature of contemporary methods.
@lassoan Your comment is really interesting! I should confess that I know absolutely nothing about microscopy!
I don't remember anyone asking for a 6th dimension in the past 10 years. So, specifying segmentation as up to 5D (t,c,z,y,x), sounds good.
As far as I know, I too have not personally run into this issue in biological contexts but it has become increasingly common in non-biological contexts (e.g. robotics). Hell, my new iPhone 12 Pro Max, for whatever reason, has a LiDAR sensor. 🤷♂️
You can also imagine a situation where embeddings are packed alongside the pixel information, e.g.
(frames, planes, features, rows, columns, channels)
I believe Carolina Wählby experimented with this.
@lassoan For sure. This was the common structure in computer vision too. But this changed, like everything else in the past decade, when learning-based methods became standard. Think about overlapping objects from a y_pred rather than a y_true perspective. Your ground truth, y_true, may have exactly one value per unit (pixel, voxel, or whatever) but your prediction certainly won't. Your data structure, in my opinion, should reflect the probabilistic nature of contemporary methods.
In 3D Slicer, we implemented all the mentioned representations and some more (3D labelmap, 4D labelmap, 4D fractional labelmap; and - primarily for 3D display - closed surface, planar contours, and ribbons; see overview here) along with automatic conversion algorithms between them and visualization and editing in both 2D and 3D.
We thought that fractional labelmaps (4D volume, each voxel describes some kind of probability) would be very useful and worked a lot on implementing first-class support for them (interactive editing and visualization, GPU-accelerated supersampling conversion, etc.). Surprisingly, it is barely used. Even though most ML prediction results are kind of probabilistic, it seems that by the time it gets to be displayed to end users, the results are usually already converted to labelmap or binary image. Trends can change quickly though, so I agree that the file format should be able to handle fractional labelmaps well.
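The conversion step described here, from a fractional (probability) volume to an ordinary labelmap, can be sketched as an argmax over the channel axis. This is not 3D Slicer's conversion algorithm; the `threshold` parameter and the "channel k becomes label k + 1" convention are assumptions.

```python
import numpy as np


def probabilities_to_labelmap(probs, threshold=0.5):
    """Collapse (c, z, y, x) per-class probabilities into a (z, y, x) label map.

    Channel k becomes label k + 1; voxels whose best probability is below
    `threshold` become background (0).
    """
    best = probs.argmax(axis=0)
    confident = probs.max(axis=0) >= threshold
    return np.where(confident, best + 1, 0).astype(np.uint16)


probs = np.array([
    [[[0.9, 0.2, 0.4]]],  # class 0
    [[[0.1, 0.3, 0.6]]],  # class 1
])                        # shape (2, 1, 1, 3)

labelmap = probabilities_to_labelmap(probs)
print(labelmap.shape)     # (1, 1, 3)
print(labelmap.tolist())  # [[[1, 0, 2]]]
```

The overlap and uncertainty present in `probs` are discarded by this step, which matches the observation that results are usually displayed as a labelmap or binary image.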
I think parts of the discussion here moved away slightly from the original question about multi-channel support for labels.
Labels should, in my opinion, use the representation that is both ubiquitous in computer vision research and machine learning libraries like PyTorch and TensorFlow: (n, r, c) of bool or uint.
I think this is related to the general question of how to specify axes / dimensions in the NGFF format.
I don't think that it would be a good idea to introduce a separate nomenclature for labels here.
There is currently PR #46 in progress to introduce axes labels. Note that this is still fairly limited (only allowing x, y, z, c, t) but this can certainly be extended further, see discussion in #35 and also related #28 (all extensions should be non-breaking with #46 though).
Think about overlapping objects from a y_pred rather than a y_true perspective. Your ground truth, y_true, may have exactly one value per unit (pixel, voxel, or whatever) but your prediction certainly won't. Your data structure, in my opinion, should reflect the probabilistic nature of contemporary methods.
I agree that being able to represent probabilistic predictions is important. But I would see this in a different category than the labels discussed here; for many downstream analysis tasks having a "regular" label map will be prerequisite. For now, probability maps can be stored following the "normal" NGFF data definition. We could think about some additional metadata for it. And maybe also allow "linking" them to the primary data.
(3D labelmap, 4D labelmap, 4D fractional labelmap; and - primarily for 3D display - closed surface, planar contours, and ribbons; see overview here)
That's a very nice overview! I think 3d labelmaps are already covered by the current spec and 4d could be achieved using the "c" dimension (which is the initial topic of this issue). I assume that "fractional" labelmaps would correspond to the probabilistic prediction case (see above). For surfaces and contours, the most relevant discussion is #33.
@constantinpape I have not followed this (or any other ngff) discussion until yesterday! I apologize for missing some important context. 😄
I agree that being able to represent probabilistic predictions is important. But I would see this in a different category than the labels discussed here; for many downstream analysis tasks having a "regular" label map will be prerequisite. For now, probability maps can be stored following the "normal" NGFF data definition. We could think about some additional metadata for it. And maybe also allow "linking" them to the primary data.
My probabilistic example was just one example of overlapping labels. Overlapping visible and occluded regions is another.
Trends can change quickly though, so I agree that the file format should be able to handle fractional labelmaps well.
@lassoan Exactly. argmax predictions are, and I assume will remain, extremely common! Hell, they are preferred in countless situations. As far as trends are concerned, every method on the Cityscapes and Common Objects in Context leaderboards outputs (objects, y, x) masks! Nevertheless, I realize that I may not be the target audience for ngff! 🤷
@will-moore @joshmoore @constantinpape
What is the meaning of the channel dimension for the label images?
I could imagine:
0
exists
Is there already a spec for this?