ome / omero-cli-zarr

https://pypi.org/project/omero-cli-zarr/
GNU General Public License v2.0
15 stars 10 forks source link

add --allow_overlaps arg to cli #120

Closed will-moore closed 2 years ago

will-moore commented 2 years ago

Add support for --overlaps=dtype_max to cli for Image and Plate export of polygons and masks. Also update README with this (and mention export of polygons). To test, using IDR: (idr0001):

$ omero zarr export Image:1229801
$ omero zarr polygons Image:1229801 --overlaps=dtype_max

$ napari --plugin napari-ome-zarr 1229801.zarr

Not tested Plate export yet, or export of masks. Overlapping regions should be exported with a value of dtype.max.

NB: The masks for the Image above are exported as a single plane with Z index of 0. Only by scrolling to first Z-index are they visible (See screenshot) I wonder if we can use coordinateTransformations to solve this?

Without the --overlaps arg, we get:

  File "/Users/wmoore/Desktop/ZARR/omero-cli-zarr/src/omero_zarr/masks.py", line 529, in masks_to_labels
    raise Exception(
Exception: Mask 632801 overlaps with existing labels

Screenshot 2022-06-21 at 13 47 57

sbesson commented 2 years ago

I am supportive of the spirit of this PR i.e. exposing some opt-in option to export overlapping masks/polygons (or in the case above overlapping masks derived from polygons) especially as we already have some underlying API. Amongst others, for testing e.g. https://github.com/ome/ome-zarr-py/pull/207, building a complete example of OME-NGFF plate with labels using idr0001 would be very advantageous.

My primary concern is the implementation choice (not introduced in this PR) to compute the sum of all the overlapping values. A few Under some conditions, this can break the dtype mechanism introduced in #116 as the maximal label value might exceed the computed dtype. Additionally, there is no guarantee that these values will be unique e.g. assuming there is an overlap between masks 10 & 13 and 11 & 12, the two sets of intersectings pixels would be assigned with the same value. Finally and probably more importantly, I think this means the label value of an overlapping region (e.g. from masks 1 and 2) can be identical to the label value of another region.

Understanding part of this API is lossy by essence, another approach would be to set the value of the overlapping regions to zero before calling https://github.com/ome/omero-cli-zarr/blob/2342a90eaf2d122290bf49de2b86b47e0f042c22/src/omero_zarr/masks.py#L529-L530. Effectively, this means the value of the first mask/polygon found for each pixel would take precedence over all following values.

will-moore commented 2 years ago

Hmmm - good points. One option is to set all overlapping regions to value of len(shapes) + 2 (one more than the last shape). In most cases that wouldn't be lossy (except in the cases where 2 overlapping regions overlap!). It would allow you to preserve the shape of the overlapping regions, instead of losing that info.

Another thought on exporting labels for a Plate: The labels values are now 1 -> n for each Image, but when we combine these into a Plate in ome-zarr-py reader, there will be a blob of value 1 for each Image on the Plate.

It still allows you to view the labels on a Plate, but when we try to combine the label-values properties for each Image into a single dict for the Plate, we'd lose that info. Current behaviour in https://github.com/ome/ome-zarr-py/pull/207/files is:

        # combine 'properties' from each image
        # from https://github.com/ome/ome-zarr-py/pull/61/
        properties: Dict[int, Dict[str, Any]] = {}
        for well_path in self.well_paths:
            path = self.get_image_path(well_path) + ".zattrs"
            labels_json = self.zarr.get_json(path).get("image-label", {})
            # NB: assume that 'label_val' is unique across all images
            props_list = labels_json.get("properties", [])
            if props_list:
                for props in props_list:
                    label_val = props["label-value"]
                    properties[label_val] = dict(props)
                    del properties[label_val]["label-value"]
        node.metadata["properties"] = properties
will-moore commented 2 years ago

@sbesson That commit sets overlapping regions to dtype.max as in the spec "the highest integer available in the pixel range" not dtype.max - 1? I updated the help message for --allow_overlaps but I think the name is OK. --split_overlaps sounds like you're choosing between different ways to handle overlaps, but the choice is really just to allow them or not?

With that change, viewing the above image in napari as before, mousing over the 1 overlap shows it's value of 127:

Screenshot 2022-06-24 at 11 29 04

joshmoore commented 2 years ago

I updated the help message for --allow_overlaps but I think the name is OK. --split_overlaps sounds like you're choosing between different ways to handle overlaps, but the choice is really just to allow them or not?

This makes me wonder if we wouldn't be safer with:

--overlaps=[error (default) | allow | ...]

for extensibility.

will-moore commented 2 years ago

Hmmm - yeah, I get the idea of extensibility, but until then you've got --overlaps allow and no other option, which is slightly odd. And when (if) you add other options you might have --overlaps [allow | ignore | set-zero] and then the allow option looks odd as it doesn't describe a behaviour. That's only a minor concern, so I'm happy to go for it, but I wonder that we get sub-optimal situation in both cases to avoid the (small?) chance we might want to change the API later.

joshmoore commented 2 years ago

but until then you've got --overlaps allow and no other option

except at the moment I think you have --overlaps=error, no? So that stays as the default and you are adding a different "overlap strategy" here.

sbesson commented 2 years ago

and then the allow option looks odd as it doesn't describe a behaviour.

That's exactly the issue I have with the single --allow-overlaps flag at the moment as I feel I need more to understand what this does. I can see two options:

  1. an --allow-overlaps flag that determines whether you error or not and eventually an --overlap-strategy= [strategy1 | strategy2] flag that allows to choose different resolution mechanism
  2. a single --overlaps flag as proposed above but in that case, I would use something more specific than allow to cater for future implemmentation
will-moore commented 2 years ago

@sbesson agreed - I guess it comes down to how likely we are to want other strategies. The current one is the only one I can think of that makes sense (and matches the spec), so my preference is for option 1) - which still allows for other strategies later if needed. But I'm OK with option 2 if we can already think of any other strategies we might want in the future?

will-moore commented 2 years ago

So, having thought about this over lunch... I could imagine the possibility of using negative values for overlapping regions. The use-case being... If I want to combine the labels from HCS images into a Plate, I might want to find the max(label_value) for each image, so as to add an offset to the next set of labels, to keep them unique. In which case, It might be useful for max(label_value) to NOT be dype.max. As far as I can see from the spec, negative numbers are acceptable for label values, so using negative values for overlapping region could allow you to have a unique value for each overlap?

will-moore commented 2 years ago

So, let's go for option 2. E.g. --overlaps=dtype_max sound good?

joshmoore commented 2 years ago

No objections from my side. :+1: