mtbc commented 4 years ago

Split masks

For split masks the microservice specifies "dtype":"|b1" and the Zarr spec defines a Boolean data type as,

"b": Boolean (integer type where all values are only True or False)

and describes how metadata is encoded using JSON. However, in JSON, Boolean types are all-lowercase, no initial capital letter as above.

The microservice takes the JSON Boolean idea in offering, e.g., "fill_value":false and "color":{"true":-2139062144}. However, it's possible that it's meant to do a more-Pythonic initial-capital thing, or just use integers? (With 0 as false and not-0 as true? https://github.com/ome/omero-ms-zarr/pull/60#issuecomment-673378885 suggests 1.)

Such changes are trivial, it's just very unclear. If anybody arrives at certainty, happy to open a fixing PR.

mtbc commented 4 years ago

(Basically, from the point of view of developing in Java, it's generally not fun to try to interpret the Zarr v2 spec. :confused: Compressors are another thorny bit.)

manics commented 4 years ago

I'd have thought we'd want to stick to using JSON types where possible, so native booleans rather than strings representing booleans. What does the "true" in "color":{"true":-2139062144} mean?

mtbc commented 4 years ago

I was surprised by that too. I'm adapting it from https://github.com/ome/omero-ms-zarr/blob/6add6804/spec.md#user-content-color, don't know why it needs stringifying though.

manics commented 4 years ago

Is it an opaque string, and the fact it happens to be a number is irrelevant? i.e. each object has a string label that is associated with a color?

mtbc commented 4 years ago

I think the image's pixel values have to fit the dtype as at https://zarr.readthedocs.io/en/stable/spec/v2.html#data-type-encoding.

manics commented 4 years ago

Must be because JSON only allows string keys

manics commented 4 years ago

Though I'd argue true/false should be represented as 1/0.

Perhaps this should be an array where array-index = object id, and if an object is missing it's set to null ?

mtbc commented 4 years ago

Aha, that'll be it, thank you. Could be we decide to opt for 1/0 for split masks, it's an easy change to make to the microservice. I wonder if that violates what Zarr readers expect for |b1 re. fill value, etc.? And all-bits-set is nice because it's more obvious in napari, I haven't figured the buttons to make it have 1/0 perceptible, sigh. Could make the not-0 value configurable.

joshmoore commented 4 years ago

cf:

@manics: can you outline the trade-offs between the two representations. The first proposal (https://github.com/ome/omero-ms-zarr/pull/65) is to have the colors in an array the size of the labels, right?

  "color": [
    null,
    8388736,
    ...
  ]

with the downside that choosing large label values leads to large arrays. (One option would be to put this array in zarr rather than json?)

The other proposal (https://github.com/ome/omero-ms-zarr/pull/68) is basically to make the color metadata more explicit, right?

  "color": [
    {
      "label": 1,
      "rgba": 8388736
    },
    {
      "label": 2,
      "rgba": 10101010
    },

and then allowing to potentially update the key in the object to be something other than "rgba", e.g. "hex" or whatever?

manics commented 4 years ago

Yes, pretty much what you said. The other benefit of the second option is it's extensible to include other metadata about a label, for instance you could add a name/description to each label. In this case it might be better to rename color to meta or metadata.

I think we should also change the current default colour model from "an int in whatever format OMERO uses" to a more understandable 4-tuple: [0-255, 0-255, 0-255, 0-255], or a CSS style hex value aabbccdd if you're concerned about the size of the field.

manics commented 4 years ago

Decisions from todays Zoom call:

Agreed:

Default to storing colour as an RGBA unsigned-int 4-tuple: [uint8, uint8, uint8, uint8], this does not preclude supporting other formats in future

Rejected:

The non-sparse array format

To be decided:

Keep the existing layout:

"color": {
  "1": [255, 255, 255, 0],
  "4": [0, 255, 255, 128],
  ...
}

Other metadata will be stored in a different top-level key, but with the same structure.

Sparse list of dicts:

"label-metadata": [
  {
    "label-value": 1,
    "rgba": [255, 255, 255, 0]
  },
  {
    "label-value": 4,
    "rgba": [0, 255, 255, 128]
  },
  ...
]

Sparse dict of dicts where key is the label-value converted to a string:

"label-metadata": {
  "1": {
    "rgba": [255, 255, 255, 0]
  },
  "4": {
    "rgba": [0, 255, 255, 128]
  },
  ...
}

will-moore commented 4 years ago

I prefer the first of the 3 options (most concise), but I guess the complexity of the labels parsing we're doing now for ome-zarr-py won't be very much different for any of these alternatives:

https://github.com/ome/ome-zarr-py/blob/208952c770c5d3683a8a8339173c0bc6ec158e54/ome_zarr.py#L276-L284

manics commented 4 years ago

Just noticed a bug in https://github.com/ome/ome-zarr-py/blob/208952c770c5d3683a8a8339173c0bc6ec158e54/ome_zarr.py#L280-L281 bool(non-empty-string) is always True, so bool("false") == True

joshmoore commented 4 years ago

Fixed in https://github.com/ome/ome-zarr-py/pull/45/commits/db43af52efb5b29d0201707adcd7329214c50717

k-dominik commented 4 years ago

If I may add my two cents.

sparse list of dicts: it is very explicit, support for a different color model is trivial, label names can be different types than strings. But the question is whether this freedom is desired? If not, than the current layout is probably fine. What I like about the other two approaches is that they are more human readable. I understand them without reading the specs. Wouldn't be so sure about the current layout.

joshmoore commented 4 years ago

Currently working on a set of PRs to test out the following

 24 ### "image-label" metadata
 23
 22 Groups containing the `image-label` dictionary represent an image segmentation
 21 in which each unique pixel value represents a separate segmented object.
 20 `image-label` groups MUST also contain `multiscales` metadata and the two
 19 "datasets" series MUST have the same number of entries.
 18
 17 The `colors` key defines a list of JSON objects describing the unique label
 16 values. Each entry in the list MUST contain the key "label-value" with the
 15 pixel value for that label. Additionally, the "rgba" key MAY be present, the
 14 value for which is an RGBA unsigned-int 4-tuple: `[uint8, uint8, uint8, uint8]`
 13 In the case that the same `label-value` is present multiple times in the list,
 12 the last value wins.
 11
 10 Some implementations may represent overlapping labels by using a specially assigned
  9 value, for example the highest integer available in the pixel range.
  8
  7 The `source` key is an optional dictionary which contains information on the
  6 image the label is associated with. If included it MAY include a key `image`
  5 whose value is the relative path to a Zarr image group. The default value is
  4 "../../" since most labels are stored under a subgroup named "labels/" (see
  3 above).
  2
  1
161 ```
  1 "image-label":
  2   {
  3     "version": "0.1",
  4     "colors": [
  5       {
  6         "label-value": 1,
  7         "rgba": [255, 255, 255, 0]
  8       },
  9       {
 10         "label-value": 4,
 11         "rgba": [0, 255, 255, 128]
 12       },
 13       ...
 14       ]
 15     },
 16     "source": {
 17       "image": "../../"
 18     }
 19 ]
 20 ```

joshmoore commented 4 years ago

Reading through https://qupath.readthedocs.io/en/latest/docs/advanced/exporting_annotations.html#binary-labeled-images, one wonders if the label-values need to be per channel.

joshmoore commented 4 years ago

image-label proposal is now opened in:

Separately, also considering an array (rather than JSON metadata) similar to this example:

import numpy as np
import zarr

t = [
    ("label-value", int),
    ("r", int),
    ("g", int),
    ("b", int),
    ("a", int),
    ("object-type", "U20"),
    ("object-id", int),
    ("description", "U200")]

data = list()
for x in range(100):
    data.append((1, 100, 100, 100, 100, "Mask", 123456, "some text here"))
    data.append((2, 200, 200, 200, 200, "Mask", 567896, "some more text"))
a = np.array(data, dtype=t)

z = zarr.open("s.zarr")
z.array(name="a", data=a, chunks=(10,))

imagesc-bot commented 4 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/multi-scale-image-labels-v0-1/43483/1

ome / omero-ms-zarr

Revamp color metadata #62

Split masks

Decisions from todays Zoom call:

Agreed:

Rejected:

To be decided:

Keep the existing layout:

Sparse list of dicts:

Sparse dict of dicts where key is the label-value converted to a string: