ome / ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
https://ngff.openmicroscopy.org

units for data #203

Open d-v-b opened 12 months ago

d-v-b commented 12 months ago

The spec should express the units of the quantity represented in an image. This is a pretty important piece of metadata about an image; units are essential for interpreting images as the results of physical measurements / simulations.

Examples of images made of quantities with different units:

As a rough proposal, I would suggest that this metadata look something like this:

{
  "units" : "W m-2",
  "standard_name" : "irradiative flux",
  "description" : "nice description of light getting collected"
}

One thing we could do with this metadata is go beyond physical units to express label / segmentation images:

{
  "units" : "1",
  "standard_name" : "instance_label",
  "description" : "separate instances of a single semantic class."
}

This would solve a problem for viewer software that needs to know whether to open an image as a collection of intensities or as a collection of labels (e.g., napari's image layer vs. labels layer).
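To make the viewer use case concrete, here is a minimal Python sketch of how a reader could branch on this metadata. It assumes the key names from the rough proposal above (`units`, `standard_name`), which are not part of any released NGFF specification. The helper `choose_layer_kind` and the set `LABEL_STANDARD_NAMES` are hypothetical names made up for illustration.

```python
import json

# Hypothetical: the set of standard_name values that mark an image as labels.
# Only "instance_label" appears in the proposal; nothing here is from a spec.
LABEL_STANDARD_NAMES = {"instance_label"}


def choose_layer_kind(value_meta: dict) -> str:
    """Return "labels" for label-like images, otherwise "image"."""
    if value_meta.get("standard_name") in LABEL_STANDARD_NAMES:
        return "labels"
    # Missing or unrecognized metadata falls back to plain intensities.
    return "image"


intensity_meta = json.loads(
    '{"units": "W m-2", "standard_name": "irradiative flux"}'
)
label_meta = json.loads('{"units": "1", "standard_name": "instance_label"}')

print(choose_layer_kind(intensity_meta))  # image
print(choose_layer_kind(label_meta))      # labels
```

Falling back to "image" when the metadata is absent keeps existing files opening the way they do today, which seems like the least surprising default.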

I'm not 100% sold on these examples, and I don't know where this metadata should live, but I think we should find some way to express this information. Thoughts?

jni commented 3 months ago

I like this idea. Another category is atomic force microscopy images, where the image intensity represents height in nm. In skan I record this with the awkward value_is_height kwarg — it would be nice to instead have this structured piece of metadata.

I'm not 100% sold on "standard_name": "instance_label", because it's really rather non-standard. Coming from ome/napari-ome-zarr#99, I might suggest adding a "type": "continuous" (intensities), "type": "categorical" (semantic segmentation), and optionally "type": "id" (instance segmentation).

d-v-b commented 3 months ago

> I'm not 100% sold on "standard_name": "instance_label", because it's really rather non-standard. Coming from https://github.com/ome/napari-ome-zarr/issues/99, I might suggest adding a "type": "continuous" (intensities), "type": "categorical" (semantic segmentation), and optionally "type": "id" (instance segmentation).

I'm not sure about "type" : "continuous", since quantized measurements like photon counts are not continuous, but they are definitely quantities, unlike categories / instance IDs. A few other options for type, off the top of my head: "signal" | "label" | "instance", or "quantity" | "category" plus some additional category-specific metadata to distinguish instance segmentations from semantic segmentations. Because you can have both in the same image, it might be necessary to have per-ID metadata that expresses the interpretation of each ID. I will keep thinking about this!
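As a rough illustration of the "quantity" | "category" option with per-ID metadata, here is a Python sketch. Everything in it is an assumption made for the sake of the example: the class `ValueMetadata`, the field `id_kinds`, and the kind names "semantic" and "instance" are hypothetical and do not come from any spec.

```python
from dataclasses import dataclass, field
from typing import Dict

VALUE_TYPES = ("quantity", "category")
ID_KINDS = ("semantic", "instance")


@dataclass
class ValueMetadata:
    """Hypothetical container; none of these field names come from a spec."""

    type: str
    units: str = "1"
    # Per-ID interpretation, only meaningful when type == "category".
    id_kinds: Dict[int, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.type not in VALUE_TYPES:
            raise ValueError(f"type must be one of {VALUE_TYPES}, got {self.type!r}")
        if self.type == "quantity" and self.id_kinds:
            raise ValueError("per-ID metadata only applies to category images")
        for pixel_id, kind in self.id_kinds.items():
            if kind not in ID_KINDS:
                raise ValueError(
                    f"ID {pixel_id}: kind must be one of {ID_KINDS}, got {kind!r}"
                )


# Photon counts are quantized but still a quantity; the unit string is a placeholder.
photon_counts = ValueMetadata(type="quantity", units="1")

# One image that mixes a semantic class (ID 1) with instance IDs (2 and 3).
mixed_labels = ValueMetadata(
    type="category",
    id_kinds={1: "semantic", 2: "instance", 3: "instance"},
)

instance_ids = sorted(
    pixel_id for pixel_id, kind in mixed_labels.id_kinds.items() if kind == "instance"
)
print(instance_ids)  # [2, 3]
```

Keeping the instance / semantic distinction per ID, rather than per image, is what lets a single array carry both kinds of segmentation.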