units for data

d-v-b commented 1 year ago

The spec should express the units of the quantity represented in an image. This is a pretty important piece of metadata about an image; units are essential for interpreting images as the results of physical measurements / simulations.

Examples of images made of quantities with different units:

FLIM images might have intensities that represent time-binned photon counts, and processed FLIM images might contain values of tau, the fluorescence lifetime: the time constant of the inferred exponential decay of fluorescence, typically expressed in nanoseconds (someone who does FLIM should correct me if I'm getting something wrong here).
Fluorescent calcium imaging of active cells produces images where the intensities represent time-binned photon flux (or time-binned electron flux, if you start at the camera chip), but these images are typically processed into images that represent percent change over baseline (ΔF/F), etc.

As a rough proposal, I would suggest that this metadata look something like this:

{
  "units" : "W m-2",
  "standard_name" : "irradiative flux",
  "description" : "nice description of light getting collected"
}
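To make the "units" string machine-readable, a reader could split it into unit symbols and exponents. This is a minimal sketch assuming the CF/UDUNITS-style syntax used in the example ("W m-2" meaning watts per square metre); `parse_units` is a hypothetical helper, not part of any spec or library, and it only handles space-separated factors with optional integer exponents (no "/", "^", or prefix splitting).

```python
import re

# Hypothetical helper: one factor is a unit symbol followed by an optional
# signed integer exponent, e.g. "W" or "m-2" (CF/UDUNITS-style, simplified).
_FACTOR = re.compile(r"^([A-Za-z]+)(-?\d+)?$")


def parse_units(units: str) -> dict[str, int]:
    """Map each unit symbol to its exponent; "1" means dimensionless."""
    if units.strip() == "1":
        return {}
    exponents: dict[str, int] = {}
    for factor in units.split():
        match = _FACTOR.match(factor)
        if match is None:
            raise ValueError(f"unsupported unit factor: {factor!r}")
        symbol, power = match.group(1), int(match.group(2) or 1)
        # Repeated symbols (e.g. "m m") accumulate their exponents.
        exponents[symbol] = exponents.get(symbol, 0) + power
    return exponents


metadata = {
    "units": "W m-2",
    "standard_name": "irradiative flux",
    "description": "nice description of light getting collected",
}
print(parse_units(metadata["units"]))  # {'W': 1, 'm': -2}
```

A real implementation would more likely delegate to an existing units library than hand-roll a parser; the point here is only that a compact string like "W m-2" is easy to consume programmatically.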

One thing we could do with this metadata is go beyond physical units to express label / segmentation images:

{
  "units" : "1",
  "standard_name" : "instance_label",
  "description" : "separate instances of a single semantic class."
}

This would solve a problem for viewer software that needs to know whether to open an image as a collection of intensities or as a collection of labels (e.g., napari's Image layer vs. Labels layer).
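With a field like this, the intensities-vs-labels decision reduces to a lookup. A minimal sketch, assuming the proposed "standard_name" field and treating only "instance_label" as a label type; `layer_kind` and `LABEL_STANDARD_NAMES` are hypothetical names, and images without the field fall back to intensities.

```python
# Hypothetical set of standard_name values that mean "this array holds labels".
LABEL_STANDARD_NAMES = {"instance_label"}


def layer_kind(metadata: dict) -> str:
    """Return "labels" for label/segmentation images, "image" otherwise."""
    if metadata.get("standard_name") in LABEL_STANDARD_NAMES:
        return "labels"
    # No metadata, or a physical quantity: open as intensities.
    return "image"


print(layer_kind({"units": "1", "standard_name": "instance_label"}))  # labels
print(layer_kind({"units": "W m-2", "standard_name": "irradiative flux"}))  # image
print(layer_kind({}))  # image
```

A napari reader could then call `Viewer.add_labels` or `Viewer.add_image` depending on the result.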

I'm not 100% sold on these examples, and I don't know where this metadata should live, but I think we should find some way to express this information. Thoughts?

jni commented 8 months ago

I like this idea. Another category is atomic force microscopy images, where the image intensity represents height in nm. In skan I record this with the awkward value_is_height kwarg — it would be nice to instead have this structured piece of metadata.

I'm not 100% sold on "standard_name": "instance_label", because it's really rather non-standard. Coming from ome/napari-ome-zarr#99, I might suggest adding a "type": "continuous" (intensities), "type": "categorical" (semantic segmentation), and optionally "type": "id" (instance segmentation).
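One way to sketch the suggested "type" field is as an enum, so unknown values are rejected at parse time. This assumes a missing "type" means "continuous" and adds a hypothetical consistency check that categorical and ID images use an integer dtype; neither rule is part of the suggestion, and `ValueType` and `check_value_type` are illustrative names.

```python
from enum import Enum


class ValueType(Enum):
    CONTINUOUS = "continuous"    # intensities
    CATEGORICAL = "categorical"  # semantic segmentation
    ID = "id"                    # instance segmentation


# Assumption: label values are stored as integers.
INTEGER_DTYPES = {
    "uint8", "uint16", "uint32", "uint64",
    "int8", "int16", "int32", "int64",
}


def check_value_type(metadata: dict, dtype: str) -> ValueType:
    """Parse "type" (default "continuous") and check it against the dtype."""
    # ValueType("...") raises ValueError for unrecognised strings.
    value_type = ValueType(metadata.get("type", "continuous"))
    if value_type is not ValueType.CONTINUOUS and dtype not in INTEGER_DTYPES:
        raise ValueError(f"{value_type.value!r} data should be integer, got {dtype}")
    return value_type


print(check_value_type({"type": "categorical"}, "uint16"))  # ValueType.CATEGORICAL
```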

d-v-b commented 8 months ago


I'm not sure about "type" : "continuous", since quantized measurements like photon counts are not continuous, but they are definitely quantities, unlike categories or instance IDs. A few other options for "type", off the top of my head: "signal" | "label" | "instance", or "quantity" | "category", with some additional category-specific metadata to distinguish instance segmentations from semantic segmentations. Because both can appear in the same image, we might need per-ID metadata that expresses the interpretation of each ID. I will keep thinking about this!
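A rough sketch of what per-ID metadata might look like for a "type": "category" image that mixes semantic and instance labels. The "ids", "kind" and "name" keys, the `LabelInfo` class and both helper functions are hypothetical; IDs are stored as strings because JSON object keys must be strings.

```python
from dataclasses import dataclass

# Hypothetical metadata: each label value carries its own interpretation.
metadata = {
    "type": "category",
    "ids": {
        "0": {"kind": "semantic", "name": "background"},
        "2": {"kind": "semantic", "name": "cytoplasm"},
        "17": {"kind": "instance", "name": "nucleus"},
        "18": {"kind": "instance", "name": "nucleus"},
    },
}


@dataclass(frozen=True)
class LabelInfo:
    kind: str  # "semantic" or "instance"
    name: str


def label_info(meta: dict, label: int) -> LabelInfo:
    """Look up the interpretation of one label value."""
    entry = meta["ids"][str(label)]
    return LabelInfo(kind=entry["kind"], name=entry["name"])


def instance_ids(meta: dict) -> list[int]:
    """Return the IDs that denote individual instances, in ascending order."""
    return sorted(int(k) for k, v in meta["ids"].items() if v["kind"] == "instance")


print(label_info(metadata, 2))  # LabelInfo(kind='semantic', name='cytoplasm')
print(instance_ids(metadata))   # [17, 18]
```

Keeping the interpretation per ID, rather than per image, is what lets one array hold both a semantic "cytoplasm" region and several individually numbered nuclei.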

ome / ngff

units for data #203