ome / ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
https://ngff.openmicroscopy.org

"standalone" label images #179

Open bogovicj opened 1 year ago

bogovicj commented 1 year ago

In conversations elsewhere @sbesson says:

At the moment, the specification enforces that such data must be stored within a well-defined labels hierarchy but moving forward, I could certainly imagine a relaxation of this constraint.

A typical use case that comes immediately to mind is the one where segmentation / classification is performed against a read-only Zarr dataset, e.g. public data, and the output of this process needs to be stored as a new dataset. At the moment, the structure which is the most compliant with the spirit of the specification is to create an artificial labels// hierarchy under the root even if there is no multiscales image. Assuming we relaxed this constraint to allow label images to be stored at the root of the Zarr dataset, I would argue the image-label metadata would become a critical element to identify what we are dealing with.

I agree that relaxing this constraint could be a good idea.

In my view, the spec currently uses the hierarchy (that labels belong in a child of a multiscales), to communicate that labels are derived from, or correspond to a particular multiscales image. We might consider using coordinate systems to communicate this idea in the future after https://github.com/ome/ngff/pull/138 is merged, and to reference related "raw" image data explicitly, once we decide how to encode references. See https://github.com/ome/ngff/issues/144

Related PRs by @virginiascarlett that started this conversation:

d-v-b commented 1 year ago

+1 to not using hierarchy to express a relationship like "raw data, segmented data". The space of dependencies between datasets is sufficiently big that we should be using metadata to express this, rather than directly nesting images inside each other.

virginiascarlett commented 1 year ago

Yes, sometimes it feels like all these decisions about nesting, hierarchies, and collections are a mere artifact of starting from Zarr. Like, if we were to start from the question, "What should be the fundamental design principle for organizing image data?" we would not necessarily say nested hierarchies. I imagine a more natural fit would be something quite permissive, like (but NOT) the BagIt format, which could essentially mandate two things: a place for data, and a place for metadata.

Regarding label images, it seems like all that's really needed is three things:

  1. Some keyword to indicate that this is a segmentation (or something else)
  2. Some kind of "source" metadata field, which could be a file path, a URI, or something else
  3. Label correspondences

I spend a lot of my time with the DataCite schema, so I am reminded of a couple of interesting mechanisms there: relatedIdentifier and relationshipType to indicate how two items are related to one another, and relatedMetadataScheme to essentially nest one metadata schema within another. To adapt the latter to OME-NGFF would be a more breaking, but more impactful, change.

I see two options:

  1. A new series of optional JSON objects within multiscales conveying the three pieces of information I listed above, with some more flexibility e.g. relatedItem: foo/bar/my_image, relationshipType: isSourceImage.
  2. Create a mechanism for subschemas. Currently, the OME-NGFF solution to the problem of specific use cases, like segmentation, is optional JSON objects embedded within the main schema. A subtle shift would be to something like relatedMetadataSchema, which abstracts away an entire subschema. This would work equally well for a very minimal subschema, like a tiny "labels" subschema, or something quite large and differently encoded, like an entire OME-XML block. Viewers could simply state that they support certain subschemas, and updates to a particular subschema would not require a refresh of the entire spec.
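
To make option 1 a bit more concrete, here is a rough Python sketch of what the attributes of a standalone label image might look like, and how a reader could look up the related image. The `relatedItem` and `relationshipType` fields are the hypothetical names from the example above and are not part of any released NGFF version; the helper `find_related` is also made up for illustration. Only `multiscales` and `image-label` (with its `colors` entries) come from the existing spec.

```python
import json

# Attributes for a label image stored at the root of its own Zarr group.
# "relatedItem" / "relationshipType" are HYPOTHETICAL fields (option 1 above),
# not part of the current OME-NGFF specification.
standalone_label_attrs = {
    "multiscales": [
        {
            "datasets": [{"path": "0"}],
            "relatedItem": "foo/bar/my_image",     # hypothetical: where the raw image lives
            "relationshipType": "isSourceImage",   # hypothetical: how it relates to this group
        }
    ],
    # Existing spec key that marks the group as a label image.
    "image-label": {
        "colors": [{"label-value": 1, "rgba": [255, 0, 0, 255]}],
    },
}


def find_related(attrs, relationship_type):
    """Return every relatedItem whose relationshipType matches (illustrative helper)."""
    return [
        ms["relatedItem"]
        for ms in attrs.get("multiscales", [])
        if ms.get("relationshipType") == relationship_type and "relatedItem" in ms
    ]


print(json.dumps(standalone_label_attrs, indent=2))
print(find_related(standalone_label_attrs, "isSourceImage"))
```

Under these assumptions, the three pieces of information listed earlier map onto the `image-label` key (the keyword), `relatedItem` (the source), and the contents of `image-label` (the label correspondences).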

If you couldn't tell, I am partial to option 2), but I am biased, being more of a librarian than a developer.

will-moore commented 1 year ago

If/when we decide that we want to identify a stand-alone image as a label, we should probably not use the image-label key but a new key like imageLabel, or even just label or labels, since the NGFF naming style is camelCase: https://ngff.openmicroscopy.org/latest/#naming-style
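
For illustration only, a reader that wanted to tolerate both spellings during a transition might check the group attributes along these lines. This is a minimal sketch: `imageLabel` is just the candidate name floated above, not an agreed key, and `label_metadata` is a hypothetical helper. The bare `labels` spelling is left out here because the current spec already uses a `labels` attribute on the labels group to list label image names.

```python
# Keys to probe, in order of preference. Only "image-label" exists in the
# current spec; "imageLabel" is a HYPOTHETICAL camelCase replacement.
CANDIDATE_LABEL_KEYS = ("image-label", "imageLabel")


def label_metadata(attrs):
    """Return (key, value) for the first label key present, or None if the
    attributes do not mark this group as a label image."""
    for key in CANDIDATE_LABEL_KEYS:
        if key in attrs:
            return key, attrs[key]
    return None


print(label_metadata({"multiscales": [], "imageLabel": {"colors": []}}))
print(label_metadata({"multiscales": []}))
```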

imagesc-bot commented 6 months ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/save-a-single-labels-dataset-into-an-ome-zarr/93505/18