ome / ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
https://ngff.openmicroscopy.org

Proposing spaces and transforms #94

Open bogovicj opened 2 years ago

bogovicj commented 2 years ago

This is a preliminary proposal and discussion for axes and coordinate transformations. For more details see:

Add discrete axes

Extend the axes specification, adding a new optional field discrete whose values are booleans. Discrete axes should not be interpolated, whereas continuous axes may be interpolated. For example, the channels are usually discrete:

{"name": "c", "type": "channel", "discrete": true },
{"name": "y", "type": "space", "unit": "micrometer"},
{"name": "x", "type": "space", "unit": "micrometer"},
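As a minimal sketch of how a consumer might act on this flag, the following Python maps each axis to an interpolation order, using nearest-neighbour (order 0) for discrete axes. The function name `interpolation_orders` and the choice to treat a missing `discrete` field as false are assumptions for illustration, not part of the proposal.

```python
# Axes from the example above, written as Python dicts.
axes = [
    {"name": "c", "type": "channel", "discrete": True},
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"},
]


def interpolation_orders(axes, continuous_order=1):
    """Return one interpolation order per axis.

    Discrete axes get order 0 (no interpolation). Axes without a
    "discrete" field are assumed to be continuous (an assumption).
    """
    return [0 if ax.get("discrete", False) else continuous_order for ax in axes]


orders = interpolation_orders(axes)
```

Here `orders` holds 0 for the channel axis and the chosen continuous order for `y` and `x`.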

Add spaces

A space is a named list of axes and defines the coordinate system of the data. Two simple examples - a viewer may prefer the physical spatial coordinates of data:

{ 
"name" : "physical-micrometers", 
"axes" : [ 
  {"name": "y", "type": "space", "unit": "micrometer"},
  {"name": "x", "type": "space", "unit": "micrometer"}
]
}

whereas an algorithm that processes pixels may prefer the discrete pixel grid:

{
"name" : "pixel-space", 
"axes" : [ 
  {"name": "j", "type": "space", "discrete": true },
  {"name": "i", "type": "space", "discrete": true }
]
}

array space

This is a nice default that does not hurt much that I can see, and is more concise than alternatives. See my brainstorming here.

Every array / dataset has a default space whose name is the empty string. Its axes have default names dim_i, are discrete, and there are as many axes as array dimensions. For example, a 3D dataset's array space is

{
"name" : "", 
"axes" : [ 
  {"name": "dim_0", "discrete": true },
  {"name": "dim_1", "discrete": true },
  {"name": "dim_2", "discrete": true }
]
}
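Because the array space follows mechanically from the number of dimensions, a reader could generate it rather than store it. A minimal sketch, where the helper name `array_space` is hypothetical:

```python
def array_space(ndim):
    """Build the default array space for an array with `ndim` dimensions.

    The name is the empty string and every axis is discrete, named dim_<i>.
    """
    return {
        "name": "",
        "axes": [{"name": f"dim_{i}", "discrete": True} for i in range(ndim)],
    }


space_3d = array_space(3)
```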

Array space is shared across all datasets, so if any applications need to differentiate between them, a new discrete space should be explicitly defined with other, unique axis names.

coordinateTransformations

This proposal extends the existing coordinateTransformations metadata with the idea that a coordinateTransformation is a function from one space to another. coordinateTransformations will now have new fields: name, input_space, output_space, input_axes, and output_axes.

For example, assuming "array space" and the "physical-micrometers" space defined above:

{
  "name" : "pixels-to-micrometers",
  "type" : "scale",
  "scale" : [1.1, 2.2],
  "input_space" : "",
  "output_space": "physical-micrometers"
}

this is equivalent to:

{
  "name" : "pixels-to-micrometers",
  "type" : "scale",
  "scale" : [1.1, 2.2],
  "input_axes" : ["dim_0", "dim_1"],
  "output_axes": ["y", "x"]
}

Providing input_axes and output_axes enables transforming subsets of axes as in this example.
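The equivalence between the two forms can be sketched in Python: a space name is resolved to its list of axis names, after which both forms describe the same mapping. The `SPACES` table and the helpers `resolve_axes` and `apply_scale` are hypothetical names for illustration, and the array space is assumed to belong to a 2D array.

```python
# Axis names per space, taken from the examples above.
# "" is the default array space of a 2D array (an assumption).
SPACES = {
    "": ["dim_0", "dim_1"],
    "physical-micrometers": ["y", "x"],
}


def resolve_axes(transform, spaces=SPACES):
    """Return (input_axes, output_axes) for either form of a transform."""
    in_axes = transform.get("input_axes")
    if in_axes is None:
        in_axes = spaces[transform["input_space"]]
    out_axes = transform.get("output_axes")
    if out_axes is None:
        out_axes = spaces[transform["output_space"]]
    return list(in_axes), list(out_axes)


def apply_scale(transform, point, spaces=SPACES):
    """Apply a "scale" transform to a point given as {axis name: coordinate}."""
    in_axes, out_axes = resolve_axes(transform, spaces)
    return {
        out: factor * point[inp]
        for inp, out, factor in zip(in_axes, out_axes, transform["scale"])
    }


by_space = {
    "name": "pixels-to-micrometers",
    "type": "scale",
    "scale": [1.1, 2.2],
    "input_space": "",
    "output_space": "physical-micrometers",
}
by_axes = {
    "name": "pixels-to-micrometers",
    "type": "scale",
    "scale": [1.1, 2.2],
    "input_axes": ["dim_0", "dim_1"],
    "output_axes": ["y", "x"],
}

physical = apply_scale(by_space, {"dim_0": 2, "dim_1": 2})
```

Both dictionaries resolve to the same axis lists, so applying either one to the same pixel coordinate gives the same physical coordinate.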

Specific questions

See also

Thanks

This took shape with the help of lots of people. @constantinpape @xulman @tischi @sbesson @axtimwalde @tpietzsch @d-v-b @joshmoore @jbms @thewtex @dzenanz @lassoan @satra @jni organizers and participants of the 2022 Bioimage Hackathon, and even more.

satra commented 2 years ago

@bogovicj - this looks great and will cover a lot of ground.

We considered using the name "view" instead of "space" but giving a name to the raw data array is nice, and that's arguably not a "view"

a "view" in my head is essentially some coordinatetransformation (could be identity) on a space. don't have strong feelings but space sounds good to me.

Is the default, nameless "array space" worthwhile? Are its axis names appropriate?

are there suggestions for what this would be used for?

Where are these space + transform metadata stored in the container? all together in the root or special location? with the metadata for particular datasets?

all spaces could be stored in some root location with the name of the space added as metadata of a dataset. similarly all transforms could be stored in some root location or in an external location.

Are multiple transformations between two spaces, in the same direction allowed? I propose ["not now, but probably later"]

agree, but they are useful, since different algorithms are likely to generate different transformations for registering between any two things.

some notes on coordinateTransformations:

some additional pointers from the neuroimaging community if people feel like reading:

jbms commented 2 years ago

@satra I would agree that input and output should be used in the function sense, even if that is "backwards" from the direction of the conversion that will actually be performed. However, one thing to note is that for efficiency a flow field would normally be represented as you describe, where the physical space is the input space and the array voxel space is the output space, but I think it is customary to represent an affine transformation (or separate scale/translation) where the array voxel space is the input space and the physical space is the output space. Presumably that discrepancy can be addressed easily enough by an "invert" option.
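For the closed-form case, inversion amounts to swapping the two spaces and inverting the parameters. The sketch below does this for a scale transform only; `invert_scale` is a hypothetical helper, not the proposed "invert" option, and a flow field could not be inverted this way.

```python
def invert_scale(transform):
    """Return the inverse of a "scale" transform, with its spaces swapped."""
    if transform["type"] != "scale":
        raise ValueError("only scale transforms are handled in this sketch")
    return {
        "type": "scale",
        "scale": [1.0 / s for s in transform["scale"]],
        "input_space": transform["output_space"],
        "output_space": transform["input_space"],
    }


# Array voxel space -> physical space, as is customary for scale/affine.
forward = {
    "type": "scale",
    "scale": [2.0, 4.0],
    "input_space": "",
    "output_space": "physical-micrometers",
}
inverse = invert_scale(forward)
```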

@bogovicj My understanding of your proposal is that a "space" essentially does two things: it defines a list of dimensions, with associated names, units, and discrete/continuous flags, and it defines a namespace for those dimensions.

However, suppose I define a space as follows:

{
  "name": "physical_xyzt",
  "axes": [
    {"name": "x", "type": "space", "unit": "meter", "discrete": false},
    {"name": "y", "type": "space", "unit": "meter", "discrete": false},
    {"name": "z", "type": "space", "unit": "meter", "discrete": false},
    {"name": "t", "type": "time", "unit": "second", "discrete": false}
  ]
}

Then maybe I have a collection of arrays:

This type of dimension correspondence is easily expressed in the netcdf data model, but the current proposal does not seem to allow that --- instead it would be necessary to define a separate space for each combination of dimensions.

An alternative we could consider is to allow specifying individual dimensions of a space, e.g. "physical.x" to indicate the "x" dimension of the "physical" space. A possible simplification then would be to not define "spaces" at all, but just define individual named dimensions; for disambiguation purposes, these names could be longer than typical for dimension names, e.g. "physical_x", or maybe there could be a naming convention like "physical.x" so that viewers might know to display just "x" as a shorthand.
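A minimal sketch of how a viewer might handle such a naming convention, assuming the last "." separates the prefix from the short display name. The helpers `split_qualified` and `group_by_prefix` are hypothetical.

```python
def split_qualified(name):
    """Split "physical.x" into ("physical", "x").

    Names without a "." have no prefix, so the prefix is None.
    """
    prefix, sep, short = name.rpartition(".")
    return (prefix if sep else None, short)


def group_by_prefix(names):
    """Group short dimension names by their prefix, preserving order."""
    groups = {}
    for name in names:
        prefix, short = split_qualified(name)
        groups.setdefault(prefix, []).append(short)
    return groups


groups = group_by_prefix(["physical.x", "physical.y", "t"])
```

Dimensions sharing a prefix end up in one group, which is the correspondence a viewer would need in order to display just "x" as a shorthand.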

A separate comment: it seems to me that if SI prefixes are allowed as multipliers on the units, then arbitrary multipliers should also be allowed (this came up in previous discussions about units). This would allow in many cases the coordinate space of an array itself (prior to any transformation) to have meaningful units, e.g. if it uses a post-alignment coordinate space; under the current proposal you are forced to just specify these dimensions as unitless discrete dimensions and always need a separate transform to indicate any units at all.

One last thought regarding the discrete indicator: what if instead discrete was indicated by the lack of a unit?

We could consider also allowing a string description field to be associated with a dimension and/or space, though I'm not sure how useful that would be.

joshmoore commented 2 years ago

This type of dimension correspondence is easily expressed in the netcdf data model, but the current proposal does not seem to allow that

:+1: for compatibility with the NC model as being a nice to have.

bogovicj commented 2 years ago

Thanks @satra and @jbms!

multiple transforms between spaces

since different algorithms are likely to generate different transformations for registering between any two things.

Agree, and this will eventually be important to me as well. If this is a critical use case for you, please consider adding it as a user story here https://github.com/ome/ngff/issues/84

input / output and direction

"what input and output spaces mean when thinking about transforms" - @satra

"I would agree that input and output should be used in the function sense" - @jbms

Yes. Transforms here are in the "forward" direction despite the fact that the "inverse" is used for rendering / interpolating. That some transformations we want are not closed-form-invertible is exactly why we included the "inverse" option in this list of transforms long ago.

I'll make a new issue to discuss what specific types of transforms will be in the next version.

axes and spaces

just define individual named dimensions... these names could be longer than typical for dimension names, e.g. "physical_x"

This is actually what I'm proposing. Specifically that axis names are unique across all spaces. I agree that dimension correspondence in the way you described is important, and unique axis names give us that.
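One possible reading of "unique across all spaces" is that a given axis name always denotes the same axis, so a name that appears in several spaces must carry an identical definition in each. Under that assumption, a consistency check could be sketched as follows; `conflicting_axes` is a hypothetical helper.

```python
def conflicting_axes(spaces):
    """Return sorted names of axes whose definitions differ between spaces."""
    seen = {}
    conflicts = set()
    for space in spaces:
        for axis in space["axes"]:
            # Remember the first definition seen for each axis name.
            previous = seen.setdefault(axis["name"], axis)
            if previous != axis:
                conflicts.add(axis["name"])
    return sorted(conflicts)


spaces = [
    {"name": "physical-micrometers", "axes": [
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
    ]},
    {"name": "physical-meters", "axes": [
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "meter"},
    ]},
]
conflicts = conflicting_axes(spaces)
```

In this made-up input, "y" is defined identically in both spaces while "x" changes unit, so only "x" would need a longer, disambiguating name.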

For many use cases, short axis names (e.g. "x", "y", "t") are fine and recommended. I expect "Long" axis names (e.g. "fafb-v14.x") will only be necessary when registering between images.

A possible simplification then would be to not define "spaces" at all

While not strictly necessary, I think of "spaces" as a way for a dataset to communicate to downstream applications what axes make sense to be displayed together, and to formalize naming conventions that we'd need otherwise, like "axes with a common prefix go together". I also imagine they could make UIs easier for end users, especially if that naming convention is not respected. For instance, with my user hat on, I'd prefer to see:

than

One last thought regarding the discrete indicator: what if instead discrete was indicated by the lack of a unit?

Interesting idea. I'm imagining "discrete" communicating "don't interpolate across this dimension," and that might be useful for axes with units, but I may be overthinking. Let's revisit if and when more user stories come in.

allowing a string description field

Nice idea, I'm open to this, let's see what others think of this.

jbms commented 2 years ago

Interesting idea. I'm imagining "discrete" communicating "don't interpolate across this dimension," and that might be useful for axes with units, but I may be overthinking. Let's revisit if and when more user stories come in.

When displaying a segmentation (specified by a segment label volume), we never want to interpolate, but we may want to say that the segmentation has the same dimensions as a microscopy image volume, that we would want to interpolate. But for that use case, we already don't need any additional information to tell us not to interpolate, because that is already indicated by the fact we are displaying the volume as a segmentation.

imagesc-bot commented 2 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome-zarr-chunking-questions/66794/38

imagesc-bot commented 1 year ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome-ngff-community-call-transforms-and-tables/71792/1

imagesc-bot commented 1 year ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ashlar-stitching-questions-and-developments/67418/18

m-albert commented 11 months ago

Hi @bogovicj and others, it's been great following all the discussions here and seeing how NGFF is moving towards adopting a powerful framework for coordinate systems and transforms.

I wanted to comment here that in several contexts it came up that it'd be useful to be able to specify transforms for subsets of a dataset, i.e. different transforms for different image coordinates. Here are some examples:

As far as I understand, in the current form of the proposal this is not possible within the same dataset in ways other than using displacement fields.

I've seen that some time ago @bogovicj and @tischi discussed ideas for new transforms here. I especially liked the "coordinate-wise" transform, which is similar to the "ByDimension" transform composed of lower dimensional transforms acting on a subset of their input and output coordinate system's axes.
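To make the idea concrete, here is a minimal sketch of a transform composed of lower-dimensional transforms, each acting on a subset of the input and output axes. The names `scale`, `by_dimension`, and the axis names are hypothetical, and this is not the JSON encoding discussed for the specification.

```python
def scale(factors):
    """Return a function that scales a list of coordinates element-wise."""
    return lambda coords: [f * c for f, c in zip(factors, coords)]


def by_dimension(parts):
    """Compose sub-transforms, each given as (input_axes, output_axes, fn).

    Every sub-transform reads only its own input axes and writes only its
    own output axes of a point given as {axis name: coordinate}.
    """
    def apply(point):
        out = {}
        for in_axes, out_axes, fn in parts:
            values = fn([point[a] for a in in_axes])
            out.update(zip(out_axes, values))
        return out
    return apply


# A 1D transform on the first axis and a 2D transform on the other two.
transform = by_dimension([
    (["dim_0"], ["t"], scale([0.5])),
    (["dim_1", "dim_2"], ["y", "x"], scale([2.0, 4.0])),
])
result = transform({"dim_0": 3, "dim_1": 10, "dim_2": 20})
```

Any callable can stand in for `scale` here, so different axis subsets could each receive a different kind of transform.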

Considering the many use cases it would help (including some I'm working on 😁), I was wondering whether supporting such transforms is currently being considered/discussed?