ome / ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
https://ngff.openmicroscopy.org

Data dimensionality and axes metadata #35

Closed constantinpape closed 2 years ago

constantinpape commented 3 years ago

In last week's meeting the question of data dimensionality came up again (it was raised by @jni in the morning, and I think it came up in the afternoon as well). Currently, the spec demands that all data is 5-dimensional (I think with axis order TCXYZ, but I am not quite sure).

Do we want to lift the restriction and allow data of lower dimensionality? If so, we would add metadata to multiscales to describe the axes (e.g. "axes": ["x", "y", "z"]).

Note that this is also important for the transformation spec #28, where we need to clarify which axes a transformation applies to.

Independent of the decisions, we should add a field that describes the physical units of the axes, e.g. "units": ["micrometer", "micrometer", "micrometer"].
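To make the proposal concrete, here is a minimal Python sketch of a multiscales entry carrying the suggested "axes" and "units" lists, plus a helper that pairs them by position. The list-of-strings layout is the one proposed in this comment, not a released version of the spec, and the dataset paths are placeholders.

```python
import json
from typing import Any, Dict

# Hypothetical multiscales entry using the list-of-strings "axes" and "units"
# fields proposed in this comment; the "datasets" paths are placeholders.
multiscale: Dict[str, Any] = {
    "datasets": [{"path": "0"}, {"path": "1"}],
    "axes": ["x", "y", "z"],
    "units": ["micrometer", "micrometer", "micrometer"],
}


def axis_units(entry: Dict[str, Any]) -> Dict[str, str]:
    """Pair each axis name with its unit, matching the two lists by position."""
    axes, units = entry["axes"], entry["units"]
    if len(axes) != len(units):
        raise ValueError(f"{len(axes)} axes but {len(units)} units")
    return dict(zip(axes, units))


# The entry would sit in the group attributes, e.g. .zattrs: {"multiscales": [...]}
print(json.dumps({"multiscales": [multiscale]}, indent=2))
print(axis_units(multiscale))  # {'x': 'micrometer', 'y': 'micrometer', 'z': 'micrometer'}
```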

jni commented 3 years ago

(I think with axis order TCXYZ, but I am not quite sure).

TCZYX. ;)

Do we want to lift the restriction and allow data of lower dimensionality?

Yes please! And in fact the axis names should be whatever you like; we should not be limited to subsets of "TCZYX". E.g. they could be ["lat", "lon"] or ["left-right", "superior-inferior", "anterior-posterior"].

joshmoore commented 3 years ago

@jni: what behavior would you expect for an array with no x or y?

tischi commented 3 years ago

Based on how this discussion evolved in https://github.com/ome/ngff/issues/28, I guess the axis names may be part of the specification of the transformation from data space to physical space, is that right?

tischi commented 3 years ago

what behavior would you expect for an array with no x or y?

@joshmoore What do you mean by "behavior", maybe "how it would be rendered in a viewer"?

d-v-b commented 3 years ago

@jni: what behavior would you expect for an array with no x or y?

In my opinion, a generic image viewer should have no intrinsic opinion about the particular axis names of the data it displays. If the user has 2D data with axes labelled X and B, then the viewer should display the data (with a default, but overrideable, mapping from data coordinates to viewer coordinates) as an image with one axis labelled "X" and the other axis labelled "B". If the data axis labelled "X" happens to be mapped to a display axis also called "X", then that is just a happy coincidence. A general-purpose data visualization tool should not assign any "meaning" to an axis name like "X" or "T". A more specialized tool might have an opinion about axis names, though.
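A minimal sketch of such a default-but-overridable mapping, in Python. The display roles "rows", "cols" and "slider" and the rule "last two axes go on screen" are assumptions for illustration, not something any particular viewer is known to do; the point is only that the default never looks at the axis names.

```python
from typing import Dict, Optional, Sequence


def display_mapping(axis_names: Sequence[str],
                    override: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Assign each data axis a display role: "rows", "cols" or "slider".

    The default ignores axis names entirely: the last two axes are drawn on
    screen and any others get a slider. `override` reassigns roles by name.
    (This sketch does not check that exactly one axis ends up as rows/cols.)
    """
    if len(axis_names) < 2:
        raise ValueError("need at least two axes to draw an image")
    mapping = {name: "slider" for name in axis_names}
    mapping[axis_names[-2]] = "rows"
    mapping[axis_names[-1]] = "cols"
    for name, role in (override or {}).items():
        if name not in mapping:
            raise KeyError(f"unknown axis {name!r}")
        mapping[name] = role
    return mapping


print(display_mapping(["X", "B"]))  # {'X': 'rows', 'B': 'cols'}
```

With an override such as `{"t": "cols", "B": "slider"}` for axes `["t", "X", "B"]`, the user decides what goes on screen, and a name like "X" gets no special treatment either way.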

tischi commented 3 years ago

then the viewer should display the data (with a default, but overrideable, mapping from data coordinates to viewer coordinates)

The way I interpreted the status of our discussion at https://github.com/ome/ngff/issues/28 is that there is no default mapping, but a mapping must always be provided, or did I get this wrong?

d-v-b commented 3 years ago

Ah, sorry for causing confusion (and maybe we are straying away from the original question @joshmoore posed) -- Yes, I have the same interpretation of the discussion in #28. My (confusingly stated) point in the comment above was just that general purpose data visualization tools shouldn't have an opinion / preference for specific axis names in the transform metadata.

constantinpape commented 3 years ago

The way I interpreted the status of our discussion at #28 is that there is no default mapping, but a mapping must always be provided, or did I get this wrong?

I think this is still up for discussion. @axtimwalde made the point that a missing transformation could simply be interpreted as the identity transform, and missing axes labels would mean that the data stays in pixel space. This has the advantage that it's a non-breaking change.
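A minimal sketch of that fallback, assuming a hypothetical "transform" field holding per-axis "scale" values (translation omitted for brevity): a missing transform behaves like the identity, and a missing "axes" field leaves coordinates in pixel space, so existing 0.1-style files keep working unchanged.

```python
from typing import Any, Dict, List


def resolve_scale(entry: Dict[str, Any], ndim: int) -> List[float]:
    """Per-axis scale factors; a missing transform is read as the identity."""
    transform = entry.get("transform")  # hypothetical field name
    if transform is None:
        return [1.0] * ndim
    scale = [float(s) for s in transform["scale"]]
    if len(scale) != ndim:
        raise ValueError("scale must have one entry per axis")
    return scale


def coordinate_space(entry: Dict[str, Any]) -> str:
    """Without axes labels, coordinates are left in pixel (index) space."""
    return "physical" if "axes" in entry else "pixel"


def to_output(index: List[int], entry: Dict[str, Any]) -> List[float]:
    """Map a pixel index through the (possibly implicit) scale transform."""
    scale = resolve_scale(entry, len(index))
    return [i * s for i, s in zip(index, scale)]


old_style = {"datasets": [{"path": "0"}]}  # 0.1-style entry: no axes, no transform
print(coordinate_space(old_style), to_output([2, 3, 4], old_style))  # pixel [2.0, 3.0, 4.0]
```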

@joshmoore what do you think about also allowing 2D, 3D and 4D data to be saved? I think this is the first important decision to drive #28 (and probably other discussions too) forward.

tischi commented 3 years ago

And in fact the axis names should be whatever you like; we should not be limited to subsets of "TCZYX". E.g. they could be ["lat", "lon"] or ["left-right", "superior-inferior", "anterior-posterior"].

@jni based on the state of the discussion in https://github.com/ome/ngff/issues/28, I wonder now whether your comment is about axis names in data space or in physical space. Currently, I would think we simply have no axis names at all in data space. In physical space I think it is nice to know which axis should be the "x" axis, so that the viewer can display the data accordingly. Thus I think this information should be there.

On top of specifying which one is the "x" axis, we could think of having something like optional axis_names metadata:

"axis_names" : { "x" : "anterior-posterior", "y": "dorsal-ventral" }

Would that work for you?

tischi commented 3 years ago

I think this is still up for discussion. @axtimwalde made the point that a missing transformation could simply be interpreted as the identity transform, and missing axes labels would mean that the data stays in pixel space.

I think I'd prefer that specifying the axes labels is required, because in practice it makes a big difference whether one displays 3D data as xyz or xyc 😉 Unless we agree that specifying nothing defaults to axes of "type" : "space" with some default order like xyz.

jni commented 3 years ago

@tischi as mentioned on #28 we do not want to prescribe here where physical axes go on the screen. There is a third space, which is the screen space, and all kinds of transformations can happen between physical/world space and screen space, not least of which is a 3D -> 2D projection.

I also don't think axis label specification should be a requirement, but rather strongly encouraged metadata. As mentioned by others, making it a requirement would break backward compatibility. Indeed, treating channels as spatial by default is fine: most viewers have the ability to separate out channels. (napari notably doesn't :sweat_smile: but we are definitely planning it!)

tischi commented 3 years ago

I also don't think axis label specification should be a requirement, but rather strongly encouraged metadata

OK, I guess I could live with "strongly encouraged" 😉

tischi commented 3 years ago

Indeed, treating channels as spatial by default is fine: most viewers have the ability to separate out channels.

@jni I get the point about requirements and backwards compatibility. But in practice, let's say the vision is to be able to chain a set of napari plugins into an image processing workflow. My feeling is that it may then be necessary to know which axes are spatial and which axis is the channel axis. What do you think?

joshmoore commented 3 years ago

https://github.com/ome/ngff/issues/35#issuecomment-790949045 @joshmoore what do you think about also allowing 2D, 3D and 4D data to be saved?

I've been working under the assumption that it would eventually be necessary (cf. the IMS file structure). It certainly has the potential to complicate and possibly slow down implementations, so I'd just urge balancing how soon it's introduced against immediate need.

joshmoore commented 3 years ago

On the topic of XYZ or not necessarily XYZ, I have some concern that not having these takes us outside the realm of OME-* specs and closer to the underlying numpy/zarr/n5/etc. specs, which is fine, but is something we should consider. If the axes are named arbitrarily, then quite possibly the axes metadata SHOULD additionally define which are orthogonal to one another and in what right-handed order.

cf. (har) http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#coordinate-types

Edit: ah, I see while working through issues that this also came up in https://github.com/ome/ngff/issues/28#issuecomment-791132327
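If axes were allowed arbitrary names, orthogonality and right-handedness could be checked numerically, assuming each named axis came with a direction vector in a shared reference frame. That direction-vector metadata is hypothetical (neither this spec nor the CF conventions linked above define it in this form); the check itself is the standard scalar triple product.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def is_orthogonal(axes: Sequence[Vec3], tol: float = 1e-9) -> bool:
    """True if every pair of axis direction vectors is perpendicular."""
    return all(abs(dot(axes[i], axes[j])) <= tol
               for i in range(len(axes)) for j in range(i + 1, len(axes)))


def is_right_handed(axes: Sequence[Vec3]) -> bool:
    """Scalar triple product e1 . (e2 x e3) > 0 means a right-handed order."""
    e1, e2, e3 = axes
    return dot(e1, cross(e2, e3)) > 0


xyz = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
yxz = [xyz[1], xyz[0], xyz[2]]  # swapping two axes flips handedness
print(is_orthogonal(xyz), is_right_handed(xyz), is_right_handed(yxz))  # True True False
```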

constantinpape commented 3 years ago

I've been working under the assumption that it would eventually be necessary (cf. the IMS file structure). It certainly has the potential to complicate and possibly slow down implementations, so I'd just urge balancing how soon it's introduced against immediate need.

As this is quite a big change and has implications for other parts of the spec, I would argue that it should be made sooner rather than later, if deemed necessary. For example, I am pretty sure that the transformation spec will look different depending on whether we decide on fixed 5D or on 2D, 3D, 4D and 5D.

On the topic of XYZ or not necessarily XYZ, I have some concern that not having these takes us outside the realm of OME-* specs and closer to the underlying numpy/zarr/n5/etc. specs, which is fine, but is something we should consider. If the axes are named arbitrarily, then quite possibly the axes metadata SHOULD additionally define which are orthogonal to one another and in what right-handed order.

I personally also think we shouldn't allow for arbitrary axis naming and stick to XYZCT.

joshmoore commented 3 years ago

I personally also think we shouldn't allow for arbitrary axis naming and stick to XYZCT.

To be clear, I can certainly imagine having additional axes. But if there is no traditional X, Y, or Z axes in a given zarray, I don't know if I would consider it an image in the sense that is currently defined in this repository. (If anyone has a counter-example, I'd love to hear it.)

d-v-b commented 3 years ago

Medical imaging often uses anatomical coordinates, which do not involve the letters "X", "Y", or "Z": https://www.slicer.org/wiki/Coordinate_systems

joshmoore commented 3 years ago

@d-v-b: I guess I'm less concerned with naming, that's "just metadata". ;) But in all three you are in a 3D, right-handed coordinate system, right? I guess in my head (forgive me if I'm being biased) the ALS and IJK coordinate systems from slicer.org could be equated to XYZ, and then one would just need to state which system one is using.

For comparison, in the high-content screening case there are rows and columns, but there's additionally metadata to say that the rows are letters and the columns are numbers.
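As one illustration of "stating which system one is using", Slicer's RAS and LPS anatomical conventions differ only in the sign of the first two axes, so a declared system name is enough to convert between them. Using "RAS"/"LPS" as metadata values, and the helper below, are assumptions for illustration, not part of any spec.

```python
from typing import Dict, Tuple

Point = Tuple[float, float, float]

# Sign of each axis relative to RAS (+x Right, +y Anterior, +z Superior).
# LPS (+x Left, +y Posterior, +z Superior) flips the first two axes.
FLIPS: Dict[str, Point] = {"RAS": (1.0, 1.0, 1.0), "LPS": (-1.0, -1.0, 1.0)}


def convert(point: Point, src: str, dst: str) -> Point:
    """Convert a point between declared anatomical systems, going through RAS.

    Each flip is its own inverse, so multiplying by the source signs gives
    RAS, and multiplying by the destination signs gives the target system.
    """
    ras = tuple(s * c for s, c in zip(FLIPS[src], point))
    return tuple(s * c for s, c in zip(FLIPS[dst], ras))


print(convert((10.0, 20.0, 30.0), "RAS", "LPS"))  # (-10.0, -20.0, 30.0)
```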

tischi commented 3 years ago

I think the axes metadata part of this issue has come to overlap quite a bit with the discussion in https://github.com/ome/ngff/issues/28, where the last posts were also about the handedness of the coordinate system and how much we want to commit to x, y, and z. Could it therefore make sense to continue the discussion on axes metadata in https://github.com/ome/ngff/issues/28 and discuss here just how many data dimensions we would like to support?

axtimwalde commented 3 years ago

A data format that supports only 5 dimensions is asking to be obsolete within 2 weeks ;).

glyg commented 3 years ago

As a concrete case of more-than-5-D data, a team here is developing polarization microscopy, so each pixel has 7 coordinates: 3 spatial, the 3 components of the polarization vector, and time. Of course you can store the polarization as channels, but it gets tricky to encode a transformation then, as for example a rotation needs to apply to both the spatial and polarization coordinates.
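To illustrate why storing the polarization as channels gets tricky, here is a sketch assuming the polarization is stored as Cartesian vector components (px, py, pz) alongside (x, y, z); that layout is an assumption, since the example later in this thread uses angles instead. A rotation of the sample has to act on both triples at once, which amounts to a block-diagonal 6x6 matrix; scale/translation metadata on the spatial axes alone cannot express this coupling.

```python
import math
from typing import List

Matrix = List[List[float]]


def rot_z(angle: float) -> Matrix:
    """3x3 rotation about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def block_diag(a: Matrix, b: Matrix) -> Matrix:
    """Place a and b on the diagonal of a larger zero matrix."""
    n, m = len(a), len(b)
    out = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        out[i][:n] = a[i]
    for i in range(m):
        out[n + i][n:] = b[i]
    return out


def apply(mat: Matrix, vec: List[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in mat]


# Coordinates: (x, y, z, px, py, pz); time is unaffected and left out here.
R = rot_z(math.pi / 2)
R6 = block_diag(R, R)  # the same rotation acts on position and polarization
point = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
rotated = apply(R6, point)
print([round(v, 6) for v in rotated])  # [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```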

constantinpape commented 3 years ago

Ok, so I think dropping the requirement for 5d is not really controversial, whereas there's still some discussion about the axes labels.

I have been thinking a bit about how to drive the spec forward, and I think it would make most sense to start with a rather small change:

What do you think @joshmoore? I can start working on this.

joshmoore commented 3 years ago

https://github.com/ome/ngff/issues/35#issuecomment-791791122 A data format that supports only 5 dimensions is asking to be obsolete within 2 weeks ;).

I want this on a 👕 :wink:

https://github.com/ome/ngff/issues/35#issuecomment-792235945 7 coordinates: 3 spatial, the 3 components of the polarization vector, and time

How would you optionally encode them?

https://github.com/ome/ngff/issues/35#issuecomment-792250308 What do you think @joshmoore? I can start working on this.

:100:

glyg commented 3 years ago

How would you optionally encode them?

{
    "axes": ["x", "y", "z", "rho", "theta", "phi", "t"],
    "units": ["micrometer", "micrometer", "micrometer", "radians", "radians", "radians"]
}
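Note that this example lists seven axes but only six units, so "t" has no unit. A minimal check that would catch this, assuming units are matched to axes by position (as in the proposal at the top of the thread) and that "units" itself is optional:

```python
from typing import Any, Dict, List

example: Dict[str, List[str]] = {
    "axes": ["x", "y", "z", "rho", "theta", "phi", "t"],
    "units": ["micrometer", "micrometer", "micrometer", "radians", "radians", "radians"],
}


def unit_problems(meta: Dict[str, Any]) -> List[str]:
    """List problems with positional axes/units pairing (empty list = OK)."""
    if "units" not in meta:
        return []  # units are optional; only check them when present
    axes, units = meta["axes"], meta["units"]
    problems = []
    if len(units) != len(axes):
        problems.append(f"{len(axes)} axes but {len(units)} units")
    missing = axes[len(units):]  # trailing axes left without a unit
    if missing:
        problems.append("no unit for: " + ", ".join(missing))
    return problems


print(unit_problems(example))  # ['7 axes but 6 units', 'no unit for: t']
```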

tischi commented 3 years ago

@glyg That's an interesting use case! As mentioned above, I think this overlaps quite a bit with https://github.com/ome/ngff/issues/28, where we discuss how to map from data space (no units, just dimensions) to physical space (e.g. spatial or possibly angles). So it could be useful to look at that issue and maybe re-post your example there.

constantinpape commented 3 years ago

I have proposed some initial changes in #39 to lift the 5d requirement, but otherwise did not change anything w.r.t. the current spec. I will try to summarise the discussion here soon to see how to continue after #39 gets merged.

constantinpape commented 3 years ago

#39 now introduces axes as a MUST field in multiscales and allows up to 5 dimensions, with values for axes restricted to x, y, z, c, t. This change will be breaking with 0.1, and in the reviews @joshmoore remarked that it would be a good idea to check whether any of the potential changes we discussed here would again be breaking with the (proposed) 0.2.
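The rules described for #39 (axes is required, at most 5 dimensions, names restricted to x, y, z, c, t) can be sketched as a small validator. This only illustrates those three rules and is not the spec's reference validation; the uniqueness check at the end is an extra assumption.

```python
from typing import Any, Dict, List

ALLOWED_AXES = {"x", "y", "z", "c", "t"}
MAX_NDIM = 5


def check_axes(multiscale: Dict[str, Any]) -> List[str]:
    """Return a list of violations of the axes rules above (empty list = valid)."""
    axes = multiscale.get("axes")
    if axes is None:
        return ["'axes' is required"]
    errors = []
    if len(axes) > MAX_NDIM:
        errors.append(f"at most {MAX_NDIM} axes allowed, got {len(axes)}")
    unsupported = [a for a in axes if a not in ALLOWED_AXES]
    if unsupported:
        errors.append("unsupported axis names: " + ", ".join(unsupported))
    # Extra assumption, not stated above: each axis name appears only once.
    if len(set(axes)) != len(axes):
        errors.append("axis names must be unique")
    return errors


print(check_axes({"axes": ["c", "z", "y", "x"]}))  # []
print(check_axes({"axes": ["x", "y", "lat"]}))     # ['unsupported axis names: lat']
```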

To summarize, I think we have discussed the following possible changes (relative to 0.2):

As far as I can see none of these changes would be breaking with the 0.2 proposal. Anything I forgot here? Can anybody see issues with 0.2 that would require a breaking change in the future?

k-dominik commented 3 years ago

Hi - adding in a few cents here as well...

When I was reading it, I was thinking about what viewers would like best. I think this issue/discussion should allow a complete newcomer to design a super simple viewer that enables rudimentary viewing of all data that claims to be ngff. One of the reasons people still use PNGs, JPGs, TIFs and the like is that they can view them with their system image viewer by simple drag and drop. Ever tried this with an HDF5 file in the de-facto image viewer of the bioimage community, Fiji?! No dice. If the outcome of this discussion is that we allow arbitrary data with arbitrary axes, then this is as good as doing nothing. No new developer will be able to come up with a viewer that makes sense based on the specification. I think this encourages fragmentation. No one would be able to "understand" the data. With a fixed, limited set of axes in the data/pixel/image/voxel space, you could truly have a format that all viewers could support, where looking into the image space will look more or less the same in all of them. Isn't this one of the goals?

The semantic meaning of the axes and units and the likes can be handled by smarter viewers: depending on the application they might use the transformation (as discussed in #28).

k-dominik commented 3 years ago

Adding to the comment above: I think some axes should have a fixed meaning and name: tzyx. The rest could be handled as channels by "naive" consumers, whereas applications closer to the data can handle them in a specialized way.
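A sketch of how such a "naive" consumer might split the axes, assuming only the names t, z, y, x carry a fixed meaning and everything else is folded into channels. The function names, the return layout and the example axis names (borrowed from the polarization case above) are illustrative only.

```python
from typing import Dict, List, Sequence

FIXED_AXES = ("t", "z", "y", "x")  # axes with a fixed meaning in this proposal


def classify_axes(axes: Sequence[str]) -> Dict[str, List[int]]:
    """Return the positions of fixed-meaning axes and of everything else."""
    fixed = [i for i, name in enumerate(axes) if name in FIXED_AXES]
    channel_like = [i for i, name in enumerate(axes) if name not in FIXED_AXES]
    return {"fixed": fixed, "channel_like": channel_like}


def n_channels(shape: Sequence[int], axes: Sequence[str]) -> int:
    """Channels seen by a naive consumer: product of all non-tzyx dimensions."""
    count = 1
    for i in classify_axes(axes)["channel_like"]:
        count *= shape[i]
    return count


axes = ["t", "theta", "phi", "z", "y", "x"]
print(classify_axes(axes), n_channels((10, 4, 8, 50, 256, 256), axes))
# {'fixed': [0, 3, 4, 5], 'channel_like': [1, 2]} 32
```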

joshmoore commented 3 years ago

See the new PR at https://github.com/ome/ngff/pull/46

imagesc-bot commented 3 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/next-call-on-next-gen-bioimaging-data-tools-early-september-2021/55333/14

constantinpape commented 3 years ago

To summarize the current state:

I think it's straightforward to also add an optional field units with the same length as axes and this can be done in one of the next versions.

In addition, I can see two more controversial potential changes that lift the restrictions above:

I am personally more in favor of keeping the spec more restrictive, but we need to see if there are some important use-cases that cannot be covered with the current spec. This is also very relevant for the issue of specifying transformations.

constantinpape commented 3 years ago

Note also the proposal by @bogovicj and @axtimwalde here, which introduces a label, type and unit per dimension with a list of objects (=map/dict). This diverges a bit from our current solution of having axes as a list, but it would be easy to have an equivalent solution using 3 lists, e.g. axes, axes_label and unit.
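The two layouts carry the same information, which a round-trip conversion makes easy to see. A minimal sketch, assuming per-object keys "label", "type" and "unit" (as in the proposal) and hypothetical list names "labels", "types" and "units":

```python
from typing import Dict, List, Optional

Axis = Dict[str, Optional[str]]


def objects_to_lists(axes: List[Axis]) -> Dict[str, List[Optional[str]]]:
    """One object per dimension -> one list per property (missing keys become None)."""
    return {
        "labels": [a.get("label") for a in axes],
        "types": [a.get("type") for a in axes],
        "units": [a.get("unit") for a in axes],
    }


def lists_to_objects(lists: Dict[str, List[Optional[str]]]) -> List[Axis]:
    """One list per property -> one object per dimension, matched by position."""
    labels, types, units = lists["labels"], lists["types"], lists["units"]
    if not len(labels) == len(types) == len(units):
        raise ValueError("property lists must have equal length")
    return [{"label": label, "type": typ, "unit": unit}
            for label, typ, unit in zip(labels, types, units)]


axes = [
    {"label": "t", "type": "time", "unit": "second"},
    {"label": "c", "type": "channel", "unit": None},
    {"label": "x", "type": "space", "unit": "micrometer"},
]
print(objects_to_lists(axes)["units"])  # ['second', None, 'micrometer']
```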

d-v-b commented 3 years ago

Once you've specified a unit (assuming it's an SI unit), you have basically already specified the axis type, no? So it seems like axis_type is unnecessary (and potentially confusing, if someone accidentally does something like {axis_type : time, unit: nm}).
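A sketch of the idea that the unit already implies the type, plus a check that flags contradictions like the one above. The unit table is a small hand-picked subset with units spelled out (so "nm" appears as "nanometer"), and leaving other types such as "channel" unchecked is a deliberate assumption, since a unit alone does not always pin down the type.

```python
from typing import Optional

# Small, hand-picked unit table; not an exhaustive or official list.
UNIT_KIND = {
    "nanometer": "length", "micrometer": "length", "millimeter": "length",
    "millisecond": "time", "second": "time", "minute": "time",
    "radian": "angle", "degree": "angle",
}
KIND_TO_TYPE = {"length": "space", "time": "time"}


def infer_type(unit: str) -> Optional[str]:
    """Guess the axis type from its unit; None when the unit says nothing useful."""
    return KIND_TO_TYPE.get(UNIT_KIND.get(unit, ""))


def consistent(axis_type: str, unit: str) -> bool:
    """Flag only clear contradictions for the types the table knows about."""
    kind = UNIT_KIND.get(unit)
    if axis_type == "space":
        return kind in (None, "length")
    if axis_type == "time":
        return kind in (None, "time")
    return True  # e.g. "channel": a wavelength in nanometer is fine


print(infer_type("micrometer"), consistent("time", "nanometer"))  # space False
```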

bogovicj commented 3 years ago

The below was discussed in the ngff meeting on 01 Sept 2020

A counterexample might be channels acquired at different wavelengths (a physical unit), which clashes with the spatial domain. Ideas:

tischi commented 3 years ago

Maybe the word channel is a bit misleading anyway? Maybe setup, as in the BDV file format, is more appropriate. For example, we sometimes acquire the exact same fluorescence "channel" in terms of emission wavelengths, but with a couple of different exposure times to accommodate different sample brightness. Another example is to acquire the same emission wavelength but with different excitation wavelengths for some of the ratiometric sensor fluorophores. Thus associating "channel" very strictly with the emission wavelength band is maybe too limiting?

constantinpape commented 3 years ago

Follow-up from last week's ngff meeting: there was fairly broad consensus that the axes label should be decoupled from its semantic meaning and that, consequently, a new field should be added for the "semantic" axis type (time, space, channel, or similar; see the comment by @tischi above). In addition, we want to add unit, which has some relation to type (e.g. type: time, unit: meter doesn't make sense), but there is no strict one-to-one correspondence, as @bogovicj pointed out above. There were some additional discussions about allowing more than 5 dimensions and adding more axis types. My personal preference would be not to include these changes now, but rather to make sure that the current changes are extensible enough to allow work on this in later versions.

I will start to work on spec v0.4 now and begin by making a PR for the changes laid out above; I will implement the solution that seems best to my judgment and try to lay out all discussion points I can see in the PR. We will announce on GitHub and on image.sc once the PR is ready to be discussed.

thewtex commented 3 years ago

Thus associating "channel" very strictly with the emission wavelength band is maybe too limiting?

component could also be considered -- it is semantically more general but has the same non-space-time association, and it also starts with a c :-)

unidesigner commented 3 years ago

Hi @constantinpape et al. Just wanted to make you aware of some of the discussion around axes metadata in this neuroglancer issue. It'd be good to know how some of the discussions therein could be fed into the discussion/proposal process for the ome-ngff specs on axes metadata.

satra commented 3 years ago

as a slight aside: regarding units as text we have found this text representation quite useful: https://people.csail.mit.edu/jaffer/MIXF/CMIXF-12 and we adopted this in the BIDS standard (https://bids-specification.readthedocs.io/en/stable/99-appendices/05-units.html). here is a python library to support parsing: https://github.com/sensein/cmixf

constantinpape commented 3 years ago

I have started to put something together for the new axes metadata based on the discussions here in #57. I am now working on transformations and will start a broader call for feedback once both proposals are done (given that these are linked), but feel free to comment on the axes metadata proposal already.

constantinpape commented 2 years ago

This is now implemented with v0.4 :).