Adapt the CF conventions data model

ome / ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.

https://ngff.openmicroscopy.org

Other

110 stars 38 forks source link

Adapt the CF conventions data model #174

Open d-v-b opened 1 year ago

d-v-b commented 1 year ago

I've been reading the NetCDF Climate and Forecast metadata conventions recently. I highly recommend this document to anyone interested in what a mature, battle-tested metadata model for n-dimensional array data looks like. In terms of collaborative science, climate / geosciences are far ahead of bioimaging. Climate science is bigger and has more concentrated resources than bioimaging, but they have the same core praxis as bioimaging -- they take fancy pictures of interesting things. We would be foolish not to use their tools, to the extent that those tools apply to our problems, and are not cumbersome to work with.

In particular, I think we should seriously consider a few key points:

in the CF conventions, the coordinates of a dataset (i.e., the location in space / time of each sample) are represented explicitly as a collection of arrays. Contrast this with the typical bioimaging approach of representing a dataset's coordinates implicitly as a transformation that takes an array index (a tuple of integers) and maps it to a point in space. By expressing the coordinates explicitly, you avoid the need to define a bestiary of transformation types in your metadata specification -- since every possible transformation produces a grid, you can just store the grid directly. One downside to this is the demand for more storage, but regular grids are extremely compressible so I think this can be solved technically.
the CF conventions define sane approaches to dealing with different axis types (e.g., discrete axes)
the CF conventions is deliberately abstract w.r.t. specific file formats. By contrast, the current ome-ngff spec is deeply coupled to the zarr format, which means it's not possible to implement ome-ngff in hdf5, or other chunked formats.
the CF conventions support more metadata that would be nice to have -- you can describe the history of a dataset, reference timepoints to a calendar year, describe the exact time when each sample was acquired (not possible with a transform-based approach), etc. It's a very rich metadata model.

Using the data model expressed in the CF conventions would be a big change, but I think it would be absolutely beneficial.

I strongly encourage people to read the CF conventions and try to identify parts that don't apply to bioimaging. In my experience, there are vanishingly few. One key example is the CF assumption that all data represent territory on the earth, and thus there's special treatment for dimensions like "latitude" and "longitude". While strictly true that nearly all microscopy data is acquired on earth, we don't really care about the orientation of microscopy images relative to latitude and longitudinal axes. Because of things like these, an adaptation to bioimaging would be a bit of effort, but the payoff would be immense, in my opinion.

I'm curious to hear what other people think about this proposal. If there's enthusiasm for it, then perhaps we can move forward with an adaptation (called "OME conventions"?).

imagesc-bot commented 1 year ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/adapt-the-cf-conventions-for-ome-ngff/78463/1

will-moore commented 1 year ago

Thanks for the proposal. It certainly makes sense not to reinvent the wheel!

I find it quite hard to understand the CF conventions in terms of what the data actually looks like. That might be because I'm not familiar with netCDF file format/API. That document seems to be more like guidelines than an actual data model?

So it's hard to know how big a change this would be for us to adopt.

Are you proposing similar guidelines for "OME conventions", such as "history": "We recommend that each line begin with a timestamp indicating the date and time of day that the program was executed" that we can add to our existing model, or is the idea to start with their model and remove lat/long etc?

Is the key attraction to this proposal to ease the burden on designing a file format (re-use existing expertise) or to actually use a format that is compatible with existing software tools? So that tools which work with climate data can also work with OME-NGFF?

Explicit storing of coordinates: Does this mean that for every pixel, instead of a single value pixel intensity, you have e.g. pixel intensity, x, y, z, c, t? So the data volume is potentially 6 times bigger? That would appear to be a big drawback, even if it's compressed. Also makes operations like generating a pyramid quite a bit more involved. It seems like this has been adopted for lat/long data to deal with the issue of non-regular "reduced horizontal" grids? This isn't an issue in the same way with biological images.

I'm all for using what we can from the CF conventions, but it feels like full adoption of their data model (based on netCDF) might be more cost than benefit, unless I'm not fully understanding the proposal?

virginiascarlett commented 1 year ago

I'm hardly a stakeholder here, but I also see some nice concepts in the CF conventions. Discrete axes could be an elegant way to handle channels, and the ability to store multiple grids is very intuitive. One thing to watch out for is that CF does rely on a somewhat unwieldy controlled vocabulary. I've heard that when netCDF first came out, they had several hundred standard names, but now they have upwards of 4,000, with a digest announcing the new standard names each quarter. This is something to consider.

FWIW, I found this document a bit easier to read, though it's still not exactly leisure reading. --> https://gmd.copernicus.org/articles/10/4619/2017/ Figures 5, 6, and 7 are simple but really clarifying, when taken with the text.

I also found pages 15-20 of this document helpful for understanding the three core objects of netCDF: dimensions, variables, and attributes. --> https://www.aoml.noaa.gov/ftp/hrd/dodge/netcdf.pdf

joshmoore commented 1 year ago

A few quick thoughts from my side:

:100: for learning as much from cfconventions as possible.
in discussing with those in the geospatial community (specifically about multiscales e.g. as adopted by geozarr) I was given warning that the cost of trying to propose changes to cfconventions itself would likely be significant.
I will note the 20 year revision history of cfconventions. Along the lines of @virginiascarlett's "not leisure reading" a trick will finding ways to extract self-contained sections that are of the size that we can actually implement them sensibly across all of our target platforms (and not die trying).

mkitti commented 1 year ago

CF may be missing a concept of source-sample-detector coordinate system common in microscopy and bioimaging: https://manual.nexusformat.org/design.html#the-nexus-coordinate-system

Other concepts from Nexus include:

Diffraction gratings: https://manual.nexusformat.org/classes/base_classes/NXgrating.html#nxgrating
Monochromators: https://manual.nexusformat.org/classes/base_classes/NXmonochromator.html#nxmonochromator
Cylindrical geometry: https://manual.nexusformat.org/classes/base_classes/NXcylindrical_geometry.html#nxcylindrical-geometry
Transformations: https://manual.nexusformat.org/classes/base_classes/NXtransformations.html#nxtransformations

d-v-b commented 1 year ago

@will-moore asked a lot of great questions and I'd like to answer them all:

I find it quite hard to understand the CF conventions in terms of what the data actually looks like. That might be because I'm not familiar with netCDF file format/API. That document seems to be more like guidelines than an actual data model?

In addition to the resources linked bt @virginiascarlett, I found this demo helpful. It's a demo of one implementation of the CF data model in a python library (iris) that tries to hew pretty close the the CF conventions.

So it's hard to know how big a change this would be for us to adopt.

Conceptually, I think it would be a big change. We are used to thinking about data as an array + some metadata. Adopting the CF conventions would thinking about data as an array + some more arrays + some metadata. The upside would be that a) we can unambiguously express the physical embedding of our data, and b) we could use some pre-existing tools (assuming we don't deviate too far from the existing implementations of the CF model.

I want to emphasize that adopting the CF conventions would allow us to unambiguously express the physical embedding of our data. Indeed, transformation matrices alone will never be sufficient to express the physical embedding of our data.

Why is this? Because transformation matrices cannot express irregular coordinate grids, but most microscopy data is actually acquired on an irregular 4D grid. Raster-scanned imaging like confocal or 2p microscopy acquires samples in a temporal sequence, and the raster scanner spends time resetting before acquiring another scan line. There's even more time spent between the last scanline and the first scanline of the next image. sCMOS cameras typically used in widefield imaging have a similar behavior if the rolling shutter mode is used. And there is a similar issue with the Z axis when a z-stack is acquired with a sawtooth waveform.

Two key takeaways: 1) a typical microscopy image is not acquired in one instant, but as a sequence of samples acquired at different times. 2) the sample timepoints are not uniformly spaced (due to resetting time between lines and between frames).

In order to match the reality of imaging, we should have a data model that allows us to assign a timepoint to each sample in an image. Given that those samples are, generally, not acquired in a regular sequence, a transformation matrix cannot express this information. The logical step is to treat the timepoints as data and store them as an array. And thus we have arrived at a key feature of the CF conventions data model.

When imaging fast events, the time that an individual sample was acquired is actually important, but It's possible that many consumers of data don't care when a single timepoint in a microscopy image was acquired. Maybe assigning a timestamp to the entire frame / volume is sufficient for many applications. But we should use metadata that allows us to preserve that information if needed.

Are you proposing similar guidelines for "OME conventions", such as "history": "We recommend that each line begin with a timestamp indicating the date and time of day that the program was executed" that we can add to our existing model, or is the idea to start with their model and remove lat/long etc?

I think I'm proposing that we start with their model and throw out everything that doesn't apply to us, so stuff like latitude and longitude and the special treatment for different types of spherical projections etc. I suspect we won't actually need to add anything, but maybe I'm wrong.

Explicit storing of coordinates: Does this mean that for every pixel, instead of a single value pixel intensity, you have e.g. pixel intensity, x, y, z, c, t? So the data volume is potentially 6 times bigger? That would appear to be a big drawback, even if it's compressed. Also makes operations like generating a pyramid quite a bit more involved. It seems like this has been adopted for lat/long data to deal with the issue of non-regular "reduced horizontal" grids? This isn't an issue in the same way with biological images.

Yes, the data volume is potentially much larger if we store coordinate grids naively. And if it wasn't possible to compress the grids, I would consider this to be a huge drawback. But consider that an array's shape and a homogeneous affine transformation actually specifies a grid exactly. So, we could simply define a compression scheme for zarr arrays that stores a coordinate array as a shape + a homogeneous affine transformation. This is actually better than putting transforms in JSON, because we don't run into JSON encoding issues. The downside is that it becomes a bespoke compression scheme (but likely one with broad appeal, since affine-transformed grids are very popular, albeit typically in the implicit form).

As for creating pyramids, explicit coordinates actually makes the creation of the pyramid better, because you keep track of how the downsampling routine affects the coordinate grid. See the coarsening routines in xarray, for example.

I'm all for using what we can from the CF conventions, but it feels like full adoption of their data model (based on netCDF) might be more cost than benefit, unless I'm not fully understanding the proposal?

I'm curious what the cost would be. The benefit is that we can actually store + express stuff about our data that was previously impossible to express, and there's already an ecosystem of tools that work with the CF model. I can say for sure that integration with xarray would be extremely easy, because xarray doesn't require strict adherence to the CF conventions. Integration with more strict climate tools would be harder, but I don't think that's a goal because we are not climate scientists.

jni commented 1 year ago

Hey @d-v-b

I need to read the netCDF docs that you and others have linked, but wanted to jump in already and push back against one notion:

Indeed, transformation matrices alone will never be sufficient to express the physical embedding of our data.

I was about to say that your raster scan could be captured by an affine transform with a shear in t, but you seem to already know this and say the same thing later in your response when you describe compression of the coordinate grid based on this? So I feel like your two points contradict each other? ie we can compress the grid if and only if it can be expressed as a low-dimensional transformation. And indeed the implementation of the compression scheme would probably need to reinvent the transforms spec that has just been implemented?

So I guess you meant to say that transformation matrices in x y z alone will never be sufficient, but I don't think that the transformation spec under discussion is limited to those? And anyway, a significant proportion of imaging data is on fixed tissues, in which cases we don't need to concern ourselves with such complexity.

So, even though there are conceptual solutions for compressing grid coordinates, I'm still not convinced that that model is superior to a transformation specification model. Both approaches give us a straightforward way to interpolate from the raw data in a target space. In simple cases (ie most data), the transformations model is significantly more human readable, which is a huge advantage. I also think "bestiary of transformation types" is a rather unfair assessment of the situation and doesn't account for the other side of the coin: as a consumer of the data, I would like to know whether I am even capable of dealing with the data. If it's an axis-aligned grid + a transformation, I can work that out trivially. If it's a complete set of coordinates, I have to work to determine whether the grid is even regular or embeds some non-linear transformations. On the visualisation side of things, dealing with arbitrary chained transformations is trivial. Dealing with interpolation of an irregular grid, less so. Storing a grid of coordinates in fact throws out information about the simplicity of the grid.

normanrz commented 1 year ago

I understand the theoretical appeal of explicitly storing coordinates. Practically, I think it will make a lot of current use cases (e.g. visualization, deep learning) harder to implement, which will limit adoption. To be honest, I think it would be a big enough deviation from what OME-Zarr currently is to be its own thing.

Couldn't storing coordinates explicitly be handled by attaching displacement fields to images?

I want to mention that even imaging coordinates are not the source of truth. For example, if you have a volume and section it, there is a transformation that you may want to track and virtually reverse (by aligning).

d-v-b commented 1 year ago

@jni thanks for the corrective comment! yes, if we treat all imaging data as 4D, then we can capture more information via transforms that shear in t and z. But this requires several unpleasant things: first, it forces us to store a 2D imaging dataset as a 4D array, or define some scheme for applying a 4D affine transformation to lower dimensional data (maybe you could do this via named axes?), and second, a shearing transform cannot express any irregularity in temporal sampling (e.g., images acquired with any jitter), or spatially-varying distortions like field curvature in z. In the CF data model, you can express temporally varying sampling and field curvature in z without promoting the source data to 4D -- you just create two 2D coordinate variables, one named "t" and the other named "z", each containing spatially-varying timepoints and z values.

In the case of images acquired with temporal jitter (like the FIB-SEM datasets i process 🙄), it's true that we can't compress timeseries easily. But those timepoints are real information about our experiment, so I want a data format that lets me represent it as one of the coordinates of the data. The storage cost is worth it. So there's no avoiding the need to store this information, once we accept that it's potentially valuable.

I agree that "transform metadata" -> "transformed grid data" is a big leap. But the climate science community is already there! They are using this data model today. It clearly works. Naturally our spec would have to define what coordinates are valid, much as the CF conventions does, and we could be even more strict if we wanted (but as @normanrz notes, we are already in the business of deformation field transformations, so I'm not sure how strict we can really be).

Juan brings up an important point here:

If it's a complete set of coordinates, I have to work to determine whether the grid is even regular or embeds some non-linear transformations. On the visualisation side of things, dealing with arbitrary chained transformations is trivial. Dealing with interpolation of an irregular grid, less so. Storing a grid of coordinates in fact throws out information about the simplicity of the grid.

We should look at how the netcdf ecosystem handles this. In theory, someone could store coordinate arrays that are random numbers (this is contra the CF convention, but it's something that could happen), so I'm guessing that applications do sanity checks on the coordinate arrays before visualization. I disagree that a materialized grid is any less informative than a transform + shape, though, since they can be losslessly converted back and forth.

I understand the theoretical appeal of explicitly storing coordinates. Practically, I think it will make a lot of current use cases (e.g. visualization, deep learning) harder to implement, which will limit adoption. To be honest, I think it would be a big enough deviation from what OME-Zarr currently is to be its own thing.

I'm not convinced that explicit coordinates makes deep learning particularly difficult (climate scientists do use deep learning after all), but it's definitely true that current bioimaging visualization libraries expect data + transforms, not data + coordinate data, and it would be a chore to change this. If visualization applications need transformation matrices, they can just infer them from the coordinate grids using whatever cheap hacks they want. All programs capable of handling ome-ngff data can open zarr arrays, which is where the coordinates are stored, so at a minimum all that would change for these applications is that they would be inferring a transform from a coordinate array instead of JSON metadata (which is limited insofar as it can only represent float64s, whereas putting coordinates in an array gives us the full suite of numeric types).

I would love a world in which we could re-use some of the existing climate science tooling to make this transition, but you might be right that's it's just too much work. Which would be pretty sad -- another community of scientists / software engineers develops a powerful data model that we know would work for our data, and they develop an entire ecosystem of tools around this model, and in response we say "looks like too much work" and go back to the 10th iteration of the "should the affine transform metadata be an array of arrays or a flat array, and should it be C or F ordered" debate 😆. I think this outcome would be perfectly rational -- it might actually be too much work, but this would reflect very poorly on us as a community.

d-v-b commented 1 year ago

I want to mention that even imaging coordinates are not the source of truth. For example, if you have a volume and section it, there is a transformation that you may want to track and virtually reverse (by aligning).

The CF conventions handles this directly: you can define multidimensional coordinate variables that express the transformation from the imaging coordinate system to some arbitrary coordinate system. climate scientists need this abstraction because cameras are flat, with rows and columns of pixels, but the earth is a sphere, with latitude and longitudinal coordinates. multidimensional coordinate variables exist to express this mapping.

jni commented 1 year ago

I disagree that a materialized grid is any less informative than a transform + shape, though, since they can be losslessly converted back and forth.

How exactly do you convert losslessly back from a 2D (or nD) grid of coordinates to a simple transform? If I have a polynomial transform like shown in this image, I can store the coordinates (ndim x size of data), or I can store those 32 parameters + a tiny bit of metadata (the "bestiary" as you call it 😂). If I give you only the coordinates, you're gonna have a hard time telling me those parameters, especially if I don't also tell you that it's a polynomial transform of a certain degree. I agree that in principle you could succeed, but it's neither easy nor cheap.

You could store the transformation information as part of your compression scheme, but that is simply moving the problem around for little benefit.

On the other hand, if you give me the 32 parameters and metadata, I can trivially and very quickly reconstruct the coordinate arrays of the correct dimensionality.

The more I think about this, the more I'm convinced that storing the transforms is the right model and we can/should build a layer on top that processes ome-ngff (agree 100% with multiple file backend support) into data + coordinate arrays, on the fly at read time. In my opinion this is a much easier path and provides almost equivalent functionality. (?)

I agree that "transform metadata" -> "transformed grid data" is a big leap. But the climate science community is already there! They are using this data model today. It clearly works.

Indeed, and I definitely think it's worth seriously considering it. But it's really important to understand all the caveats that come with that statement:

first, a lot of software and standards practice is wound up in history, and it's not immediately clear whether the standard would be different now if it was feasible to start from scratch. We are in an enviable position with NGFF in that we are starting from scratch, so we can evaluate each decision based not just on prior art (which we should indeed look at carefully!) but also on desired properties.
second, there are properties of the geodata domain that might make their use case different enough from ours that different solutions are called for. They don't have to worry about big nonlinear deformations of the data, for example. (I think, anyway?) *see below
third, there is other prior art, including DICOM and NIfTI, both of which also have long history of being used productively, many software packages depending on them, etc, and both of which store transformations rather than coordinate arrays.
finally, as sketched above, I think we can have our cake and eat it too, by storing the transforms and building an API on top of them that creates coordinate arrays where needed.

*

We should look at how the netcdf ecosystem handles this. In theory, someone could store coordinate arrays that are random numbers (this is contra the CF convention, but it's something that could happen), so I'm guessing that applications do sanity checks on the coordinate arrays before visualization.

In most xarray tutorials, I notice that the grid data is given as orthogonal 1D lat/lon/time arrays. I suspect that a lot of geodata is like this (?) and therefore they don't usually have to store a complete grid of the size of the image data. At this point I'm just guessing based on limited experience though. But anyway, see all the other points above.

d-v-b commented 1 year ago

How exactly do you convert losslessly back from a 2D (or nD) grid of coordinates to a simple transform? If I have a polynomial transform like shown in this image, I can store the coordinates (ndim x size of data), or I can store those 32 parameters + a tiny bit of metadata (the "bestiary" as you call it 😂). If I give you only the coordinates, you're gonna have a hard time telling me those parameters, especially if I don't also tell you that it's a polynomial transform of a certain degree. I agree that in principle you could succeed, but it's neither easy nor cheap.

When I said "simple transform", I was thinking of something like scale -> translation, or affine, for which round-tripping from grid to transform is pretty simple. But I think the polynomial transform with 32 parameters is actually a great example! Because, under the current approach, if you want to use that transform, you will need to standardize this transform in the ngff metadata, and get visualization software to support that transform, which could be extremely difficult if the transform is peculiar / niche (as I suspect is this case for this polynomial transform). Whereas, if we comprehensively adopt the CF conventions, then the process is (in principle) much easier: you give your image data and your grid to a compliant viewer, and it displays your data resampled to the grid, without needing to know anything about the transform that generated the grid.

I should note that the CF conventions allows you to assign attributes to a coordinate variable, and so we could always associate a transform with a grid by putting the transform in the attributes of the coordinate variable that represents the grid, and this representation could even be standardized. This could address any concerns about lossless conversion between transform and grid. But the grid itself would be the primary source of truth for the coordinates.

In fact, if we rely exclusively on transforms for expressing coordinates in ome-ngff, it's impossible to express an important class of image transformations: the results of iterative deformable image alignment like symmetric diffeomorphism or demons. This class of image registration methods generates a deformation field for which there is almost certainly no representation via transforms, and these a standard techniques for deformable alignment in medical imaging. So if we want to support this kind of deformable alignment, we are already committed to handling transformations stored as coordinate grids.

To summarize: I think relying on transforms for encoding coordinates is actually a lot of extra work, because every time you come up with a new variant of a transform, you have to figure out how to encode that transform in metadata. Even for old transforms this is potentially a headache (should the affine matrix be C or F ordered? a flattened or nested list? is it homogenous?). By contrast, storing the coordinates as data is much less ambiguous, and it's much easier for users to extend. Also, the transforms-first approach excludes a whole class of deformable image registration techniques. A remaining drawback of my proposal is the increased storage size, but I'm hoping we can solve that technically.

I'd love to get some perspectives from climate science experts on this topic. Specifically, how common are irregular coordinate grids in climate / geo? How is this handled? cc @rabernat

jni commented 1 year ago

In fact, if we rely exclusively on transforms for expressing coordinates in ome-ngff, it's impossible to express an important class of image transformations: the results of iterative deformable image alignment like symmetric diffeomorphism or demons.

I definitely dispute that: transforms have parameters, this just happens to be a class of transform that has as ndim times many parameters as the image size, and those parameters could be stored in an array just like the image. ie, the transforms-first approach generalizes straightforwardly to the coordinates-first approach, but the reverse is not true.

When I said "simple transform", I was thinking of something like scale -> translation, or affine, for which round-tripping from grid to transform is pretty simple.

It is, but you have to (a) already know that the transform is in that class of transforms using metadata (and if you're going to store the metadata anyway, the rest is just a waste of space) or (b) inspect every coordinate in case your transform turns out to be only approximately affine but actually diffeomorphic.

Even for old transforms this is potentially a headache (should the affine matrix be C or F ordered? a flattened or nested list? is it homogenous?). By contrast, storing the coordinates as data is much less ambiguous, and it's much easier for users to extend.

I think the F/C ordering is also a problem with the coordinates?

The user-space extensions, though, are a big question and imho the biggest argument in favor of the coordinates. However!

(a) I will reiterate what I said above: coordinate-based transforms are completely feasible, as long as transforms can refer to arrays/groups for their parameters. Therefore, the transform-based approach can in fact completely subsume the coordinate approach: "No simple transform in the catalog meets your needs? Use a CoordinateGridTransform! @d-v-b will thank you!" 😜 (b) For the most common transforms, there is a simple solution: transforms will be set by the standard and come from the ome-ngff implementation for your chosen language. It's already expected that various groups will work on "official" OME-NGFF implementations for their own favorite programming language (?), so for the simple transforms, this is not a lot of extra work. The CoordinateGridTransform can always be there for people as a fallback. For complex and user-contributed transforms, we can envision a directory with namespaces, similar to conda/conda-forge. You can register a transform mapped to a function in your language of choice, and while you're working on registering your transform you can share the data with the CoordinateGridTransform as a fallback.

I will definitely admit that (b) for arbitrary/niche transforms is in no way trivial, but the CoordinateGridTransform is always there as a fallback option.

A remaining drawback of my proposal is the increased storage size, but I'm hoping we can solve that technically.

I think it's not only storage that is an issue, but both memory size and performance issues at every step in the pipeline. You need to load those coordinate arrays into memory (again, cheap for axis-aligned grids, not very cheap even for rotations or shears), and then when you're doing the interpolation you have to do lookups into those arrays — this is data intensive and not friendly for e.g. CPU cache access. Chaining transforms, for example if you have CLEM data that you want to register, is more computationally expensive — and memory-intensive, if you want to store both transforms in the chain.

To me, that sounds like a big technical problem, and one that you'd have to solve and optimize repeatedly for different implementations.

I'd love to get some perspectives from climate science experts on this topic. Specifically, how common are irregular coordinate grids in climate / geo? How is this handled?

💯

constantinpape commented 1 year ago

I didn't have time to read all of this, but do I understand it correctly that adopting the CF convention would mean we always need to specify the displacement / transformation for each pixel?

If this is true then I don't see any need to discuss this further. This approach is INCREDIBLY wasteful, and would mean that we would need to more than double (floating point numbers don't compress well ...) the storage size for all data, whereas specifying the transformation parameters for simple transformations (scale, rotation, affine etc.) is independent of the data size. Or am I missing something here?

Also, as @jni already points out, deformable transformations can be expressed given the transformation model from #138 by pointing towards a zarr array that gives the coordinates or coordinate displacements.

d-v-b commented 1 year ago

If this is true then I don't see any need to discuss this further. This approach is INCREDIBLY wasteful, and would mean that we would need to more than double (floating point numbers don't compress well ...) the storage size for all data, whereas specifying the transformation parameters for simple transformations (scale, rotation, affine etc.) is independent of the data size. Or am I missing something here?

The key thing you are missing is that there's a huge community of scientists who have been productively using this data model for n-dimensional datasets for years. I think it would be incredibly wasteful not to learn from them.

regarding storage: as I mentioned earlier in this discussion, transforms that are actually simple are compressible, so there's no wasted storage -- we would simply be pushing the representation of the coordinate grid to a lower level of abstraction. In fact, treating representation of a grid as a compression / encoding issue is a huge improvement over representing numbers in JSON, which doesn't have the richest support for numeric types.

Also, bear in mind that explicit coordinates is just one feature of the CF conventions. I don't think anyone has engaged with the other nice aspects of the CF conventions that I suggested we adopt. I HIGHLY encourage people to actually read the CF conventions with the attitude of "how can we use this for OME-NGFF" instead of getting hung up on the coordinates issue.

rabernat commented 1 year ago

👋 climate scientist here! It's fascinating to read this discussion. In general I agree with @d-v-b that I think these conventions could be broadly useful in other scientific domains that use complex ND-array data.

Let me weigh in on the specific point of contention that has arisen: the use of explicit coordinates vs. implicit coordinates specified by geometric transforms. This is also a point of contention in the geospatial / climate community! It is not an obvious question by any means, given the diversity of sensors and data collection techniques. To make things worse, in weather and climate, we are often detailing with simulation data, where the grids are determined by numerical modeling considerations (rather than by sensor geometry / physics). The geospatial imagery community often clashes with the modeling community in its preference for how to describe coordinates (see e.g. https://github.com/christophenoel/geozarr-spec/issues/3; https://github.com/pydata/xarray/issues/6448).

The key point to emphasize here is that CF conventions support many ways of doing it! They certainly don't require you to explicitly encode the coordinates of every single point in your image. Instead, you can just store images plus metadata about their projection--in geospatial world this is called a CRS: coordinate reference system. If you look at the examples of CF datasets with map-projected data in them, you will see that this is conceptually quite similar to the OME-NGFF coordinate transformations. (See example 5.12 for instance.) Basically, you just store orthogonal "projection" x / y coordinates, plus metadata which explain how to project this data into whatever coordinate reference system you want. Software can handle the transformation.

``` dimensions: y = 18 ; x = 36 ; variables: double x(x) ; x:standard_name = "projection_x_coordinate" ; x:units = "m" ; double y(y) ; y:standard_name = "projection_y_coordinate" ; y:units = "m" ; float temp(y, x) ; temp:long_name = "temperature" ; temp:units = "K" ; temp:grid_mapping = "crs: x y" ; int crs ; crs:grid_mapping_name = "transverse_mercator" ; crs:longitude_of_central_meridian = -2. ; crs:false_easting = 400000. ; crs:false_northing = -100000. ; crs:latitude_of_projection_origin = 49. ; crs:scale_factor_at_central_meridian = 0.9996012717 ; crs:longitude_of_prime_meridian = 0. ; crs:semi_major_axis = 6377563.396 ; crs:inverse_flattening = 299.324964600004 ; crs:projected_coordinate_system_name = "OSGB 1936 / British National Grid" ; crs:geographic_coordinate_system_name = "OSGB 1936" ; crs:horizontal_datum_name = "OSGB_1936" ; crs:reference_ellipsoid_name = "Airy 1830" ; crs:prime_meridian_name = "Greenwich" ; crs:towgs84 = 375., -111., 431., 0., 0., 0., 0. ; crs:crs_wkt = "COMPOUNDCRS ["OSGB 1936 / British National Grid + ODN", PROJCRS ["OSGB 1936 / British National Grid", GEODCRS ["OSGB 1936", DATUM ["OSGB 1936", ELLIPSOID ["Airy 1830", 6377563.396, 299.3249646, LENGTHUNIT[“metre”,1.0]], TOWGS84[375, -111, 431, 0, 0, 0, 0] ], PRIMEM ["Greenwich", 0], UNIT ["degree", 0.0174532925199433] ], CONVERSION["OSGB", METHOD["Transverse Mercator", PARAMETER["False easting", 400000, LENGTHUNIT[“metre”,1.0]], PARAMETER["False northing", -100000, LENGTHUNIT[“metre”,1.0]], PARAMETER["Longitude of natural origin", -2.0, ANGLEUNIT[“degree”,0.0174532925199433]], PARAMETER["Latitude of natural origin", 49.0, ANGLEUNIT[“degree”,0.0174532925199433]], PARAMETER["Longitude of false origin", -7.556, ANGLEUNIT[“degree”,0.0174532925199433]], PARAMETER["Latitude of false origin", 49.766, ANGLEUNIT[“degree”,0.0174532925199433]], PARAMETER["Scale factor at natural origin", 0.9996012717, SCALEUNIT[“Unity”,1.0]], AUTHORITY["EPSG", "27700"]] CS[Cartesian,2], AXIS["easting (X)",east], AXIS["northing (Y)",north], LENGTHUNIT[“metre”, 1.0], ], VERTCRS ["Newlyn", VDATUM ["Ordnance Datum Newlyn", 2005], AUTHORITY ["EPSG", "5701"] CS[vertical,1], AXIS["gravity-related height (H)",up], LENGTHUNIT[“metre”,1.0] ] ]" ; ... ```

If this community choose to adopt CF in some form, it seems inevitable that you would need to incorporate your own domain-specific ways of encoding transformations in metadata. In fact, it looks like you already have this in OME-NGFF. For those who wish to use the more explicit coordinate representations, those would also be available, but not required.

d-v-b commented 1 year ago

thanks @rabernat, this is extremely helpful.

@jni , @constantinpape sorry for muddying the waters with my "all explicit coordinates, all the time" zealotry. I didn't know that a CF coordinate reference system could be defined via a transform. With that blocker removed, I hope we can start taking this proposal seriously.

constantinpape commented 1 year ago

The key thing you are missing is that there's a huge community of scientists who have been productively using this data model for n-dimensional datasets for years. I think it would be incredibly wasteful not to learn from them.

Our goal is to implement a standard that can easily be adopted by tools that are also used by a huge community of scientists, and fundamentally changing how these tools deal with transformations would be much more wasteful (in practice: would not happen).

regarding storage: as I mentioned earlier in this discussion, transforms that are actually simple are compressible, so there's no wasted storage -- we would simply be pushing the representation of the coordinate grid to a lower level of abstraction.

What does this mean in practice? You store the parameters that generate the grid (e.g. affine) and then generate the grid on the fly? But why would you do this? It's still very different than the typical transformation model in bioimaging, and in practice I would assume most tools would just read the transformation parameters (and we're back to our current solution, but with an more indirection).

In fact, treating representation of a grid as a compression / encoding issue is a huge improvement over representing numbers in JSON, which doesn't have the richest support for numeric types.

The current proposal in #138 fixes the json issue by allowing to "outsource" transformation parameters to zarr arrays.

Also, bear in mind that explicit coordinates is just one feature of the CF conventions. I don't think anyone has engaged with the other nice aspects of the CF conventions that I suggested we adopt. I HIGHLY encourage people to actually read the CF conventions with the attitude of "how can we use this for OME-NGFF" instead of getting hung up on the coordinates issue.

I fully agree with that. It would be great to adapt or at least learn from from the CF conventions in a modular fashion; if we can do this without the need to adapt the explicit transformation model wholesale.

P.s.: I started writing this before the last two comments by @d-v-b and @rabernat, where I think we converge on similar paths; I will leave the comment above to finish my thoughts ;). Moving forward:

What are other parts we could adapt from CF conventions?
Would any of this clash with the current proposal to extend transformations in #138 and are there ways to mitigate?

mkitti commented 1 year ago

If this is true then I don't see any need to discuss this further. This approach is INCREDIBLY wasteful, and would mean that we would need to more than double (floating point numbers don't compress well ...) the storage size for all data, whereas specifying the transformation parameters for simple transformations (scale, rotation, affine etc.) is independent of the data size. Or am I missing something here?

Above @d-v-b proposed that the common case of a regular grid is compressible via a custom scheme.

Yes, the data volume is potentially much larger if we store coordinate grids naively. And if it wasn't possible to compress the grids, I would consider this to be a huge drawback. But consider that an array's shape and a homogeneous affine transformation actually specifies a grid exactly. So, we could simply define a compression scheme for zarr arrays that stores a coordinate array as a shape + a homogeneous affine transformation. This is actually better than putting transforms in JSON, because we don't run into JSON encoding issues. The downside is that it becomes a bespoke compression scheme (but likely one with broad appeal, since affine-transformed grids are very popular, albeit typically in the implicit form).

This "compression" scheme is actually quite common in Julia. In Julia, we have a CartesianIndices type, where given a shape, we have a virtual n-dimensional array which is a regular Cartesian grid.

julia> ci = CartesianIndices((1024, 2048))
CartesianIndices((1024, 2048))

julia> typeof(ci)
CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}

julia> typeof(ci) |> supertype
AbstractMatrix{CartesianIndex{2}} (alias for AbstractArray{CartesianIndex{2}, 2})

julia> sizeof(ci) # bytes, the above integers are both of type Int64
16

julia> ci[1000, 300]
CartesianIndex(1000, 300)

julia> ci[4096] # LinearIndexing
CartesianIndex(1024, 4)

julia> collect(ci)
1024×2048 Matrix{CartesianIndex{2}}:
 CartesianIndex(1, 1)     CartesianIndex(1, 2)     …  CartesianIndex(1, 2047)     CartesianIndex(1, 2048)
 CartesianIndex(2, 1)     CartesianIndex(2, 2)        CartesianIndex(2, 2047)     CartesianIndex(2, 2048)
 CartesianIndex(3, 1)     CartesianIndex(3, 2)        CartesianIndex(3, 2047)     CartesianIndex(3, 2048)
 CartesianIndex(4, 1)     CartesianIndex(4, 2)        CartesianIndex(4, 2047)     CartesianIndex(4, 2048)
 CartesianIndex(5, 1)     CartesianIndex(5, 2)        CartesianIndex(5, 2047)     CartesianIndex(5, 2048)
 CartesianIndex(6, 1)     CartesianIndex(6, 2)     …  CartesianIndex(6, 2047)     CartesianIndex(6, 2048)
 CartesianIndex(7, 1)     CartesianIndex(7, 2)        CartesianIndex(7, 2047)     CartesianIndex(7, 2048)
 CartesianIndex(8, 1)     CartesianIndex(8, 2)        CartesianIndex(8, 2047)     CartesianIndex(8, 2048)
 CartesianIndex(9, 1)     CartesianIndex(9, 2)        CartesianIndex(9, 2047)     CartesianIndex(9, 2048)
 CartesianIndex(10, 1)    CartesianIndex(10, 2)       CartesianIndex(10, 2047)    CartesianIndex(10, 2048)
 CartesianIndex(11, 1)    CartesianIndex(11, 2)    …  CartesianIndex(11, 2047)    CartesianIndex(11, 2048)
 CartesianIndex(12, 1)    CartesianIndex(12, 2)       CartesianIndex(12, 2047)    CartesianIndex(12, 2048)
 CartesianIndex(13, 1)    CartesianIndex(13, 2)       CartesianIndex(13, 2047)    CartesianIndex(13, 2048)
 ⋮                                                 ⋱
 CartesianIndex(1013, 1)  CartesianIndex(1013, 2)     CartesianIndex(1013, 2047)  CartesianIndex(1013, 2048)
 CartesianIndex(1014, 1)  CartesianIndex(1014, 2)     CartesianIndex(1014, 2047)  CartesianIndex(1014, 2048)
 CartesianIndex(1015, 1)  CartesianIndex(1015, 2)     CartesianIndex(1015, 2047)  CartesianIndex(1015, 2048)
 CartesianIndex(1016, 1)  CartesianIndex(1016, 2)  …  CartesianIndex(1016, 2047)  CartesianIndex(1016, 2048)
 CartesianIndex(1017, 1)  CartesianIndex(1017, 2)     CartesianIndex(1017, 2047)  CartesianIndex(1017, 2048)
 CartesianIndex(1018, 1)  CartesianIndex(1018, 2)     CartesianIndex(1018, 2047)  CartesianIndex(1018, 2048)
 CartesianIndex(1019, 1)  CartesianIndex(1019, 2)     CartesianIndex(1019, 2047)  CartesianIndex(1019, 2048)
 CartesianIndex(1020, 1)  CartesianIndex(1020, 2)     CartesianIndex(1020, 2047)  CartesianIndex(1020, 2048)
 CartesianIndex(1021, 1)  CartesianIndex(1021, 2)  …  CartesianIndex(1021, 2047)  CartesianIndex(1021, 2048)
 CartesianIndex(1022, 1)  CartesianIndex(1022, 2)     CartesianIndex(1022, 2047)  CartesianIndex(1022, 2048)
 CartesianIndex(1023, 1)  CartesianIndex(1023, 2)     CartesianIndex(1023, 2047)  CartesianIndex(1023, 2048)
 CartesianIndex(1024, 1)  CartesianIndex(1024, 2)     CartesianIndex(1024, 2047)  CartesianIndex(1024, 2048)

d-v-b commented 1 year ago

@constantinpape to your two points:

What are other parts we could adapt from CF conventions?

I have a few ideas:

At a stylistic level, I think we should copy the organization of the CF conventions specification document. That is, we should state the principles of the design, explicitly define terminology, define a data model that is abstracted over programming language implementations, explicitly declare the scope of paths within the container, among others
I would propose we use the dataset provenance metadata fields described here.
I'm sure there's probably other stuff we should copy. The first two bullet points were off the top of my head.

Would any of this clash with the current proposal to extend transformations in https://github.com/ome/ngff/pull/138 and are there ways to mitigate?

yes, there are clashes: the current coordinate transformations proposal doesn't distinguish between dimensions and coordinates. This distinction can simplify the current definition of OME-NGFF axis types, remove the need to flag certain types of axes as non-interpolatable, etc. I don't know of any easy ways to mitigate this besides ripping off the bandaid, embracing the CF coordinate model as much as we can, then see where we can fit our transform specifications.

jni commented 1 year ago

Hi folks!

Instead, you can just store images plus metadata about their projection--in geospatial world this is called a CRS: coordinate reference system. If you look at the examples of CF datasets with map-projected data in them, you will see that this is conceptually quite similar to the OME-NGFF

@d-v-b the bestiary lives! 😂

How many people on this thread are gonna be at SciPy 2023? 🙋 It would probably be a good opportunity to have a BOF bringing together disparate communities to understand everyone's use cases, pain points, and existing solutions and adaptability thereof. I for one didn't quite grok the CRS format @rabernat despite your thorough linking 🙏 and would not at all mind a walkthrough of that text. 😅 Maybe even hammer out some implementations during the sprints. I'm there all week, Sunday inclusive.

d-v-b commented 1 year ago

I won't be there, but I love the idea of getting people together in person to hash this out! If y'all would tolerate a virtual presence, I could "attend" such a meeting via zoom.

imagesc-bot commented 1 year ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/save-irregular-time-coordinates-in-ome-zarr/82138/3

joshmoore commented 1 year ago

I'll be there and definitely happy to discuss. cc: @MSankeys963

jni commented 1 year ago

Oh right! The deadline to submit official BoFs is in two days, thanks for the ping @joshmoore! 😅 @rabernat @jakirkham @thewtex anyone biting or is it just Josh and me (and Davis on the phone 😂)? Tentative title for the BoF: "Where on Earth is my pixel?"

joshmoore commented 1 year ago

Like https://www.wheretheheckismatt.com/ ? :smile: :+1:

normanrz commented 1 year ago

I'd also be happy to join remotely if it happens at a time of day that I'm awake :-)

jni commented 1 year ago

On behalf of the Scipy organizers and the Scipy 2023 BoF committee, I'm happy to inform you that your BoF proposal, "Where on Earth is my Pixel", has been accepted and is scheduled to take place at 1pm on July 13, 223, in Classroom 106."

Find the time in your local time zone here. I'll ask the organisers about setting up an online presence, but worst case I'll sit my phone on a gorilla pod @d-v-b @normanrz. 😃

jni commented 1 year ago

So apparently the BoFs will have a virtual component by default but you need to buy a virtual conference ticket ($100 USD) here. Then you should be able to sign up for the BoF like any attendee. Hopefully that is not too much of a barrier / you can get work to pay for it without too much hassle! 😊

imagesc-bot commented 10 months ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/saving-volumetric-data-with-voxel-size-colormap-annotations/85537/14