Open tischi opened 3 years ago
Good point! I think in my mind, c is not really a dimension here and I would save this as two datasets. But it is also a good argument to support compound types. I had a conversation with @SabineEmbacher last week and we both agree that supporting compound types (a) would be useful, and (b) should be done with an annotation similar to N5 Compressions, such that compounds that can be extracted from byte streams can be registered at class loading time.
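To make the "registered compounds extracted from byte streams" idea concrete, here is a minimal sketch in plain Python. It is not N5 or Zarr API: the registry, the function names, and the two-field layout (one uint16 plus one float64) are all hypothetical and only illustrate the principle of looking up a registered byte layout by name.

```python
import struct

# Hypothetical registry (not N5 or Zarr API): compound type name -> byte layout.
COMPOUND_CODECS = {}


def register_compound(name, fmt, fields):
    """Register a fixed-size compound type under a name (illustrative only)."""
    COMPOUND_CODECS[name] = (struct.Struct(fmt), tuple(fields))


def decode(name, payload):
    """Extract every record of a registered compound type from a byte stream."""
    codec, fields = COMPOUND_CODECS[name]
    return [dict(zip(fields, values)) for values in codec.iter_unpack(payload)]


# Assumed layout: little-endian uint16 followed by float64, no padding,
# so each record occupies 10 bytes.
register_compound("flim_pixel", "<Hd", ["intensity", "lifetime"])

payload = struct.pack("<Hd", 1000, 2.5) + struct.pack("<Hd", 1200, 3.0)
records = decode("flim_pixel", payload)
print(records)
```

In this sketch the registration happens when the module is imported, which is roughly the Python counterpart of registering at class loading time.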
I think in my mind, c is not really a dimension here and I would save this as two datasets.
In my mind, these are two datasets as well, but I think there is an argument for being able to store all "channels" in one chunk, for loading efficiency. With the current specification, I think we would have to put them into the same dimension, wouldn't we? In other words, data from different datasets cannot be in the same chunk, right? @joshmoore
Storing values along a dimensional axis means that they share a bunch of properties. Technically, they must be of the same type, and depending on how the spec is phrased, we may want to enforce that they also have the same dimensional type and unit. In your concrete example, I speculate that you would like to store intensity in an unsigned integer type (uint16?) and lifetime in a floating point type (float64?). This means they either have to be two datasets, or you make a compromise and muddy the waters. I think compound types should be the solution here. I need to educate myself about how data comes out of Python. There are a bunch of competing approaches in the Python world. Zarr dtype lists:
https://zarr.readthedocs.io/en/stable/spec/v2.html#data-type-encoding
Numpy Array interface:
https://numpy.org/doc/stable/reference/arrays.interface.html#arrays-interface
Numpy dtype:
https://numpy.org/doc/stable/reference/arrays.dtypes.html
which all seem to describe the same thing, but each with a different syntax. None is clear about how variable-length data is expressed. The pointer part is moderately obvious (|O or something), but I cannot see where the data is.
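For reference, a small NumPy sketch of what a compound (structured) dtype for this case could look like. The field names and the 2x3 shape are assumptions made up for illustration; only the uint16/float64 choice comes from the speculation above. The list-of-pairs form printed by `descr` is also how the Zarr v2 spec encodes structured dtypes.

```python
import numpy as np

# Hypothetical field names; the types follow the uint16/float64 guess above.
flim_pixel = np.dtype([("intensity", "<u2"), ("lifetime", "<f8")])

# A tiny 2x3 "image" in which every element carries both values.
image = np.zeros((2, 3), dtype=flim_pixel)
image["intensity"] = 1000
image["lifetime"][0, 0] = 2.5

# List-of-(name, type) pairs describing the compound type.
print(flim_pixel.descr)      # [('intensity', '<u2'), ('lifetime', '<f8')]

# Fields are packed by default: 2 + 8 bytes per element.
print(flim_pixel.itemsize)   # 10

# The object dtype: the array buffer only holds references to Python objects,
# so the variable-length data itself lives outside the array.
print(np.dtype(object).str)  # |O
```

With such a dtype, both values of a pixel sit next to each other in memory, so they would end up in the same chunk without pretending that c is a dimension.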
Technically, they must be of the same type, and depending on how the spec is phrased, we may want to enforce that they also have the same dimensional type and unit.
My gut feeling would be to enforce the same type and unit, at least for the next version of ome.zarr, and tackle compound data types in later releases.
I agree. So for now this would be two datasets.
@axtimwalde @constantinpape @joshmoore
Based on the latest posts of @glyg in https://github.com/ome/ngff/issues/28 I was wondering about the following.
Let's say we have, e.g. a FLIM data set.
I think it could be useful to store it as a 5D data set with these dimensions:
Where, in this case, my feeling is that the c dimension is qualitatively different from the other dimensions, because "moving along the c-axis" will change the unit of the output value (which is not the case for any of the other dimensions). Do you have any thoughts on this? I mean, should we treat dimensions that change the unit of the output value differently from other dimensions?