I'm really happy to see this. We initially used a similar layout to the one proposed above for storing pyramids, and it's fantastic to see this formalized.
I'm curious about the decision to store the base array in the same group as the downsampled levels. We initially did the same, but then moved towards a structure separating the two:
```
└── example/
    ├── .zgroup
    ├── base
    │   ├── .zarray
    │   ├── .zattrs
    │   ├── 0.0.0
    │   └── ...etc
    └── sub-resolutions/
        ├── .zgroup
        ├── .zattrs
        ├── 01/
        │   ├── .zarray
        │   ├── 0.0.0
        │   └── ...etc
        └── 02/
            ├── .zarray
            ├── 0.0.0
            └── ...etc
```
as a more general "image" format in Zarr. One could expect to find a "base" array and then check for the "sub-resolutions" group to determine whether it is a pyramid or not. We thought this structure would allow other types of data (e.g. segmentation) to be stored alongside the base array. Again, thanks for the work here in formalizing this!
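For illustration, detecting such a layout might look like the following sketch, assuming zarr-python v2 and the group/array names from the tree above:

```python
import zarr

root = zarr.open_group("example", mode="r")
base = root["base"]  # the full-resolution array, usable on its own

# the presence of the "sub-resolutions" group signals a pyramid
if "sub-resolutions" in root:
    sub = root["sub-resolutions"]
    levels = [base] + [sub[k] for k in sorted(sub.array_keys())]
else:
    levels = [base]
```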
Thanks, @manzt. Let's see if there are more votes for the deeper representation. It's certainly also what I was originally thinking about in #23. The downside is that one likely needs metadata on all the datasets pointing up and down the hierarchy in order to support detection of the sequence from any scale. It's the other major design layout I can think of. (If anyone has more, those would be very welcome.)
@joshmoore amazing to see this kick off. A couple of short comments:
If looking for alternate names I'd consider `multiresolution`, but `multiscale` definitely works for me. We have been using `pyramid` in napari but are thinking of changing (see https://github.com/napari/napari/issues/1019#issuecomment-595325260; we can try and go with whatever the majority likes).
One thing that has come up for me with a list of "scales" is that when you have large volumetric timeseries, where you might create a pyramid for each timepoint, some of the axes are unscaled, so you really need to look at the shapes of the arrays to do the right thing. I see that the field is optional, but I wonder how much is gained from it. (I'm not opposed, and would probably find usage for it, but wanted to put out this caveat.)
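As a concrete illustration of that point, inferring per-axis factors from the array shapes rather than trusting a stored "scales" list could look like this sketch (`levels` is a hypothetical list of arrays, largest first):

```python
import numpy as np

# levels: hypothetical list of scale-level arrays, e.g. (t, z, y, x) volumes
shapes = [np.asarray(a.shape) for a in levels]
# per-axis downsampling factors relative to the base; unscaled axes stay 1.0
factors = [shapes[0] / s for s in shapes]
```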
Multiple series per group is probably good flexibility to have: if you have two independent multiscale datasets you want to put in the same group, it lets the group abstraction remain separate from the multiscale details.
The concept of having base + subresolutions like @manzt proposes is intriguing to me too. Ultimately for visualization purposes I want something like a single list of arrays so I guess I find that representation a little simpler, but I can construct it from the latter representation if I know the data is multiscale, and maybe it is nice to keep that a little separate. I will think on it more; curious what others say.
Glad to see the discussion here. Some thoughts:
Philosophically, I'd like to suggest a few constraints (both of which are satisfied by @joshmoore's proposal, but not by a lot of other existing multiscale image schemas): First, individual images should be portable -- wherever possible, images should not have metadata/attributes that indicate their role in a multiscale representation, so that they can be copied somewhere else and viewed on their own without losing context. Second, no magic dataset names like `s0`, `s1`, etc. The use of the list of datasets in @joshmoore's group attributes solves this problem.
Personally I'm not a fan of putting the base image at a different level of the hierarchy, since most software I've seen assumes that all the different scale levels will be elements in the same collection. @manzt you suggest that you adopted this structure in order to facilitate checking for a multiscale representation, but I think this is a job for group metadata, not hierarchy.
For simplicity, I would propose a restriction of one multiscale representation per group. Groups are cheap; if you want to represent 2 multiscale images, then make 2 groups. (This doesn't work for multiple multiscale representations that use the same base image, e.g. gaussian and laplacian pyramids.) The use of the `series` group metadata in @joshmoore's proposal handles this nicely.
A multiscale image is a collection of images. Accordingly, the "multiscaleness" should be a group attribute that lists the images in the collection, which is how @joshmoore does it in the draft proposal. I would add some dataset-specific information to the group attributes: software that consumes multiscale images needs to know the spatial properties of each image, and on cloud storage it can be cumbersome to query each image individually; so for convenience this image metadata could also be in the group attributes that describe the multiscale representation. I think explicitly listing the transform attributes of each image is safer than just listing "scales", as long as the transform attributes of each image are small.
Here's example metadata that implements this concept. The specifics of the "transform attributes" don't really matter -- this could be an affine transform, or something fancier. But I think the basic idea of putting the spatial information of each dataset in the group attributes is solid.
```
// group attributes
{
  "multiscale": {
    "version": "0.1",
    "datasets": ["0": {transform attributes of 0},
                 "1": {transform attributes of 1},
                 "2": {transform attributes of 2},
                 "3": {transform attributes of 3},
                 "4": {transform attributes of 4}]
  }
  // optional stuff
}
```
```
// example transform attributes of dataset 0
"transform" : {
  "offset" : {"X" : 0, "Y" : 0, "Z" : 0},
  "scale"  : {"X" : 1, "Y" : 1, "Z" : 1},
  "units"  : {"X" : "nm", "Y" : "nm", "Z" : "nm"}
}

// example transform attributes of dataset 1
"transform" : {
  "offset" : {"X" : 0.5, "Y" : 0.5, "Z" : 0},
  "scale"  : {"X" : 2, "Y" : 2, "Z" : 1},
  "units"  : {"X" : "nm", "Y" : "nm", "Z" : "nm"}
}
```
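The dataset-1 values above follow from dataset 0 under a pixel-center convention: downsampling by a factor f multiplies the scale by f and shifts the offset by (f − 1)/2 of the base spacing. A sketch of that arithmetic (my illustration, not part of the proposal):

```python
def level_transform(base_offset, base_scale, factors):
    """Pixel-center convention: averaging f samples of spacing s puts
    the new sample center at offset + s * (f - 1) / 2."""
    offset = {ax: base_offset[ax] + base_scale[ax] * (f - 1) / 2
              for ax, f in factors.items()}
    scale = {ax: base_scale[ax] * f for ax, f in factors.items()}
    return {"offset": offset, "scale": scale}

# reproduces dataset 1: offset {X: 0.5, Y: 0.5, Z: 0}, scale {X: 2, Y: 2, Z: 1}
level_transform({"X": 0, "Y": 0, "Z": 0},
                {"X": 1, "Y": 1, "Z": 1},
                {"X": 2, "Y": 2, "Z": 1})
```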
For posterity, I've written about this issue (as it pertains to the data our group works with) here
@sofroniewn:

> The concept of having base + subresolutions like @manzt proposes is intriguing to me too. Ultimately for visualization purposes I want something like a single list of arrays so I guess I find that representation a little simpler, but I can construct it from the latter representation if I know the data is multiscale, and maybe it is nice to keep that a little separate. I will think on it more; curious what others say.
I generally have the same feelings. I'm for the simplicity of the current proposal, and I wonder if my suggestion adds an extra layer of complexity unnecessarily.
@d-v-b:

> For simplicity, I would propose a restriction of one multiscale representation per group. Groups are cheap; if you want to represent 2 multiscale images, then make 2 groups.
Wouldn't this require copying the base image into a separate group? Perhaps I'm misunderstanding.
> Wouldn't this require copying the base image into a separate group? Perhaps I'm misunderstanding.
The base image would be in the same group with the downscaled versions. So on the file system, it would look like this:
```
└── example/
    ├── .zgroup
    ├── base
    │   ├── .zarray
    │   ├── .zattrs
    │   ├── 0.0.0
    │   └── ...etc
    ├── base_downscaled
    │   ├── .zarray
    │   ├── .zattrs
    │   ├── 0.0.0
    │   └── ...etc
    └── ...etc
```
Apologies, I thought you were suggesting that separate groups should be created for different sampling of the same base image (e.g. gaussian and laplacian).
@manzt this is actually my mistake -- I was not thinking at all about the use case where the same base image is used for multiple pyramids, and I agree that copying data is not ideal. I will remove / amend the "one multiscale representation per group" part of my proposal above.
> I would add some dataset-specific information to the group attributes: software that consumes multiscale images needs to know the spatial properties of each image, and on cloud storage it can be cumbersome to query each image individually;
Adding to the practical importance here: the spatial position of the first pixel is shifted in subresolutions, and the physical spacing between pixels also changes. This must be accounted for during visualization or analysis when other datasets, e.g. other images or segmentations, come into play. If this metadata is readily and independently available for every subresolution, i.e. scale factors do not need to be fetched and computations made, each subresolution image can be used independently, effortlessly, and without computational overhead.
One option is to build on the model implied by storing images in the Xarray project data structures, which have Zarr support. This enables storing metadata such as the position of the first pixel, the spacing between pixels, and identification of the array dimensions, e.g. `x`, `y`, `t`, so that data can be used and passed through processing pipelines and visualization tools. This is helpful because it enables distributed computing via Dask and machine learning [2] via the scikit-learn API. Xarray has broad community adoption, and it is gaining more traction lately. Of course, a model that is compatible with Xarray does not require Xarray to use the data. On the other hand, Xarray `coords` have more flexibility than what is required for pixels sampled on a uniform rectilinear grid, and this adds a little complexity to the layout.
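A minimal sketch of that model, assuming xarray and zarr are installed (array name, spacing, and origin values are illustrative):

```python
import numpy as np
import xarray as xr

spacing, origin = 2.0, 0.5  # illustrative pixel spacing and first-pixel position
data = np.zeros((64, 128), dtype="uint8")
image = xr.DataArray(
    data,
    dims=("y", "x"),
    # coords encode the physical position of every pixel center
    coords={"y": origin + spacing * np.arange(64),
            "x": origin + spacing * np.arange(128)},
    name="image",
)
# writes the arrays plus _ARRAY_DIMENSIONS metadata into a Zarr group
image.to_dataset().to_zarr("level_0.zarr")
```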
Generated from this example, here is what it looks like:
```
.
├── level_1.zarr
│ ├── rec20160318_191511_232p3_2cm_cont__4097im_1500ms_ML17keV_6
│ │ ├── 0.0.0
│ │ ├── 0.0.1
....
│ │ ├── 9.9.9
│ │ ├── .zarray
│ │ └── .zattrs
│ ├── x
│ │ ├── 0
│ │ ├── .zarray
│ │ └── .zattrs
│ ├── y
│ │ ├── 0
│ │ ├── .zarray
│ │ └── .zattrs
│ ├── z
│ │ ├── 0
│ │ ├── .zarray
│ │ └── .zattrs
│ ├── .zattrs
│ ├── .zgroup
│ └── .zmetadata
├── level_2.zarr
│ ├── rec20160318_191511_232p3_2cm_cont__4097im_1500ms_ML17keV_6
│ │ ├── 0.0.0
│ │ ├── 0.0.1
│ │ ├── 8.9.9
│ │ ├── .zarray
│ │ └── .zattrs
│ ├── x
│ │ ├── 0
│ │ ├── .zarray
│ │ └── .zattrs
│ ├── y
│ │ ├── 0
│ │ ├── .zarray
│ │ └── .zattrs
│ ├── z
│ │ ├── 0
│ │ ├── .zarray
│ │ └── .zattrs
│ ├── .zattrs
│ ├── .zgroup
│ └── .zmetadata
....
├── rec20160318_191511_232p3_2cm_cont__4097im_1500ms_ML17keV_6
│ ├── 0.0.0
│ ├── 0.0.1
...
│ ├── 9.9.9
│ ├── .zarray
│ └── .zattrs
├── x
│ ├── 0
│ ├── .zarray
│ └── .zattrs
├── y
│ ├── 0
│ ├── .zarray
│ └── .zattrs
├── z
│ ├── 0
│ ├── .zarray
│ └── .zattrs
├── .zattrs
├── .zgroup
└── .zmetadata
34 directories, 62359 files
```
This is the layout generated by `xarray.Dataset.to_zarr`. It does not mean that Xarray has to be used to read and write, but it would mean that Zarr images would be extremely easy to use via Xarray. In this case, `.zmetadata` is generated for each subresolution so it can be used entirely independently. Due to how Xarray/Zarr handles `coords`, `x` and `y` are one-dimensional arrays. This results in every resolution having its own group.
The metadata looks like this:
```
{
"metadata": {
".zattrs": {
"_MULTISCALE_LEVELS": [
"",
"level_1.zarr",
"level_2.zarr",
"level_3.zarr",
"level_4.zarr",
"level_5.zarr",
"level_6.zarr"
],
"_SPATIAL_IMAGE": "rec20160318_191511_232p3_2cm_cont__4097im_1500ms_ML17keV_6"
},
".zgroup": {
"zarr_format": 2
},
"level_1.zarr/.zattrs": {},
"level_1.zarr/.zgroup": {
"zarr_format": 2
},
"level_1.zarr/rec20160318_191511_232p3_2cm_cont__4097im_1500ms_ML17keV_6/.zarray": {
"chunks": [
64,
64,
64
],
"compressor": {
"blocksize": 0,
"clevel": 5,
"cname": "zstd",
"id": "blosc",
"shuffle": 0
},
"dtype": "|u1",
"fill_value": null,
"filters": null,
"order": "C",
"shape": [
1080,
1280,
1280
],
"zarr_format": 2
},
"level_1.zarr/rec20160318_191511_232p3_2cm_cont__4097im_1500ms_ML17keV_6/.zattrs": {
"_ARRAY_DIMENSIONS": [
"z",
"y",
"x"
],
"direction": [
[
1.0,
0.0,
0.0
],
[
0.0,
1.0,
0.0
],
[
0.0,
0.0,
1.0
]
],
"units": "\u03bcm"
},
"level_1.zarr/x/.zarray": {
"chunks": [
1280
],
"compressor": {
"blocksize": 0,
"clevel": 5,
"cname": "lz4",
"id": "blosc",
"shuffle": 1
},
"dtype": "
Here `_MULTISCALE_LEVELS` prevents the need to hardcode the identifiers, as suggested by @d-v-b and @manzt, but it could be renamed to `multiscale`, etc. `_ARRAY_DIMENSIONS` is the key that Xarray uses in Zarr files to identify the `dims`.
This example is generated with itk, but it could just as easily be generated with scikit-image, dask-image via [1] (work in progress), or pyimagej.
Thanks for the link to that example, @thewtex! Conforming with `xarray.Dataset.to_zarr` where possible seems reasonable to me too.
@constantinpape, @bogovicj, @axtimwalde might also be interested in weighing in.
👍 to flat vs hierarchical representation. Also 👍 to "multiscale".
I also like the constraint that the sub-datasets should be openable as zarr arrays by themselves. I think @thewtex's example satisfies this. Having said this, @thewtex, the xarray model looks too complex to me compared to @joshmoore's proposed spec. It would be great if it could be stripped down to its bare essentials. I agree that it's nice to have the pixel start coordinate handy, but it can also be computed after the fact, so it should be optional I think.
Last thing, which may be out of scope, but might not be: for visualisation, it is sometimes convenient to have the same array with different chunk sizes, e.g. orthogonal planes to all axes for a 3D image. I wonder if the same data/metadata layout standard can be used in these situations.
Oh and @joshmoore

> anyone else who's GitHub account I've forgotten for the preliminary discussions

*whose. Regret pinging me yet? =P
Great to see so much discussion on this proposal. I didn't have time to read through all of it yet; I will try to catch up over the weekend. FYI, there is a pyramid storage format for N5 used by BigDataViewer and Paintera already, and I have used this format for large volume representations as well: https://github.com/bigdataviewer/bigdataviewer-core/blob/master/BDV%20N5%20format.md
Great to see this moving on!
In our projects xcube and xcube-viewer image pyramids look like so:
```
example.levels/
├── 0.zarr   # Full-sized array
├── 1.zarr   # Level-0 X&Y dimensions divided by 2^1
├── 2.zarr   # Level-0 X&Y dimensions divided by 2^2
├── 3.zarr   # Level-0 X&Y dimensions divided by 2^3
└── 4.zarr   # Etc.
```
As @joshmoore mentioned, this also goes without special metadata, because pyramids are recognized by the `.levels` extension and the individual levels by their names within the `.levels` folder. Level zero can also be named `0.lnk`; in this case it contains the path to the original data rather than a copy of the "pyramidized" original dataset. (See also the xcube `level` CLI tool that implements this.)
We are looking forward to adapting our code to any commonly agreed-on Zarr "standard".
All-
Here's a quick summary from my side of discussions up to this point. Please send corrections/additions as you see fit. ~Josh
The name "multiscale" seems to be generally acceptable (https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-595332383, https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-595505162)
Support for multiple series per group seems to be generally acceptable (e.g. https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-595332383).
There are a few explicit votes for no special dataset names (e.g. https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-595359246), but under "New ideas" there was one mention of group naming schemes.
One primary decision point seems to be whether to use a deep or a flat layout:
Here I'd add that if flat is generally accepted as being the simplest approach for getting started, later revisions can always move to something more sophisticated. However, I'm pretty sure at that point we would want metadata not just at a single group level but either on multiple groups or all related datasets (or both).
Another key issue seems to be the scaling information. A range of ways have been shown:

- The `"scales": [0.5, 0.5, 1, 1, 1]` representation in the current revision (2) of this issue.
- A `transform` submap with the keys "offset", "scale", and "units" (https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-595359246).
- The `xarray` representation with "direction" and "units" attributes (https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-595432323).
- `gridSpacing` and `origin` metadata of the form `{"gridSpacing": [r_x, r_y, r_z], "origin": [o_x, o_y, o_z]}`.
- `downsamplingFactors` at two possible locations (see "Either/or" below).

@sofroniewn even asked if they are even useful as they stand (https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-595332383).
To be honest, I punted on this issue knowing that it would be harder to find consensus on it. To my mind, this could even be a second though related extension proposal. My reasoning for that is that it can also be used to represent the relationship between non-multiscale arrays, along the lines of @jni's "multiple chunk sizes" question below, and in the case of BDV, the relationship between the individual timepoints, etc.
My first question then would be: to what extent can the current multiscale proposal be of value without the spatial/scale/transform information?
@d-v-b's New proposed COSEM style from https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-595359246 uses this format:
{"multiscale": [{"name": "base", ...}, {"name" : "L1", ...}]}
Though this would prevent directly consuming the list (e.g. datasets = multiscale["series"][0]["datasets"]
), it might provide a nice balance of extensibility, especially depending on the results of the coordinates/scales/transforms discussion.
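That is, consumers would recover a plain list with one extra mapping step; a one-line sketch, with `attrs` and `multiscale` standing in for the parsed group attributes:

```python
# current draft: a plain list that can be consumed directly
datasets = multiscale["series"][0]["datasets"]
# proposed COSEM style: map over the objects instead
names = [d["name"] for d in attrs["multiscale"]]
```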
@forman showed an example from xcube in https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-596449313 in which group names were used rather than metadata to detect levels:
example.levels/
@forman also showed in https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-596449313 one solution for linking: "Level zero can also be named `0.lnk`; in this case it contains the path to the original data rather than a copy of the 'pyramidized' original dataset." This would likely need to be a prerequisite proposal for this one if we were to follow that route. cc: @alimanfoo
In @d-v-b's COSEM writeup from https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-595359246, there is an example of either/or logic, where code would need to check in more than one location for a given piece of metadata:

```
├── (required) s1 (optional, unless "scales" is not a group level attribute): {"downsamplingFactors": [a, b, c]})
```
@jni pondered in https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-595505162: "for visualisation, it is sometimes convenient to have the same array with different chunk sizes, e.g. orthogonal planes to all axes for a 3D image. I wonder if the same data/metadata layout standard can be used in these situations."
For the record, I'd currently err on the side of:

- sticking with a flat "multiscale" object
- without links or either/or logic
- and without any special names,
- while likely moving to the more flexible `[{"name": "base"}]` format
- and saving coordinates for a follow-on proposal.
(whew) But opinions, as always, are very welcome.
Further CCs: @saalfeldlab @axtimwalde @tpietzsch
> My first question then would be: to what extent can the current multiscale proposal be of value without the spatial/scale/transform information?
I think there's value in the current effort, insofar as standardizing spatial metadata is a separable issue.
For a multiscale image spec, I would propose abstracting over the specific implementation of spatial metadata, e.g. by stipulating that the group `multiscale` attribute must contain the same spatial metadata as the collection of array attributes. This assumes as little as possible about the details of the spatial metadata (a key assumption I'm making, though, is that duplicating this metadata is not prohibitive).
> For the record, I'd currently err on the side of:
>
> - sticking with a flat "multiscale" object
> - without links or either/or logic
> - and without any special names,
> - while likely moving to the more flexible `[{"name": "base"}]` format
> - and saving coordinates for a follow-on proposal.
These all look good to me!
@joshmoore outstanding summary! Thanks for leading this endeavor.
> My first question then would be: to what extent can the current multiscale proposal be of value without the spatial/scale/transform information?
To correctly analyze or visualize the data as a multiscale image pyramid, some spatial/scale/transform information is required.
Spacing / scale and offset / origin and/or transforms are required. Without them, these use cases are either complex and error-prone (requiring provenance and computation related to source pixel grids) or not possible. This is why the majority of scientific imaging file formats have at least spacing / scale and offset / origin in some form.
That said, the specs could still be split into two to keep things moving along.
Thanks so much to everyone who is putting detailed thought into this complex issue. Since the discussion has mostly focused on the bioimaging side of things, I'll try to add the xarray & geospatial perspective.
Great discussion. These are my $0.02. Largely, I agree with @joshmoore's summary in https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-596687408. Being able to open each scale level as an individual dataset, and not as part of a pyramid, is probably the most important feature and should be part of any standard that comes out of this. With this in mind, the spatial metadata (`gridSpacing` and `origin`) would need to be stored in the attributes of the individual datasets. This means either ...
This also does not consider other spatial metadata like rotations. As far as I know, this is a relevant use case for @tpietzsch. If such (arbitrary) transforms should not be considered in the standard, then the question arises of how to combine this with the `gridSpacing` and `origin`. In such a scenario, I would probably set the `origin` to zero with appropriate shifts in downscaled levels as needed, and have the actual offset after the rotation in a global transform. But then again, each scale dataset could not be loaded individually with the correct scaling, rotation, and offset without explicit knowledge of the pyramid.
Other than that, here are a few comments:
- `gridSpacing` and `origin` for each scale level. I do not have a strong opinion about nomenclature; in Paintera, it is `resolution` and `offset`, but I am ok with anything reasonable.
- If `scales` are defined, they should be fully specified for all of the spatial dimensions, i.e. for 3D or 3D+channel, it would be `[[sx, sy, sz], ...]`. I like having the `scales` attribute, but the scales can be inferred from `gridSpacing`, so it is redundant information.
- Storing per-dataset metadata like `[{"name": "s0", "meta1": ...}, {"name": "s1", "meta1": ...}]` is preferable to storing multiple arrays like `{"datasets": ["s0", "s1", ...], "meta1": [...]}`.
I think that a common standard would be a great thing to have and help interaction between the wealth of tools that we are looking at. Paintera does not have a great standard and should update its format if a reasonable standard comes out of this (while maintaining backwards compatibility).
Disclaimer: I will start a position outside academia soon and will not be involved in developing tools in this realm after that. My comment should be regarded as food for thought and to raise concerns that may not have been considered yet. Ultimately, I will not be involved in the decision making of any specifics of this standard.
cc @igorpisarev
Apologies, all, for letting this slip into April. Hopefully everyone's managing these times well enough despite the burden of long spec threads.
I've updated the description to include the new `{"name": ...}` syntax and added a new deadline of April 15th for further responses.
A few points on the more recent comments:
In https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-599782137, @hanslovsky suggested "path" rather than "name". I'm on board and will make the change if there are no vetoes, but in the documentation for the metadata (when it appears) we will need to specify whether or not super- and sub-paths are allowed (i.e. ".." and "/").
Then the general topic of the spatial metadata. Both https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-596719361 and https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-596701736 give a :thumbsup: to splitting it out into a separate proposal. It sounds like https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-599782137 is proposing to have `origin` and `gridSpacing` (and not `scale`) on the datasets themselves rather than the group. If there were agreement on that, I'd omit `scale` from this proposal and hold off for the next. @d-v-b may be the main opponent of that, since in https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-599782137 there's a clear call for duplicating the metadata when/if possible. My major concern with duplication would be keeping the two representations consistent.
As an aside on the geospatial front, https://gis.stackexchange.com/a/255847 helped me understand the GeoTIFF overviews. I don't see anything contradictory.
@rabernat brings up in https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-598456145 non-Euclidean geometries which were also discussed in a recent Zarr call. I'm all for saving that for the follow-up discussion, since it's likely going to be a big one. I'd tend to err on the side of having that external, though perhaps storing (non-standardized?) provenance metadata if possible.
Otherwise, it sounds like the newer comments are generally onboard with the current proposal, but let me know if I've dropped anyone's concerns.
I like `path` much more than `name`. +1 to that.
> My major concern with duplication would be keeping the two representations consistent.
This is a valid concern. Personally I don't like duplicating spatial metadata in the group -- my original conception a long time ago was for the group multiscale metadata to simply list the names/paths to the datasets that comprise the pyramid, with no additional information. But I was reminded by @axtimwalde that accessing metadata from multiple files on cloud stores can be bothersome, and this led to the idea of consolidating the array metadata at the group level. Maybe this can be addressed via the consolidated metadata functionality that has already been added to zarr: https://zarr.readthedocs.io/en/latest/tutorial.html#consolidating-metadata.
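For reference, a sketch of that existing zarr functionality (zarr-python v2 API; the store path is illustrative):

```python
import zarr

store = zarr.DirectoryStore("example.zarr")
zarr.consolidate_metadata(store)      # gathers .zgroup/.zarray/.zattrs into one .zmetadata key
root = zarr.open_consolidated(store)  # a single read then serves all metadata lookups
```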
For a spec, a way to resolve this could be to specify that, for each dataset entry in the group multiscale metadata, a `path` field is required but additional fields per dataset are optional. In this regime, programs that attempt to parse the multiscale group may look for consolidated metadata in the group attributes, but they should have a fallback routine that involves parsing the individual attributes of the datasets.
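A sketch of such a fallback routine (the "multiscale" and "transform" key names are assumptions for illustration; `group` is an open zarr group):

```python
def dataset_transforms(group):
    """Prefer metadata consolidated in the group attributes, falling
    back to each dataset's own attributes when it is absent."""
    out = {}
    for entry in group.attrs["multiscale"][0]["datasets"]:
        path = entry["path"]                    # required field
        transform = entry.get("transform")      # optional consolidated copy
        if transform is None:
            transform = group[path].attrs.get("transform")  # fallback read
        out[path] = transform
    return out
```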
What would we do if cloud storage didn't have high latency? I am similarly worried about the consolidated-metadata hack, because we may store a lot of metadata, and parsing very long JSON texts isn't particularly fast either; it also doesn't scale very well.
NB: Updated description to use "path".
https://github.com/zarr-developers/zarr-specs/issues/50#issuecomment-607947712
I had never considered a level of consolidation between none and everything, e.g. all arrays (but not groups) within a group are cached within the group metadata. It's an interesting idea, but discussing it here seems dangerous.
If we assume that consolidation is out-of-scope for this issue, I think the only question remaining is if we want optional spatial metadata at the group level, where the array metadata would take precedence. Here, I'd likely also vote for being conservative and not doing that at this point, though we could add it in the future (more easily than we could remove it).
If all agree, I'll add hopefully one last update to remove all mention of "scale" and then start collecting all the spatial ideas that we've tabled in this issue into a new one.
This issue has been migrated to an image.sc topic after the 2020-05-06 community discussion and will be closed. Authors are still encouraged to make use of the specification in their own libraries. As the v3 extension mechanism matures, the specification will be updated and registered as appropriate. Many thanks to everyone who has participated to date. Further feedback and change requests are welcome either on this repository or on image.sc.
As a first draft of support for the multiscale use-case (https://github.com/zarr-developers/zarr-specs/issues/23), this issue proposes an intermediate nomenclature for describing groups of Zarr arrays which are scaled down versions of one another, e.g.:
This layout was independently developed in a number of implementations and has since been implemented in others, including:
Using a common metadata representation across implementations:
A basic example of the metadata that is added to the containing Zarr group is seen here:
Process
An RFC process for Zarr does not yet exist. Additionally, the v3 spec is a work in progress. However, since the implementations listed above as well as others are already being developed, I'd propose that if a consensus can be reached here, this issue should be turned into an .rst file similar to those in the v3 branches (e.g. filters) and used as a temporary spec for defining these arrays, with the understanding that this is a prototype intended to be amended and brought into the general extension mechanism as it develops.
I'd welcome any suggestions/feedback, but especially around:
Deadline for a first round of comments: March 15, 2020
Deadline for a second round of comments: April 15, 2020

Detailed example
Color key (according to https://www.ietf.org/rfc/rfc2119.txt):
Color-coded example:
Explanation
Type enumeration:

- `gaussian`, e.g. skimage.transform.pyramid_gaussian
- `laplacian`, e.g. skimage.transform.pyramid_laplacian
- `reduce`, e.g. skimage.transform.pyramid_reduce
- `pick`, e.g. SimpleImageScaler's "top-left" strategy

Sample code
....

which results in a `.zattrs` file of the form:

....

and the following on-disk layout:

....
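A minimal sketch of sample code along these lines, assuming zarr-python v2 and scikit-image; the attribute layout follows the draft above, but names and parameters are illustrative:

```python
import numpy as np
import zarr
from skimage.transform import pyramid_gaussian

image = np.random.randint(0, 256, (1024, 1024), dtype=np.uint8)
root = zarr.open_group("example", mode="w")

datasets = []
for i, level in enumerate(pyramid_gaussian(image, max_layer=4, downscale=2)):
    # pyramid_gaussian yields the base plus downscaled levels as float64 in [0, 1]
    root.create_dataset(str(i), data=level, chunks=(256, 256))
    datasets.append({"path": str(i)})

root.attrs["multiscales"] = [{
    "version": "0.1",
    "name": "example",
    "datasets": datasets,   # ordered largest to smallest
    "type": "gaussian",
}]
```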
Thanks to @ryan-williams, @jakirkham, @freeman-lab, @petebankhead, @jni, @sofroniewn, @chris-allan, and anyone else whose GitHub account I've forgotten for the preliminary discussions.