tischi / i2k-2020-s3-zarr-workshop

0 stars 1 forks source link

Compliance with ome.zarr specs #7

Open tischi opened 3 years ago

tischi commented 3 years ago

@joshmoore @constantinpape

I don't understand how we comply with the ome.zarr specs: https://github.com/ome/omero-ms-zarr/blob/master/spec.md

We do it like this, isn't it?

id.zarr/setup0/timepoint0/s0/...
id.zarr/setup0/timepoint0/s1/...

Where s* are the resolution scales, are they?

If so, then I don't get how that complies, because to me it reads as if the resolution scales must come directly beneath id.zarr, isn't it?

└── 456.zarr                  # Another image (id=456) converted to Zarr.
    │
    ├── .zgroup               # Each image is a Zarr group, or a folder, of other groups and arrays.
    ├── .zattrs               # Group level attributes are stored in the .zattrs file and include
    │                         #  "multiscales" and "omero" below)
    │
    ├── 0                     # Each multiscale level is stored as a separate Zarr array,
constantinpape commented 3 years ago

As far as I understood the multiscale spec, the group with the metadata and the zarr arrays for the storing the actual volumes can be further down the hierarchy; it can be under the root but it's not mandatory. Can you clarify, @joshmoore?

tischi commented 3 years ago

can be further down the hierarchy

But if that is the case, how would the reader know where it is?

constantinpape commented 3 years ago

But if that is the case, how would the reader know where it is?

That's a good point. I think the straightforward solution would be to have a list with the available datasets and their internal paths. But I am not sure if this is part of the spec yet.

joshmoore commented 3 years ago

Hi guys. Sorry, just getting caught up. Yes, at the moment, you would need to know the path to the group which specifies the "multiscales" metadata. That's the image. In our conversion, it likely would have been cleaner to drop the "setup0" and "timepoint0" groups so that the top-level group was the image container.

constantinpape commented 3 years ago

The reason I kept the "timepoint0/setup0" is to be more consistent with the bdv layout to make it easier to read this file with bdv. But things need to change in the loader in any case, so maybe it's better to just put the multiscale arrays beneath the root.

@tischi Should I regenerate the files with everything beneath root? What do you think?

tischi commented 3 years ago

Should I regenerate the files with everything beneath root? What do you think?

I would say: Yes, please.

tischi commented 3 years ago

Let's first practice with the myosin data set only, just leave the others as is for now. We might have more iterations on this ;-)

constantinpape commented 3 years ago

Let's first practice with the myosin data set only, just leave the others as is for now. We might have more iterations on this ;-)

Good point. ~While we are at it, should I also update the reader name? See #5~ Edit: Sorry I read this before your other comments, gonna recreate and change the reader name to ome.zarr.s3.

constantinpape commented 3 years ago

I reuploaded the data for prospr-myosin and also changed the bdv reader name. In case you rely on github for reading the xml, you need to merge #8 for the update in the xml.

tischi commented 3 years ago

@joshmoore I did a Ctrl + F in your whole S3ZarrReader repo and I could neither find the Strings "multiscales" nor "datasets", indicating that we do not yet parse the highest level .zattrs file yet. Correct?

{
    "multiscales": [
        {
            "datasets": [
                {
                    "path": "s0"
                },
                {
                    "path": "s1"
                },
                {
                    "path": "s2"
                },
                {
                    "path": "s3"
                }
            ],
joshmoore commented 3 years ago

Correct. As per our zoom call, that's one level higher at a level that understands images.