ome / ome2024-ngff-challenge

Project planning and material repository for the 2024 challenge to generate 1 PB of OME-Zarr data
https://pypi.org/project/ome2024-ngff-challenge/
BSD 3-Clause "New" or "Revised" License

Where to place RO Crate Metadata #4

Closed sherwoodf closed 1 week ago

sherwoodf commented 2 months ago

Where to place the ro-crate-metadata.json file

The ro-crate-metadata.json file describes files relative to its own location, i.e. within the directory it is placed in. It can describe individual files and directories. It is not clear to me whether ro-crate-tools would expect, or have a standard way of handling, multiple ro-crate-metadata.json files within nested directory structures (I've not seen an example of this in the spec).
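To make the "relative to its location" point concrete, here is a minimal sketch that writes a ro-crate-metadata.json next to a set of zarr directories, assuming the RO-Crate 1.1 layout (a metadata descriptor, a root `./` dataset, and one entity per described directory). The function names `build_crate_metadata` and `write_crate_metadata` are hypothetical and are not part of any existing tool; this only illustrates how the `@id` values are paths relative to the file's own directory.

```python
import json
from pathlib import Path


def build_crate_metadata(root: Path) -> dict:
    """Describe every *.zarr directory directly under `root`.

    All "@id" values are paths relative to `root`, which is where the
    ro-crate-metadata.json file itself will live.
    """
    zarr_ids = sorted(
        p.name + "/" for p in root.iterdir() if p.is_dir() and p.suffix == ".zarr"
    )
    graph = [
        {
            # The metadata descriptor: points at the root dataset it is "about".
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # The root dataset: the directory containing the metadata file.
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [{"@id": zarr_id} for zarr_id in zarr_ids],
        },
    ]
    # One Dataset entity per zarr directory (illustrative minimum only).
    graph.extend({"@id": zarr_id, "@type": "Dataset"} for zarr_id in zarr_ids)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


def write_crate_metadata(root: Path) -> Path:
    """Write ro-crate-metadata.json into `root` and return its path."""
    target = root / "ro-crate-metadata.json"
    target.write_text(json.dumps(build_crate_metadata(root), indent=2))
    return target
```

Calling `write_crate_metadata(Path("root-directory"))` would describe each `*.zarr` directory found there. Nothing in this sketch addresses the nested case, which is exactly the open question.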

From discussions on 2024/07/02 there were a few different options for where to place this file relative to the zarr. I'm not sure which design 'scoring' factors are most relevant to consider, e.g. the size/effort of creating ro-crate-metadata.json, or promoting adoption (e.g. by creating a standard that can be used beyond zarr files, by image formats that we would want to convert to zarr).

1. In the directory outside the zarr

root-directory/
| ro-crate-metadata.json
| my-image1.zarr/
|  | .zgroup
|  | .zattrs
|  | 0/
|  | 1/
...
| my-image2.zarr/
|  | .zgroup
|  | .zattrs
|  | 0/
|  | 1/
...

consequences:

2. Inside the zarr, at the multiscale image level

my-image1.zarr/
| ro-crate-metadata.json
| .zgroup
| .zattrs
| 0/
| 1/
…

consequences:

3. Within a zarr group wrapping other zarrs

zarr-group.zarr/
| ro-crate-metadata.json
| 0/
|  | 0/
|  | 1/
...
| 1/
|  | 0/
|  | 0/
…

consequences:

joshmoore commented 1 month ago

> It is not clear to me whether ro-crate-tools would expect / have a standard way of handling multiple ro-crate-metadata.json files within nested directory structures (I've not seen an example of this in the spec)

I've had similar questions myself. Shall we raise this with the RO-Crate team? Happy to kick off or join a Slack, issue, or drop-in session conversation.

> I'm not sure what are the most relevant design 'scoring' factors to consider

One I think that folks were concerned with is: "can there be exactly one way to find the RO-Crate or will there be different ways for Zarr vs TIFF vs OMERO, etc.?"
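As a rough illustration of what "exactly one way to find the RO-Crate" could look like for options 1 to 3 on a local filesystem, here is a sketch that checks the starting directory and then each parent directory for the file. The function `find_crate_metadata` is hypothetical, not an existing API, and it does not cover option 4 or remote object stores.

```python
from pathlib import Path
from typing import Optional

CRATE_FILENAME = "ro-crate-metadata.json"


def find_crate_metadata(start: Path) -> Optional[Path]:
    """Return the nearest ro-crate-metadata.json at or above `start`.

    Checks `start` itself (if it is a directory), then every ancestor
    directory, nearest first. Returns None if no file is found.
    """
    start = start.resolve()
    candidates = [start] if start.is_dir() else []
    candidates.extend(start.parents)
    for directory in candidates:
        candidate = directory / CRATE_FILENAME
        if candidate.is_file():
            return candidate
    return None
```

With this lookup rule, a file placed beside the zarr (option 1) and a file placed inside it (options 2 and 3) are found by the same call; when both exist, the one nearest to the starting path wins.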


4. Embed within the Zarr metadata

my-image1.zarr/
| .zgroup
| .zattrs  # contains ro-crate-metadata.json content
| 0/
| 1/
…

Each .zattrs file (or in Zarr v3, each zarr.json file) can potentially hold a block of RO-Crate metadata. Similar to the "ome" block proposed in RFC-2, the metadata could be wrapped in a "ro-crate" block which could be passed to special handlers for processing.
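A minimal sketch of what option 4 might look like for Zarr v2, using only the standard library: the RO-Crate JSON is stored under a "ro-crate" key in `.zattrs`, leaving other keys untouched. The helper names `embed_crate` and `extract_crate` are hypothetical, and the key name is only the one suggested above, not something any spec defines. Zarr v3 (`zarr.json`) is not handled here.

```python
import json
from pathlib import Path
from typing import Optional


def embed_crate(zarr_dir: Path, crate: dict, key: str = "ro-crate") -> dict:
    """Store `crate` under `key` in the group's .zattrs, preserving other keys."""
    attrs_path = zarr_dir / ".zattrs"
    attrs = json.loads(attrs_path.read_text()) if attrs_path.exists() else {}
    attrs[key] = crate
    attrs_path.write_text(json.dumps(attrs, indent=2))
    return attrs


def extract_crate(zarr_dir: Path, key: str = "ro-crate") -> Optional[dict]:
    """Return the embedded RO-Crate block, or None if it is absent."""
    attrs_path = zarr_dir / ".zattrs"
    if not attrs_path.exists():
        return None
    return json.loads(attrs_path.read_text()).get(key)
```

A special handler could then call `extract_crate` and pass the resulting dictionary to whatever RO-Crate tooling it uses, without needing a sidecar file.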

consequences:


Discussion of 1-4

1

> Can describe more than one zarr while reusing parts of the metadata descriptions

But this is always true of RO-Crate. As mentioned in https://github.com/ome/ome2024-ngff-challenge/pull/2#discussion_r1673994891, someone can always wrap what we've done. I'm not sure I would try to second-guess that.

> Not intrinsically tied to the zarr format, so can be used to describe other image files using the same metadata structure

:+1: unless we decide, e.g., to embed the RO-Crate within the OME-TIFF in order to avoid the sidecar.

> Could also contain non-image files (e.g. parquet files, meshes) “cleanly” (i.e. without putting things in the OME-Zarr container that the spec doesn’t explicitly describe)

Definitely true. However, much as above, the community is already putting non-Zarr files into the Zarr.

> Would support holding multiple images together with, e.g. transform metadata

I think this is true regardless of where it lives.

2 (and 3 somewhat)

> The zarr file effectively becomes an ro-crate, so the metadata is more difficult to ‘lose’

There's that practical aspect, but additionally there is the somewhat more political "IS-A" relationship, i.e. this one container adheres to two (if not more) specifications to increase interoperability.

> Each RO-Crate contains a single multiscale image, possibly with a single set of labels

I think this is where I would tend towards option 3. The first distinction is about putting the (a) RO-Crate at the root of the Zarr, in which case it could already contain multiple multiscale images, which could all be referenced.

3

> Can easily coexist with option 2

:+1:

joshmoore commented 1 month ago

Proposal from Session 1 of today's Challenge call:

joshmoore commented 1 week ago

Closing this for now. At the moment, we only have support in the tool for creating the top-level RO-Crate file, but if anyone needs to create one lower down in their hierarchy, feel free!