ome / ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
https://ngff.openmicroscopy.org

OME-XML equivalent data #27

Open LeeKamentsky opened 3 years ago

LeeKamentsky commented 3 years ago

Hi all, first, thanks for starting this project - we are considering NGFF / Zarr 3.0 for large 3D multichannel datasets of light-sheet microscopy brain data.

I'm wondering if there is any plan to capture OME-XML data in the Zarr attribute hierarchy, in particular the microscopy data such as Instrument. I'd be happy to participate in the discussion and formulation of this extension.

--Lee

joshmoore commented 3 years ago

Hey Lee!

Good to hear from you. Short answer is capturing all the content of the OME-XML model in the Zarr (though likely in JSON rather than XML) is definitely on the roadmap. And it'd be wonderful to have your input & help. Don't know if you've seen it yet under the #ome-ngff tag but there'll be timezone-paired calls next Tuesday if you're interested in getting caught up:

https://forum.image.sc/t/next-call-on-next-gen-bioimaging-data-tools-feb-23/48386/4

All the best, ~Josh

joshmoore commented 3 years ago

This issue has come up again recently: the aicsimageio team (cc: @jacksonmaxfield) is interested in consuming the METADATA.ome.xml file from bioformats2raw sooner rather than later, so it would need to be part of an upcoming ome-ngff spec. @manzt pointed out that such files within a Zarr fileset are currently outside of the data model: the only files Zarr knows about are .zgroup, .zarray, .zattrs, and chunks.

The discussion talked through a number of options:

1.) Dump METADATA.xml file in root of zarr store. Pros: simple / currently implemented, Cons: Not a part of zarr data-model

.
└── data.zarr/
    ├── .zattrs
    ├── .zgroup
    ├── ...
    └── METADATA.xml

2.) Add METADATA.xml to root .zattrs. Pros: simple / fits zarr's data-model, Cons: Increases the size of .zattrs, might not be desirable if not a common access pattern.

.
└── data.zarr/
    ├── .zattrs # <- METADATA.xml appended here
    ├── .zgroup
    └── ...

3.) Add custom OME group to zarr root with XML in attrs. Pros: fits zarr's data model, doesn't bloat root attrs. Cons: slightly more complicated, writes two files (OME/.zattrs, OME/.zgroup)

.
└── data.zarr/
    ├── OME/
    |  ├── .zgroup
    |  └── .zattrs # <- METADATA.xml appended here
    ├── .zattrs  # Don't touch current .zattrs
    ├── .zgroup
    └── ...

4a.) Add array to zarr root with XML. Pros: scales to handle large files, attributes can be added to the array. Cons: encoding issues, writes two files (OME/.zarray, OME/.zattrs)

├── .zattrs
├── .zgroup
└── OME
    ├── .zarray
    └── .zattrs <-- contains file-level metadata

4b.) Add array to a root zarr group. Pros: same plus other arrays can be in the same location, Cons: even one more file

├── .zattrs
├── .zgroup
└── OME
    ├── .zgroup <-- contains pointers to files
    └── XML
        ├── .zarray
        └── .zattrs <-- contains file-level metadata

5.) (outside this repository) Add to the zarr specification a concept of Files (other than the .z* files and chunks) defined to be "a 1-dimensional array without chunking" (see https://github.com/zarr-developers/zarr-specs/issues/112)

(Thanks to @jacksonmaxfield and @manzt for driving the definition of the above.)

Update: to be clear, likely whatever mechanism is chosen here will be used for other File objects: opaque analysis results, FileAnnotations from OMERO, etc.
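
As a rough illustration of how Options 2, 3 and 4a differ on disk, here is a minimal sketch that writes each layout by hand as Zarr v2 files using only the Python standard library. The attribute names `ome_xml` and `encoding`, the placeholder XML string, and the `option_*.zarr` directory names are assumptions made for this example; none of them comes from a spec or from bioformats2raw.

```python
import json
import tempfile
from pathlib import Path

# Placeholder XML; a real METADATA.ome.xml would be much larger.
OME_XML = '<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06"/>'


def write_json(path, obj):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(obj, indent=2))


def option_2(root):
    # XML stored as a string attribute of the root group.
    write_json(root / ".zgroup", {"zarr_format": 2})
    write_json(root / ".zattrs", {"ome_xml": OME_XML})  # hypothetical key name


def option_3(root):
    # Root attributes untouched; XML lives in the attrs of an OME subgroup.
    write_json(root / ".zgroup", {"zarr_format": 2})
    write_json(root / ".zattrs", {})
    write_json(root / "OME" / ".zgroup", {"zarr_format": 2})
    write_json(root / "OME" / ".zattrs", {"ome_xml": OME_XML})


def option_4a(root):
    # XML stored as a 1-D uint8 array with one uncompressed chunk.
    data = OME_XML.encode("utf-8")
    write_json(root / ".zgroup", {"zarr_format": 2})
    write_json(root / ".zattrs", {})
    write_json(root / "OME" / ".zarray", {
        "zarr_format": 2,
        "shape": [len(data)],
        "chunks": [len(data)],
        "dtype": "|u1",
        "compressor": None,
        "fill_value": 0,
        "order": "C",
        "filters": None,
    })
    write_json(root / "OME" / ".zattrs", {"encoding": "utf-8"})  # hypothetical
    (root / "OME" / "0").write_bytes(data)  # chunk key "0" holds the raw bytes


def files_written(root):
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob("*") if p.is_file())


layouts = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, writer in [("2", option_2), ("3", option_3), ("4a", option_4a)]:
        root = Path(tmp) / f"option_{name}.zarr"
        writer(root)
        layouts[name] = files_written(root)
        print(name, layouts[name])
```

In this sketch the array variant also writes a chunk file in addition to its two metadata files, and a reader has to know the text encoding to turn the bytes back into XML, which is the "encoding issues" point listed under 4a.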

evamaxfield commented 3 years ago

Wanted to chime in with my opinion copied over from zulip:

Ranked choice of proposed options (most preferred to least preferred):

  1. Option 2
  2. Option 4a
  3. Option 3
  4. Option 4b
  5. Option 1

Haven't placed Option 5 as I assume we will likely discuss it in the next zarr devs meeting.

manzt commented 3 years ago

I have the same rankings as Jackson.

With regard to Option 5, I kind of think the store itself captures the idea of a File object in Zarr; it's just that a Zarr client is limited as to what keys it will read and write. In that sense, it should be "ok" to add any arbitrary File objects to the store as long as the names (keys) don't conflict with something Zarr will read or write. The question is whether Zarr should recognize non-array/group keys or have a formal way of allowing arbitrary non-chunk/metadata objects.

I'll try to jump on the next zarr dev call :)
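
To make the "store as key/value mapping" point concrete, here is a small sketch that treats a store as a plain Python dict and separates the keys a Zarr v2 client would read or write from everything else. The helper names `is_zarr_key` and `extra_keys` and the example keys are made up for illustration. The check is a name-based heuristic that assumes the default "." chunk separator, so it cannot tell a real chunk from an unrelated file that happens to be named like one.

```python
import re

# Metadata key names defined by the Zarr v2 storage spec.
METADATA_NAMES = {".zgroup", ".zarray", ".zattrs"}
# Chunk keys with the default "." separator, e.g. "0" or "0.1.2".
CHUNK_KEY = re.compile(r"\d+(\.\d+)*")


def is_zarr_key(key):
    """Return True if the last path component looks like a Zarr v2 key."""
    name = key.rsplit("/", 1)[-1]
    return name in METADATA_NAMES or CHUNK_KEY.fullmatch(name) is not None


def extra_keys(store):
    """Keys in the store that a Zarr v2 client would neither read nor write."""
    return sorted(k for k in store if not is_zarr_key(k))


# A toy store: a root group, one array named "0", and one extra file.
store = {
    ".zgroup": b'{"zarr_format": 2}',
    ".zattrs": b"{}",
    "0/.zarray": b"{}",
    "0/0.0.0": b"\x00",
    "METADATA.ome.xml": b"<OME/>",
}
print(extra_keys(store))
```

Under these assumptions only `METADATA.ome.xml` is reported as extra, which is the situation described for Option 1: the file sits in the store without colliding with any key Zarr uses.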

LeeKamentsky commented 3 years ago

For the BIDS spec, one thing that's being discussed is a hierarchy with inheritance. The root might hold things that are common to everything, like the microscopy setup part of the OME-XML (or OME-JSON to be) and sample information. As you go deeper into the tree, to volumes and such, those levels would get the details of the particular acquisition, such as stage position and staining conditions. For BIDS, what we've discussed is leaving it up to the researcher at what level to put a particular piece of information, with an inheritance rule for aggregating everything.

Something like:

|-- .zattrs <- contains attributes common to everything below, e.g. the "Instrument" element
|-- .zgroup
|-- 20210316_70C-1
    |-- .zgroup
    |-- .zattrs
    |-- 20210316_70C-1-R1-YO-CR-GF.zarr
        |-- .zarray
        |-- .zattrs <- Channel info: YO antibody channel 0, CR antibody channel 1, GF antibody channel 2
    |-- 20210322_70C-1-R2-YO-CB-NPY.zarr
    ...

My votes for the above would be 4a, then 2. Offhand I'd expect the computational and space costs of duplicating and parsing the metadata to be much lower than those of computing on the data itself. It would be computationally inexpensive (but with possible synchronization issues) to download the JSON into a database and compute on it there. 4a has the flaw, though, of allowing the same key in two places, inadvertently with different values.
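
One possible reading of such an inheritance rule, sketched in plain Python: attributes are merged from the root down to the requested node, deeper levels override shallower ones, and any overridden values are reported so that "same key, different values" cases can be spotted. The function name `inherited_attrs`, the "deeper wins" rule, and all attribute values below are assumptions for illustration; only the path names are taken from the tree above.

```python
def inherited_attrs(attrs_by_path, path):
    """Merge attributes from the root ("") down to `path`; deeper levels win.

    Returns (merged, overridden), where `overridden` maps each key to the
    list of shallower values that were replaced by a different value.
    """
    merged, overridden = {}, {}
    parts = [p for p in path.split("/") if p]
    for depth in range(len(parts) + 1):
        level = "/".join(parts[:depth])
        for key, value in attrs_by_path.get(level, {}).items():
            if key in merged and merged[key] != value:
                overridden.setdefault(key, []).append(merged[key])
            merged[key] = value
    return merged, overridden


# Hypothetical contents of the .zattrs files at each level.
attrs_by_path = {
    "": {"Instrument": "scope-A", "Sample": "brain"},
    "20210316_70C-1": {"Sample": "brain, slab 1"},
    "20210316_70C-1/20210316_70C-1-R1-YO-CR-GF.zarr": {
        "Channels": ["YO", "CR", "GF"],
    },
}

merged, overridden = inherited_attrs(
    attrs_by_path, "20210316_70C-1/20210316_70C-1-R1-YO-CR-GF.zarr"
)
print(merged)
print(overridden)
```

Here the array inherits "Instrument" from the root, picks up the more specific "Sample" from its parent group, and the replaced root-level "Sample" value is listed in `overridden`.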

joshmoore commented 3 years ago

Copying an update from https://github.com/zarr-developers/zarr-specs/issues/112: the most likely interpretation of additional files would enable Option 1.

joshmoore commented 3 years ago

Note: a related conversation is ongoing under "NCZarr - Netcdf Support for Zarr" (https://github.com/zarr-developers/zarr-specs/issues/41), especially relevant regarding situations where we might want to join together two or more specs (here, OME & BIDS; there, xarray & NetCDF)

imagesc-bot commented 3 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/issue-with-opening-zarr-inside-napari/56089/12

joshmoore commented 2 years ago

see also: https://github.com/ome/ngff/issues/104