ome / omero-cli-zarr

https://pypi.org/project/omero-cli-zarr/
GNU General Public License v2.0

bioformats2raw export: unify Zarr layout and add OMERO metadata #76

Closed sbesson closed 3 years ago

sbesson commented 3 years ago

Follow-up of #75, this PR:

The add_omero_metadata and add_toplevel_metadata functions are arguably outside the scope of the raw_pixels module and could be moved elsewhere (utils? a new module?).

sbesson commented 3 years ago

An initial set of sample files was generated with and without --bf and uploaded to a temporary public bucket for comparison:

| Image ID | Dimensions (XYZCT) | omero zarr export | omero zarr export --bf |
| --- | --- | --- | --- |
| 13422206 | 256 x 256 x 735 x 3 x 1 | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/omero/13422206.zarr | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/13422206.zarr |
| 3491626 | 2048 x 2048 x 1 x 5 x 20 | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/omero/3491626.zarr | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/3491626.zarr |
| 8343617 | 3540 x 4491 x 2977 x 1 x 1 | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/omero/8343617.zarr | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/8343617.zarr |
| 13383974 | 3000 x 3000 x 1 x 3 x 1 | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/omero/13383974.zarr | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/13383974.zarr |

From a performance perspective, here are timings for exporting Image:8343617 with and without --bf:

(zarr) [sbesson@pilot-zarr1-dev data]$  time omero zarr --output /data/omero-cli-zarr_76/omero/ export Image:8343617
Previous session expired for public on idr.openmicroscopy.org:4064
Server: [idr.openmicroscopy.org:4064]
Username: [public]
Password:
Created session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Exporting to /data/omero-cli-zarr_76/omero/8343617.zarr (0.3)
Finished.

real    58m52.531s
user    42m19.562s
sys     5m45.421s
(zarr) [sbesson@pilot-zarr1-dev data]$  time omero zarr --output /data/omero-cli-zarr_76/bf/ export --bf Image:8343617
Previous session expired for public on idr.openmicroscopy.org:4064
Server: [idr.openmicroscopy.org:4064]
Username: [public]
Password:
Created session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp5020382743495610090/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

Image exported to /data/omero-cli-zarr_76/bf/8343617.zarr

real    29m38.845s
user    55m35.191s
sys     2m17.729s
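
To make the two timings easier to compare, here is a minimal sketch that converts the `time` figures reported above into seconds and derives the wall-clock and CPU ratios. The helper names (`to_seconds`, `summarise`) are hypothetical and not part of omero-cli-zarr; only the durations are taken from the runs above.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style duration such as '58m52.531s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised duration: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Durations copied from the two exports of Image:8343617 shown above.
runs = {
    "omero": {"real": "58m52.531s", "user": "42m19.562s", "sys": "5m45.421s"},
    "bf": {"real": "29m38.845s", "user": "55m35.191s", "sys": "2m17.729s"},
}


def summarise(run: dict) -> tuple:
    """Return (wall-clock seconds, total CPU seconds) for one run."""
    wall = to_seconds(run["real"])
    cpu = to_seconds(run["user"]) + to_seconds(run["sys"])
    return wall, cpu


omero_wall, omero_cpu = summarise(runs["omero"])
bf_wall, bf_cpu = summarise(runs["bf"])

print(f"wall-clock speed-up with --bf: {omero_wall / bf_wall:.2f}x")
print(f"CPU time ratio (bf / omero): {bf_cpu / omero_cpu:.2f}")
```

On these figures, --bf roughly halves the wall-clock time while using about 20% more CPU time, which is consistent with the work being spread over several workers.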

will-moore commented 3 years ago

This worked once I'd updated to the 0.3.0 release of bioformats2raw.

One difference I noticed is in the number of pyramid levels exported. For a 512 x 512 image, bf exported 2 levels whereas omero exported 4 levels, down to 64 x 64.

Known difference: --bf exports v0.2 whereas omero export uses v0.3 (with axes)

sbesson commented 3 years ago

> This worked once I'd updated to the 0.3.0 release of bioformats2raw.

Thanks. 0.3.0 is absolutely a requirement for this work as this PR uses some of the new options. I'll make this clear in the help.

> One difference I noticed is in the number of pyramid levels exported. For a 512 x 512 image, bf exported 2 levels whereas omero exported 4 levels, down to 64 x 64.

Yes, that's one of the implementation differences. It comes from different defaults for the maximum size of the smallest resolution: 96 for omero-cli-zarr (https://github.com/ome/omero-cli-zarr/blob/aab16a60444804076225d283804df4fceace16c6/src/omero_zarr/raw_pixels.py#L88) vs 256 for bioformats2raw (https://github.com/glencoesoftware/bioformats2raw/blob/4114f1ef8340317df67d8940151f3b7a0159a5a3/src/main/java/com/glencoesoftware/bioformats2raw/Converter.java#L105).
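
As an illustration of how the two defaults lead to different level counts, here is a minimal sketch. It assumes both tools keep halving the image until its longest side no longer exceeds the threshold; the function name `count_levels` is hypothetical, and this is not the actual code from either project.

```python
def count_levels(size_x: int, size_y: int, min_size: int) -> int:
    """Count pyramid levels, full resolution included.

    Assumption: each level halves the previous one, and downsampling
    stops once the longest side is no larger than `min_size`.
    """
    levels = 1
    longest = max(size_x, size_y)
    while longest > min_size:
        longest //= 2
        levels += 1
    return levels


# Thresholds quoted above: 96 for omero-cli-zarr, 256 for bioformats2raw.
for tool, min_size in (("omero-cli-zarr", 96), ("bioformats2raw", 256)):
    print(tool, count_levels(512, 512, min_size))
```

Under that assumption, a 512 x 512 image gives 4 levels with a threshold of 96 (512, 256, 128, 64) and 2 levels with a threshold of 256 (512, 256), matching the numbers reported above.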

> --bf exports v0.2 whereas omero export uses v0.3 (with axes)

Yes, that's captured as https://github.com/glencoesoftware/bioformats2raw/issues/113

sbesson commented 3 years ago

Another, larger-scale example of using this PR: converting the >1TB lightsheet dataset from McDole et al. (https://doi.org/10.17867/10000116):

(zarr) [sbesson@pilot-zarr1-dev ~]$ time omero zarr --output /data/omero-cli-zarr_76/bf/ export --bf Image:4007801 --max_workers 16
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp6140859124054326326/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

Image exported to /data/omero-cli-zarr_76/bf/4007801.zarr

real    1934m27.700s
user    22426m47.134s
sys     452m8.650s

(zarr) [sbesson@pilot-zarr1-dev ~]$ time aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 cp --recursive /data/omero-cli-zarr_76/bf/4007801.zarr/ s3://omero-cli-zarr_76/bf/4007801.zarr/
...
real    1510m25.233s
user    666m44.302s

So after 2.5 days of processing + S3 upload, the data can be viewed from:

https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/4007801.zarr
https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/4007802.zarr
https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/4007803.zarr
https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/4007804.zarr
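
The viewer links in this thread all follow the same pattern: the vizarr page with a `source` query parameter pointing at the uploaded Zarr directory. A minimal sketch for generating them is shown below; the constant and function names are hypothetical, and the bucket layout (`<variant>/<image id>.zarr`) is inferred from the URLs posted here.

```python
VIZARR = "https://hms-dbmi.github.io/vizarr/"
# Temporary public bucket used for the sample exports in this thread.
BUCKET = "https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76"


def viewer_url(image_id: int, variant: str = "bf") -> str:
    """Build a vizarr link for an exported image.

    `variant` is the sub-directory used in this thread: "bf" for exports
    made with --bf, "omero" for the default export.
    """
    return f"{VIZARR}?source={BUCKET}/{variant}/{image_id}.zarr"


for image_id in (4007801, 4007802, 4007803, 4007804):
    print(viewer_url(image_id))
```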

sbesson commented 3 years ago

Of the outstanding points in this discussion, the command-line overhaul has been turned into an issue, and https://github.com/glencoesoftware/bioformats2raw/pull/114 should make it possible to align the number of resolutions generated with and without --bf in the near future.

Any objections to getting this merged, @joshmoore @will-moore? I would propose a release of the plugin with bioformats2raw 0.3.0 support, and then start capturing the next items to review as issues.

joshmoore commented 3 years ago

SGTM :+1: