Open will-moore opened 2 years ago
Using an alternative approach as suggested by @sbesson and @joshmoore, I tried importing a "metadata only" NGFF plate (where all the binary chunks were missing but all other files were present). This took about 3 hours. Then I sync'd the chunks from s3 "in place" and also manually created symlinks in the OMERO Managed repo (since the data was imported "in-place" but OMERO only creates symlinks at import time for the files it knows about). Instead of creating symlinks for each chunk, I created links at the 'row' level (probably could have done this with a single symlink at the plate level?):
e.g.
$ cd /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-8/2022-09/28/09-25-53.378/SQ00015098__2016-06-08T18_43_42-Measurement1.ome.zarr/
$ rm -rf A/
$ ln -s /uod/idr/filesets/idr0125-way-cellpainting/s3-20220927/SQ00015098__2016-06-08T18_43_42-Measurement1.ome.zarr/A A
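For a plate with many rows, the `rm -rf` / `ln -s` pair above could be scripted. The following is only a minimal sketch of that manual step, not an OMERO feature: the function name `link_rows` is hypothetical, and it assumes that every top-level directory of the source plate should be linked (rows such as `A`, `B`, plus any other directory present, e.g. a metadata directory).

```python
import shutil
from pathlib import Path


def link_rows(managed_plate: Path, source_plate: Path) -> list[str]:
    """Replace each top-level directory of managed_plate with a symlink to
    the directory of the same name in source_plate.

    Hypothetical helper: assumes every top-level directory is to be linked.
    Returns the names of the directories that were linked.
    """
    linked = []
    for src_row in sorted(source_plate.iterdir()):
        if not src_row.is_dir():
            continue  # leave plate-level files (e.g. .zattrs) untouched
        dest = managed_plate / src_row.name
        if dest.is_symlink():
            dest.unlink()          # replace a link left by an earlier run
        elif dest.is_dir():
            shutil.rmtree(dest)    # equivalent of `rm -rf A/`
        # equivalent of `ln -s /path/to/source.ome.zarr/A A`
        dest.symlink_to(src_row.resolve(), target_is_directory=True)
        linked.append(src_row.name)
    return linked
```

Called as `link_rows(Path(managed_dir), Path(source_dir))`, where the two arguments are the plate directory in the ManagedRepository and the original plate with chunks. It deletes directories in the managed copy, so it should only be pointed at a plate that was imported metadata-only.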
This allowed me to view images in webclient, set rendering settings and regenerate thumbnails.
However, this process is not suitable for production, so it would be useful to emulate these steps as an option in Bio-Formats and OMERO import options.
e.g. $ omero import --metadata_only --transfer=ln_s --skip=all --depth=100 /path/to/SQ00015098__2016-06-08T18_43_42-Measurement1.ome.zarr/
Thanks for the validation, @will-moore!
As discussed on the IDR call today... We are now converting various submissions with custom formats (mostly .pattern files) into NGFF data. We are facing a similar issue to that reported above for idr0125: too many chunk files are causing problems with the import of idr0013 and idr0015. See https://github.com/IDR/idr-metadata/issues/644 and https://github.com/IDR/idr-metadata/issues/645
The current workaround is to create a 'metadata-only' copy of the Plate (no chunks), in-place import it and then replace the Plate in ManagedRepo with a symlink to the original plate (with chunks). This is kinda painful, so it would be nice to be able to directly import the original Plate, but for OMERO to NOT create an OriginalFile for every chunk (ignore chunks until we actually want to load pixel data).
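The first step of that workaround, building the 'metadata-only' copy, could look roughly like the sketch below. It is an illustration under stated assumptions rather than the procedure actually used: `copy_metadata_only` is a hypothetical name, and the plate is assumed to use the Zarr v2 layout, where an array directory is one that contains a `.zarray` file and everything else inside it is chunk data.

```python
import os
import shutil
from pathlib import Path

# Zarr v2 metadata file names (assumption: the plate is Zarr v2)
ZARR_METADATA = {".zarray", ".zattrs", ".zgroup", ".zmetadata"}


def copy_metadata_only(src_plate: Path, dest_plate: Path) -> int:
    """Copy src_plate to dest_plate, skipping chunk files.

    Hypothetical helper. Returns the number of files copied.
    """
    copied = 0
    for root, dirnames, filenames in os.walk(src_plate):
        is_array = ".zarray" in filenames
        if is_array:
            dirnames[:] = []  # nested chunk directories: do not descend
        target = dest_plate / Path(root).relative_to(src_plate)
        target.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            if is_array and name not in ZARR_METADATA:
                continue  # a chunk file stored flat in the array directory
            shutil.copy2(Path(root) / name, target / name)
            copied += 1
    return copied
```

Files outside array directories (group metadata and any other non-chunk files) are copied unchanged, which matches the description of a plate where "all other files were present". The copy can then be imported in-place and swapped for a symlink to the original plate as described above.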
cc @jburel
This should be fixed by https://github.com/ome/ZarrReader/pull/41
Tried downloading from https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/SQ00015098__2016-06-08T18_43_42-Measurement1.ome.zarr/ and importing into the idr0125-pilot server.
Summary of import logs:
Start...
Seb: "Looking at the import logs, the issue is that the time to save OriginalFile dramatically increases from a few milliseconds at the beginning of the import to ~5s / file currently"
Finally...