ome / omero-cli-zarr

https://pypi.org/project/omero-cli-zarr/
GNU General Public License v2.0
15 stars 10 forks source link

Big images #134

Closed will-moore closed 1 year ago

will-moore commented 1 year ago

Fixes #111

This adds support for exporting big images in a tile-based approach instead of a plane-based approach. As suggested by @joshmoore - this tile-based approach is used for ALL images (instead of using the whole plane for as a single chunk as before).

First we write a full-size array at 0, a tile at a time. Then resize via ome_zarr.dask_utils.resize (see bug-fix at https://github.com/ome/ome-zarr-py/pull/244) to create a pyramid.

Other changes:

The --numpy_cache functionality has been replaced. Previously, if you anticipated connection issues with OMERO, you could use the --numpy_cache option to cache chunks to disk at the same time as writing them to zarr. The problem with this was that if you lost connection and weren't expecting it, then you had nothing cached. Also, caching doubled the disk usage.

Now, if connection fails, you can simply rerun the omero zarr export Image:ID command and we pick-up writing to the existing partially-generated array.

This is nicer as you don't have to delete the partially-exported image as before and it's much faster as you do don't have to copy the cached chunks and re-write to zarr. Exporting the big image below required many re-runs of the export command - maybe 20 - 30 times before completion!

The tile_width and tile_height options that previously only applied to bioformats2raw usage, now also apply to API-based export.

Image exported with this PR is at: https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846152.zarr/

I also fixed the Plate export to work with the new export logic and to support the interruption (partial export) of a Plate and the continued export from where you left off.

will-moore commented 1 year ago

@joshmoore Updated as I think you suggested? Removed code duplication so we use tile sizes from rfs.getTileSize() for ALL images now, even if not big images. Maybe not quite as fast for non-big images as previously, but I think speed is not so critical for export, and it certainly means less code to support 1 way of exporting. Also, this adds support for the --cache_numpy behaviour for ALL images (including big images) now.

will-moore commented 1 year ago

Looking to test this PR on the export of https://idr.openmicroscopy.org/webclient/?show=image-9846152, the preferred tile-size from that Image is tile_size_x: 19120, tile_size_y: 27. This seems like a sub-optimal tile size for viewing an OME-NGFF image, so I wonder if instead of using the preferred tile-size from rawPixelsStore, we simply choose a good default, e.g. 512 x 512 (which is what iviewer uses for all Images). This might mean slightly slower export, which is OK, but then you have better NGFF exported.

I'll go ahead and make that change, unless there's any objection?

sbesson commented 1 year ago

I'll go ahead and make that change, unless there's any objection?

I think the tile size given by the tile service is retrieved from the relevant Bio-Formats reader and effectively reflects how the data is stored on disk in its original file formats.

For comparison, bioformats2raw completely ignores this value when converting the data and sets a chunk size of 1024x1024. This is largely in agreement with the proposal you are making here.

will-moore commented 1 year ago

Since we already support --tile_width and --tile_height for bioformats2raw, I've used them for cli export too.

will-moore commented 1 year ago

Running current version against https://idr.openmicroscopy.org/webclient/?show=image-9846152 Loading all tiles for resolution 0 completed after many re-runs, losing connection after ~12 Z planes each time.

Then downsampling failed at this point, during downsample from 1 -> 2:

$ ls -alh ./9846152.zarr
total 4.0K
drwxrwxr-x. 5 wmoore wmoore 68 Feb  1 20:18 .
drwxrwxr-x. 4 wmoore wmoore 53 Feb  1 06:04 ..
drwxrwxr-x. 5 wmoore wmoore 68 Feb  1 12:03 0
drwxrwxr-x. 5 wmoore wmoore 68 Feb  1 20:10 1
drwxrwxr-x. 5 wmoore wmoore 68 Feb  1 20:19 2
-rw-rw-r--. 1 wmoore wmoore 24 Feb  1 06:04 .zgroup

$ find ./9846152.zarr/0 -type f | wc -l
72619
$ find ./9846152.zarr/1 -type f | wc -l
19111
$ find ./9846152.zarr/2 -type f | wc -l
downsample stacktrace ``` Traceback (most recent call last): File "/home/wmoore/miniconda3/envs/omero_zarr_export/bin/omero", line 11, in sys.exit(main()) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/main.py", line 125, in main rv = omero.cli.argv() File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1784, in argv cli.invoke(args[1:]) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1222, in invoke stop = self.onecmd(line, previous_args) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1299, in onecmd self.execute(line, previous_args) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1381, in execute args.func(args) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/cli.py", line 107, in _wrapper return func(self, *args, **kwargs) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/cli.py", line 316, in export image_to_zarr(image, args) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/raw_pixels.py", line 36, in image_to_zarr add_image(image, root, tile_width=tile_width, tile_height=tile_height) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/raw_pixels.py", line 63, in add_image paths = add_raw_image(image, parent, level_count, tile_width, tile_height) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/raw_pixels.py", line 164, in add_raw_image downsample_pyramid_on_disk(parent, paths) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/raw_pixels.py", line 188, in downsample_pyramid_on_disk da.to_zarr(arr=output, url=parent.store, component=path) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/array/core.py", line 3694, in to_zarr return arr.store(z, lock=False, compute=compute, return_stored=return_stored) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/array/core.py", line 1767, in store r = store([self], [target], **kwargs) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/array/core.py", line 1235, in store compute_as_if_collection(Array, store_dsk, map_keys, **kwargs) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/base.py", line 341, in compute_as_if_collection return schedule(dsk2, keys, **kwargs) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/threaded.py", line 89, in get results = get_async( File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/local.py", line 511, in get_async raise_exception(exc, tb) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/local.py", line 319, in reraise raise exc File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/local.py", line 224, in execute_task result = _execute_task(task, data) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/core.py", line 119, in _execute_task return func(*(_execute_task(a, cache) for a in args)) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/array/core.py", line 4366, in store_chunk return load_store_chunk(x, out, index, lock, return_stored, False) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/array/core.py", line 4348, in load_store_chunk out[index] = x File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/zarr/core.py", line 1373, in __setitem__ self.set_basic_selection(pure_selection, value, fields=fields) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/zarr/core.py", line 1468, in set_basic_selection return self._set_basic_selection_nd(selection, value, fields=fields) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/zarr/core.py", line 1772, in _set_basic_selection_nd self._set_selection(indexer, value, fields=fields) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/zarr/core.py", line 1800, in _set_selection check_array_shape('value', value, sel_shape) File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/zarr/util.py", line 546, in check_array_shape raise ValueError('parameter {!r}: expected array with shape {!r}, got {!r}' ValueError: parameter 'value': expected array with shape (1, 1, 265, 1024), got (1, 1, 259, 1024) ```
imagesc-bot commented 1 year ago

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/converting-other-idr-images-into-public-zarr/44025/20

will-moore commented 1 year ago

The resize error above was fixed by https://github.com/ome/ome-zarr-py/pull/244/commits/e1fdd124fd3f518c5681e1b5a1aaf420e2e7800d

The branch at this point was used to generate the image at https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846152.zarr/

Also updated description with all changes in this PR.

will-moore commented 1 year ago

The pre-commit.ci check is failing with:

              RuntimeError: The Poetry configuration is invalid:
                - [extras.pipfile_deprecated_finder.2] 'pip-shims<=0.3.4' does not match '^[a-zA-Z-_.0-9]+$'

Same error at https://github.com/ome/ome-zarr-py/pull/244

https://pypi.org/project/poetry/ last release was 10th Jan 2023, but that doesn't appear to be the issue since this check has passed after that date above.

I don't see any mention of Poetry itself in this repo's config.

will-moore commented 1 year ago

Googled for https://stackoverflow.com/questions/75269700/pre-commit-fails-to-install-isort-5-11-4-with-error-runtimeerror-the-poetry-co which suggested the fix in previous commit.

will-moore commented 1 year ago

Re logging, I tried using logging but couldn't work out how to actually see the logging statements.

If I do:

import logging
LOGGER = logging.getLogger("omero_zarr.raw_pixels")
...
 LOGGER.warn("t, c, z, chk_x, chk_y %s %s %s %s %s" % (t, c, z, chk_y, chk_x))

I see this e.g:

WARNING:omero_zarr.raw_pixels:t, c, z, chk_x, chk_y 0 2 0 0 0

which is not ideal (I really just want t, c, z, chk_x, chk_y 0 2 0 0 0). If I use LOGGER.info() I see nothing at-all.

joshmoore commented 1 year ago

That comes down to the config. The idea is that someone else can also influence how it appears. If you are doing this in your own code, then use logging.basicConfig and you set the format and the log-level. Just make sure that that isn't in the released code so that the CLI, e.g., can set its preferences.

will-moore commented 1 year ago

Testing the export of a Plate with this PR, I see an issue with the logic for testing whether a chunk has been previously written:

chunk_keys = list(zarray.chunk_store.keys())
separator = zarray.chunk_store.key_separator

chunk_key = separator.join([str(k) for k in key_dims])
print("chunk_key", chunk_key)
print("keys", chunk_keys)
if chunk_key not in chunk_keys:
   ... load from omero...

This is failing to detect existing chunks and prints out...

chunk_key 0/4/0/0
keys ['.zgroup', '1/.zarray', '1/0/0/0', '1/1/0/0', '1/2/0/0', '1/3/0/0', '1/4/0/0', 'L/.zgroup', 'L/9/.zattrs', 'L/9/.zgroup', 'L/9/0/.zattrs', 'L/9/0/.zgroup', 'L/9/0/0/.zarray', 'L/9/0/0/0/0/0', 'L/9/0/0/1/0/0', 'L/9/0/0/2/0/0', 'L/9/0/0/3/0/0', 'L/9/0/0/4/0/0', 'L/9/0/1/.zarray', 'L/9/0/1/0.0.0', 'L/9/0/1/1.0.0', 'L/9/0/1/2.0.0', 'L/9/0/1/3.0.0', 'L/9/0/1/4.0.0', 'L/9/0/2/.zarray', 'L/9/0/2/0.0.0', 'L/9/0/2/1.0.0', 'L/9/0/2/2.0.0', 'L/9/0/2/3.0.0', 'L/9/0/2/4.0.0', 'L/9/0/3/.zarray', 'L/9/0/3/0.0.0', 'L/9/0/3/1.0.0', 'L/9/0/3/2.0.0', 'L/9/0/3/3.0.0', 'L/9/0/3/4.0.0', 'L/9/1/.zattrs', 'L/9/1/.zgroup', 'L/9/1/0/.zarray', 'L/9/1/0/0/0/0', 'L/9/1/0/1/0/0', 'L/9/1/0/2/0/0', 'L/9/1/0/3/0/0', 'L/9/1/0/4/0/0', 'L/9/1/1/.zarray', 'L/9/1/1/0.0.0', 'L/9/1/1/1.0.0', 'L/9/1/1/2.0.0', 'L/9/1/1/3.0.0', 'L/9/1/1/4.0.0', 'L/9/1/2/.zarray', 'L/9/1/2/0.0.0', 'L/9/1/2/1.0.0', 'L/9/1/2/2.0.0', 'L/9/1/2/3.0.0', 'L/9/1/2/4.0.0', 'L/9/1/3/.zarray', 'L/9/1/3/0.0.0', 'L/9/1/3/1.0.0', 'L/9/1/3/2.0.0', 'L/9/1/3/3.0.0', 'L/9/1/3/4.0.0', 'L/9/2/.zattrs', 'L/9/2/.zgroup', 'L/9/2/0/.zarray', 'L/9/2/0/0/0/0', 'L/9/2/0/1/0/0', 'L/9/2/0/2/0/0', 'L/9/2/0/3/0/0', 'L/9/2/0/4/0/0', 'L/9/2/1/.zarray', 'L/9/2/1/0.0.0', 'L/9/2/1/1.0.0', 'L/9/2/1/2.0.0', 'L/9/2/1/3.0.0', 'L/9/2/1/4.0.0', 'L/9/2/2/.zarray', 'L/9/2/2/0.0.0', 'L/9/2/2/1.0.0', 'L/9/2/2/2.0.0', 'L/9/2/2/3.0.0', 'L/9/2/2/4.0.0', 'L/9/2/3/.zarray', 'L/9/2/3/0.0.0', 'L/9/2/3/1.0.0', 'L/9/2/3/2.0.0', 'L/9/2/3/3.0.0', 'L/9/2/3/4.0.0', 'L/9/3/.zattrs', 'L/9/3/.zgroup', 'L/9/3/0/.zarray', 'L/9/3/0/0/0/0', 'L/9/3/0/1/0/0', 'L/9/3/0/2/0/0', 'L/9/3/0/3/0/0', 'L/9/3/0/4/0/0', 'L/9/3/1/.zarray', 'L/9/3/1/0.0.0', 'L/9/3/1/1.0.0', 'L/9/3/1/2.0.0', 'L/9/3/1/3.0.0', 'L/9/3/1/4.0.0', 'L/9/3/2/.zarray', 'L/9/3/2/0.0.0', 'L/9/3/2/1.0.0', 'L/9/3/2/2.0.0', 'L/9/3/2/3.0.0', 'L/9/3/2/4.0.0', 'L/9/3/3/.zarray', 'L/9/3/3/0.0.0', 'L/9/3/3/1.0.0', 'L/9/3/3/2.0.0', 'L/9/3/3/3.0.0', 'L/9/3/3/4.0.0', 'L/9/4/.zgroup', 'L/9/4/0/.zarray']

It looks like a different path separator is being used for the path to the Image (/) and for the Image dims (.). Trying to match this behaviour - and have it behave differently for Images vv Plate - seems too fragile. So I think I'll revert to my previous approach (changed in https://github.com/ome/omero-cli-zarr/pull/134/commits/6af4061fc87cde2df9f3b51581afd949a37f4f06) to check for existing chunks

joshmoore commented 1 year ago

:+1: Merging.

Anything else to roll into an 0.6.0, @will-moore? @dominikl?