ome / omero-cli-zarr

https://pypi.org/project/omero-cli-zarr/
GNU General Public License v2.0

Error with exporting Plate as ZARR #164

Closed TorecLuik closed 1 month ago

TorecLuik commented 1 month ago

Hey,

We have an OME-TIFF plate in OMERO, which uploaded fine through OMERO.insight:

image

But I get errors when I try to export it through omero-cli-zarr (0.5.3), with or without '--bf':

bash-5.1$ omero zarr export --output zarr_test Plate:51

or

bash-5.1$ omero zarr export --output zarr_test_bf Plate:51 --bf

Any idea where I should start looking to fix this?

They both give the same error:

Using session for <session>. Expires in : 10080 min. Current group: <group>
Exporting to zarr_test/51.zarr (0.4)
sizes x: 512, y: 512, z: 1, c: 3, t: 1
tile_width: 1024, tile_height: 1024
t, c, z, chk_x, chk_y 0 0 0 0 0
loading Tile...
t, c, z, chk_x, chk_y 0 1 0 0 0
loading Tile...
t, c, z, chk_x, chk_y 0 2 0 0 0
loading Tile...
downsample_pyramid_on_disk /opt/omero/server/zarr_test/51.zarr/A/1/0
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/zarr/core.py", line 202, in _load_metadata_nosync
    meta_bytes = self._store[mkey]
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/zarr/storage.py", line 1120, in __getitem__
    raise KeyError(key)
KeyError: '.zarray'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/omero/server/venv3/bin/omero", line 8, in <module>
    sys.exit(main())
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero/main.py", line 125, in main
    rv = omero.cli.argv()
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero/cli.py", line 1771, in argv
    cli.invoke(args[1:])
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero/cli.py", line 1208, in invoke
    stop = self.onecmd(line, previous_args)
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero/cli.py", line 1285, in onecmd
    self.execute(line, previous_args)
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero/cli.py", line 1367, in execute
    args.func(args)
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero_zarr/cli.py", line 125, in _wrapper
    return func(self, *args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero_zarr/cli.py", line 336, in export
    plate_to_zarr(plate, args)
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero_zarr/raw_pixels.py", line 291, in plate_to_zarr
    add_image(img, field_group)
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero_zarr/raw_pixels.py", line 80, in add_image
    paths = add_raw_image(image, parent, level_count, tile_width, tile_height)
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero_zarr/raw_pixels.py", line 180, in add_raw_image
    downsample_pyramid_on_disk(parent, paths)
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/omero_zarr/raw_pixels.py", line 199, in downsample_pyramid_on_disk
    dask_image = da.from_zarr(path_to_array)
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/dask/array/core.py", line 3612, in from_zarr
    z = zarr.Array(store, read_only=True, path=component, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/zarr/core.py", line 170, in __init__
    self._load_metadata()
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/zarr/core.py", line 193, in _load_metadata
    self._load_metadata_nosync()
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/zarr/core.py", line 204, in _load_metadata_nosync
    raise ArrayNotFoundError(self._path) from e
zarr.errors.ArrayNotFoundError: array not found at path %r' '
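
For context on the traceback: in the Zarr v2 layout an array directory holds a `.zarray` file and a group directory holds a `.zgroup` file, so `KeyError: '.zarray'` means the path handed to `da.from_zarr` had no array metadata. A minimal stdlib-only sketch for checking what is actually on disk follows; the function names are made up for illustration, and the paths in the example are simply the ones printed in the log above.

```python
from pathlib import Path


def zarr_node_kind(path):
    """Classify a directory by the Zarr v2 metadata file it contains."""
    node = Path(path)
    if (node / ".zarray").is_file():
        return "array"
    if (node / ".zgroup").is_file():
        return "group"
    return "missing"


def report(root, relative_paths):
    """Return {relative_path: kind} for each path below root."""
    return {rel: zarr_node_kind(Path(root) / rel) for rel in relative_paths}


if __name__ == "__main__":
    # Paths taken from the log above; adjust to your own export location.
    for rel, kind in report("zarr_test/51.zarr", ["A/1/0", "A/1/0/0"]).items():
        print(f"{rel}: {kind}")
```

If `A/1/0` reports "group" or "missing" rather than "array", that would be consistent with the `ArrayNotFoundError` above, although it does not by itself say why the array was not written there.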

TorecLuik commented 1 month ago

By the way, the direct execution of bioformats2raw works with no errors:

bash-5.1$ bioformats2raw /<direct data location without asking OMERO>/NIRHTa+001.ome.tiff zarr

Outputs the following folder:

zarr/A:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/B:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/C:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/D:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/E:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/F:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/G:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/H:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/I:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/J:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/K:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/L:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/M:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/N:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/O:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

zarr/OME:
METADATA.ome.xml

zarr/P:
1  10  11  12  13  14  15  16  17  18  19  2  20  21  22  23  24  3  4  5  6  7  8  9

TorecLuik commented 1 month ago

File (if needed, 2GB): https://filesender.surf.nl/?s=download&token=540bb660-6a76-48cd-b482-f3e86dcb20b1

will-moore commented 1 month ago

Off the top of my head, I think that the --export argument is being ignored by the downsampling step. Can you try without the --export argument? You'll get the output directly to e.g. 51.zarr.

TorecLuik commented 1 month ago

@will-moore You mean --output?

bash-5.1$ omero zarr export Plate:51

or

bash-5.1$ omero zarr export Plate:51 --bf

Gives the same error, ending in:

File "/opt/omero/server/venv3/lib64/python3.9/site-packages/zarr/core.py", line 193, in _load_metadata
    self._load_metadata_nosync()
  File "/opt/omero/server/venv3/lib64/python3.9/site-packages/zarr/core.py", line 204, in _load_metadata_nosync
    raise ArrayNotFoundError(self._path) from e
zarr.errors.ArrayNotFoundError: array not found at path %r' ''

and only

bash-5.1$ ls 51.zarr/
1  A

will-moore commented 1 month ago

Can you update to omero-cli-zarr 0.5.5? That version seems to be working for me (I haven't run the export to completion yet, but it seems to be running fine)...

TorecLuik commented 1 month ago

Ok yes that seems to work, thanks.

That was easy enough of a fix ;)

Does the --bf option work for screens? Because it doesn't seem to change anything in the output.

TorecLuik commented 1 month ago

bioformats2raw time:

real 1m57.669s
user 2m46.511s
sys 0m13.364s

omero zarr export (--bf) time:

real 21m17.244s
user 3m46.389s
sys 0m25.778s

Not great. But at least better than exporting each individual image from the Plate, with 3 seconds per image =P

I will time without --bf too, but I'm sure it's the exact same as with --bf:

real 21m17.867s
user 3m48.389s
sys 0m25.612s
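
To put the timings side by side, here is a small sketch that converts the bash `time` values to seconds and compares them. The helper name `to_seconds` is invented for this example; the numbers are copied from the runs reported in this comment.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '21m17.244s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the bioformats2raw run and the first (--bf) export run.
b2r = {"real": "1m57.669s", "user": "2m46.511s", "sys": "0m13.364s"}
cli = {"real": "21m17.244s", "user": "3m46.389s", "sys": "0m25.778s"}

# How much longer the export took in wall-clock terms.
slowdown = to_seconds(cli["real"]) / to_seconds(b2r["real"])

# Fraction of the export's wall-clock time that was spent on the CPU.
cpu_share = (to_seconds(cli["user"]) + to_seconds(cli["sys"])) / to_seconds(cli["real"])

print(f"wall-clock slowdown: {slowdown:.1f}x")
print(f"CPU time as a share of wall-clock time: {cpu_share:.0%}")
```

The export comes out roughly 11 times slower in wall-clock time, while its user plus sys time is only about a fifth of the elapsed time. That pattern would fit a process that mostly waits (for example on pixel data coming from the server) rather than one that is CPU-bound, but the timings alone do not prove where the time goes.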

TorecLuik commented 1 month ago

(just to answer myself: no, --bf is only handled in the Image case: https://github.com/ome/omero-cli-zarr/blob/9b73dd9dbdeb36dbdb675b02661d253052fca0f0/src/omero_zarr/cli.py#L330 )

will-moore commented 1 month ago

You could try to use --bf to export an Image from a Plate. This will give you the same behaviour as if --bf were supported for Plates, since the list of Files in the Fileset (linked to Image) will cover the whole Plate.

Just reading the code... It will pass the first file in the Fileset to bioformats2raw. I don't know for sure that this will work for all Plates, but it's worth a try if you're interested.

TorecLuik commented 1 month ago

> You could try to use --bf to export an Image from a Plate. This will give you the same behaviour as if --bf were supported for Plates, since the list of Files in the Fileset (linked to Image) will cover the whole Plate.
>
> Just reading the code... It will pass the first file in the Fileset to bioformats2raw. I don't know for sure that this will work for all Plates, but it's worth a try if you're interested.

Actually that is an interesting test.

Turns out that doesn't actually work though. Which might be an actual issue? Not sure who wants to get 1 image from a plate, but apparently it doesn't work.

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@6aaceffd): java.io.IOException: '.zgroup' expected but is not readable or missing in store.
        at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
        at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
        at picocli.CommandLine.call(CommandLine.java:2761)
        at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2826)
Caused by: java.io.IOException: '.zgroup' expected but is not readable or missing in store.
        at com.bc.zarr.ZarrGroup.validateGroupToBeOpened(ZarrGroup.java:109)
        at com.bc.zarr.ZarrGroup.open(ZarrGroup.java:102)
        at com.bc.zarr.ZarrGroup.open(ZarrGroup.java:95)
        at com.glencoesoftware.bioformats2raw.Converter.saveHCSMetadata(Converter.java:2055)
        at com.glencoesoftware.bioformats2raw.Converter.convert(Converter.java:1352)
        at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:1137)
        at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:104)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
        ... 9 more

will-moore commented 1 month ago

Just to confirm: when you check the Plate -> Image in the webclient (the Image you just tried to export with --bf), do you see a single File in the Fileset under /../, e.g. sample.ome.tiff?

Screenshot 2024-10-02 at 17 24 04

I'm not sure why you're seeing "'.zgroup' expected but is not readable or missing in store.", since I don't expect bioformats2raw to be reading a zarr; it should be reading a tiff?!

TorecLuik commented 1 month ago

Yes, I see those same values as your screenshot.

Without the --bf it makes a zarr of that 1 image, 3 channels, 4 sizes (I guess).

bash-5.1$ ls 12612.zarr/*
12612.zarr/0:
0  1  2

12612.zarr/1:
0  1  2

12612.zarr/2:
0  1  2

12612.zarr/3:
0  1  2

With the --bf it gives this error at some midway point after creating an empty zarr and a tmp file that seems to go towards a screen (A1):

bash-5.1$ ls 12612*/
12612.tmp/:
A

12612.zarr/:

specifically:

bash-5.1$ ls 12612.tmp/A/1/0/
0/       1/       .zattrs  .zgroup  

> I'm not sure why you're seeing "'.zgroup' expected but is not readable or missing in store.", since I don't expect bioformats2raw to be reading a zarr; it should be reading a tiff?!

I guess it might be reading the tmp ZARR to write the actual ZARR? Then maybe it doesn't expect the screen-layout (since we are expecting 1 image) and looks for the layout at the top.

The error comes from com.bc.zarr.ZarrGroup.validateGroupToBeOpened. So at this point it is validating a (tmp) zarr.

And, since the error mentions HCS specifically at com.glencoesoftware.bioformats2raw.Converter.saveHCSMetadata(Converter.java:2055), maybe it already expects a true HCS zarr / tmp file, but some top-level .zgroup files are missing now.
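
One way to test the "missing top-level .zgroup" guess is to walk from the root of the tmp directory down to the well that does have a `.zgroup` and list every level that lacks one. The sketch below uses only the standard library; `levels_missing_zgroup` is a name made up for this example, not part of bioformats2raw or omero-cli-zarr.

```python
from pathlib import Path


def levels_missing_zgroup(root, leaf):
    """List directories from root down to leaf that have no .zgroup file."""
    root = Path(root)
    # relative_to raises ValueError if leaf is not located below root.
    parts = Path(leaf).relative_to(root).parts
    # Build root, root/A, root/A/1, ... up to and including leaf.
    levels = [root.joinpath(*parts[:i]) for i in range(len(parts) + 1)]
    return [str(level) for level in levels if not (level / ".zgroup").is_file()]


if __name__ == "__main__":
    # Directory names taken from the listing above.
    print(levels_missing_zgroup("12612.tmp", "12612.tmp/A/1/0"))
```

If the result includes `12612.tmp` itself, that would support the idea that the converter looked for group metadata at the top of the tmp directory and did not find it. It would not explain why the metadata is absent.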

TorecLuik commented 1 month ago

Conclusions so far: just don't use --bf with omero-cli-zarr.

But it still takes 20 minutes, which in a subprocess makes the OMERO script's Ice connection time out in the meantime and fail. So the long run time is still an issue beyond twiddling your thumbs.
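
A generic way to keep a session from idling out while a long child process runs is to poll the process and call a keep-alive hook at intervals. The sketch below is only an illustration of that pattern: `run_with_keepalive` and the `keepalive` callable are hypothetical, and what the hook should do for an OMERO script (for example, pinging the open session) is left to the caller.

```python
import subprocess
import sys
import time


def run_with_keepalive(cmd, keepalive, interval=60.0, poll=1.0):
    """Run cmd, calling keepalive() every `interval` seconds until it exits.

    `keepalive` is a hypothetical callable supplied by the caller.
    Returns (return code, number of keepalive calls).
    """
    calls = 0
    proc = subprocess.Popen(cmd)
    last = time.monotonic()
    while proc.poll() is None:
        time.sleep(poll)
        if time.monotonic() - last >= interval:
            keepalive()
            calls += 1
            last = time.monotonic()
    return proc.returncode, calls


if __name__ == "__main__":
    # Stand-in for the long export: a child process that sleeps briefly.
    code, calls = run_with_keepalive(
        [sys.executable, "-c", "import time; time.sleep(1)"],
        keepalive=lambda: print("ping"),
        interval=0.3,
        poll=0.1,
    )
    print(code, calls)
```

In a real script the command list would be the export call and the interval would be chosen well below the connection timeout. Whether this is enough to keep the script's connection open has not been verified here.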

To get the 2-minute bioformats2raw version, could I bypass omero-cli-zarr and call bioformats2raw directly in a subprocess, if I ask OMERO for the original file location (the "paths on server" shown in your screenshot)?
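
A minimal sketch of that idea, assuming the original file is readable at a known path on the machine where the script runs. It reuses the two-argument `bioformats2raw <input> <output>` form from the earlier comment; the helper names are invented for illustration, and looking up the server path through OMERO is deliberately not shown.

```python
import subprocess


def build_command(source, output, executable="bioformats2raw"):
    """Build the same two-argument call used on the command line earlier."""
    return [executable, str(source), str(output)]


def convert(source, output):
    """Run the conversion and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(source, output), check=True)
```

Usage would look like `convert("/path/on/server/NIRHTa+001.ome.tiff", "zarr")`, where the directory part is a placeholder. This only works if the process has filesystem access to the original file, which may not hold for every OMERO deployment.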