ome / omero-cli-zarr

https://pypi.org/project/omero-cli-zarr/
GNU General Public License v2.0

Name option #147

Open will-moore opened 1 year ago

will-moore commented 1 year ago

This PR adds an optional --name_by argument that accepts id (the default behaviour) or name. It is needed when batch-exporting many Images or Plates, where we want each exported OME-Zarr image to have a meaningful name.

This PR has been used for all omero-cli-zarr exports in the ongoing IDR NGFF upgrade work.

When exporting from OMERO, we now adopt the naming convention of ID.ome.zarr or PlateName.ome.zarr instead of the previous ID.zarr.

If a name contains square brackets [ ], writing to Zarr can fail (see errors below), so these are replaced by ( ).
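A minimal sketch of the bracket substitution described above. The helper names sanitize_name and zarr_filename are hypothetical and not taken from the PR; the sketch maps [ to ( and ] to ) with str.translate and appends the .ome.zarr suffix.

```python
# Hypothetical helpers illustrating the [ ] -> ( ) substitution; not the PR's code.
_BRACKETS_TO_PARENS = str.maketrans("[]", "()")


def sanitize_name(name: str) -> str:
    """Replace square brackets (treated specially by fsspec's glob) with parentheses."""
    return name.translate(_BRACKETS_TO_PARENS)


def zarr_filename(name: str) -> str:
    """Build an OME-Zarr directory name from an OMERO object name."""
    return f"{sanitize_name(name)}.ome.zarr"


if __name__ == "__main__":
    for name in ["Tonsil 3", "JL_120731_S6A [Well A-1; Field #1]"]:
        print(zarr_filename(name))
```

Names without brackets, including names with spaces, pass through unchanged apart from the suffix.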

To test:

$ omero zarr export Image:123 --name_by name
# will create image_name.ome.zarr

$ omero zarr export Plate:123 --name_by name
# will create plate_name.ome.zarr
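The shape of the --name_by option can be sketched with plain argparse. This is an assumption for illustration: the real plugin registers its arguments through the OMERO CLI, and the parser name, positional argument and help text below are made up. Only the choices (id, name) and the id default come from the PR description.

```python
import argparse


# Hypothetical stand-alone parser; not the plugin's actual argument setup.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="zarr-export-sketch")
    parser.add_argument("object", help="e.g. Image:123 or Plate:123")
    parser.add_argument(
        "--name_by",
        choices=["id", "name"],
        default="id",
        help="Name the exported .ome.zarr by object ID (default) or by name",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["Image:123", "--name_by", "name"])
    print(args.object, args.name_by)
```

Restricting the value with choices means a typo such as --name_by title is rejected by the parser instead of silently falling back to ID naming.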
will-moore commented 1 year ago

I have reverted the removal of .pattern from image names (see above). I'm still looking at how to handle various image names. I previously thought that names containing whitespace were causing errors, but that does not always seem to be the case, since this works OK:

omero zarr export Image:5025553 --name_by name
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Exporting to Tonsil 3.ome.zarr (0.4)
...

However, exporting an Image with a more complex name, e.g. JL_120731_S6A [Well A-1; Field #1] (https://idr.openmicroscopy.org/webclient/?show=image-1229801), fails with an exception that can be reproduced as follows:

from zarr.storage import FSStore
from zarr.hierarchy import open_group
open_group(FSStore("JL_120731_S6A [Well A-1; Field #1]", mode="w"))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/hierarchy.py", line 1465, in open_group
    return Group(store, read_only=read_only, cache_attrs=cache_attrs,
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/hierarchy.py", line 164, in __init__
    meta_bytes = store[mkey]
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/storage.py", line 1393, in __getitem__
    return self.map[key]
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/mapping.py", line 143, in __getitem__
    result = self.fs.cat(k)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 826, in cat
    paths = self.expand_path(path, recursive=recursive)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 1005, in expand_path
    out = self.expand_path([path], recursive, maxdepth)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 1011, in expand_path
    bit = set(self.glob(p))
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/implementations/local.py", line 70, in glob
    return super().glob(path, **kwargs)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 591, in glob
    pattern = re.compile(pattern.replace("=PLACEHOLDER=", ".*"))
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/sre_compile.py", line 788, in compile
    p = sre_parse.parse(p, flags)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/sre_parse.py", line 955, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/sre_parse.py", line 444, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/sre_parse.py", line 599, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range A-1 at position 58
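The regex failure can be reproduced without zarr or fsspec by compiling the name directly with Python's re module. Inside [...] the text is read as a character class, and A-1 is an invalid range because A (code point 65) comes after 1 (code point 49). The regex_error helper is hypothetical, written only for this illustration; the reported position differs from the traceback because the full output path is not part of the pattern here.

```python
import re
from typing import Optional


def regex_error(pattern: str) -> Optional[str]:
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as err:
        return str(err)
    return None


name = "JL_120731_S6A [Well A-1; Field #1]"
print(regex_error(name))             # bad character range A-1 at position 20
print(regex_error(re.escape(name)))  # None: escaped brackets are literal characters
```

Escaping the name makes it compile, which is consistent with the square brackets being the cause.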

I don't know a good way to make all names "safe" from these types of errors.

Another example, with the Image name plate1_1_013 [Well 1, Field 1 (Spot 1)], fails with a different error:

$ omero zarr export Image:179693 --name_by name
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Exporting to plate1_1_013 [Well 1, Field 1 (Spot 1)].ome.zarr (0.4)
Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/mapping.py", line 143, in __getitem__
    result = self.fs.cat(k)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 826, in cat
    paths = self.expand_path(path, recursive=recursive)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 1005, in expand_path
    out = self.expand_path([path], recursive, maxdepth)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 1031, in expand_path
    raise FileNotFoundError(path)
FileNotFoundError: ['/Users/wmoore/Desktop/ZARR/data/TEMO/plate1_1_013 [Well 1, Field 1 (Spot 1)].ome.zarr/.zgroup']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/storage.py", line 1393, in __getitem__
    return self.map[key]
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/mapping.py", line 147, in __getitem__
    raise KeyError(key)
KeyError: '.zgroup'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/hierarchy.py", line 164, in __init__
    meta_bytes = store[mkey]
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/storage.py", line 1395, in __getitem__
    raise KeyError(key) from e
KeyError: '.zgroup'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/bin/omero", line 8, in <module>
    sys.exit(main())
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/omero/main.py", line 125, in main
    rv = omero.cli.argv()
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/omero/cli.py", line 1784, in argv
    cli.invoke(args[1:])
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/omero/cli.py", line 1222, in invoke
    stop = self.onecmd(line, previous_args)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/omero/cli.py", line 1299, in onecmd
    self.execute(line, previous_args)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/omero/cli.py", line 1381, in execute
    args.func(args)
  File "/Users/wmoore/Desktop/ZARR/omero-cli-zarr/src/omero_zarr/cli.py", line 125, in _wrapper
    return func(self, *args, **kwargs)
  File "/Users/wmoore/Desktop/ZARR/omero-cli-zarr/src/omero_zarr/cli.py", line 342, in export
    image_to_zarr(image, args)
  File "/Users/wmoore/Desktop/ZARR/omero-cli-zarr/src/omero_zarr/raw_pixels.py", line 56, in image_to_zarr
    root = open_group(store)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/hierarchy.py", line 1465, in open_group
    return Group(store, read_only=read_only, cache_attrs=cache_attrs,
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/hierarchy.py", line 167, in __init__
    raise GroupNotFoundError(path)
zarr.errors.GroupNotFoundError: group not found at path ''

Same error with:

$ omero zarr export Image:3414011 --name_by name
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Exporting to 10percent-Wt1-GFP-spheroid-MV.czi [0].ome.zarr (0.4)
...
zarr.errors.GroupNotFoundError: group not found at path ''

and

$ omero zarr export Image:9022301 --name_by name
Exporting to subpool-1_run-1_EXP-19-BQ3550 [Pos101].ome.zarr (0.4)
...
zarr.errors.GroupNotFoundError: group not found at path ''

However, removing the [ character from this name fixed the error, so it looks like the name is being interpreted as a regex (the tracebacks go through fsspec's glob), which would explain the errors above, in particular re.error: bad character range A-1 at position 58.
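Both failure modes can be illustrated with a deliberately simplified model of glob-to-regex translation. This is an assumption for illustration, not fsspec's actual glob code: the sketch escapes a handful of regex metacharacters but, consistent with the errors above, leaves [ and ] untouched. A bracketed group then either fails to compile (re.error, as with A-1) or compiles into a one-character class that can never match the literal name, so the lookup finds nothing (FileNotFoundError, which surfaces as GroupNotFoundError).

```python
import re

# Characters this sketch escapes; brackets are deliberately left alone,
# so "[...]" survives as a regex character class.
_ESCAPE = set("\\.+()|^${}")


def glob_to_regex_sketch(path: str) -> str:
    """Simplified (hypothetical) glob translation, anchored at both ends."""
    parts = []
    for ch in path:
        if ch in _ESCAPE:
            parts.append("\\" + ch)
        elif ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(ch)
    return "^" + "".join(parts) + "$"


def matches_itself(path: str) -> bool:
    """Does the translated pattern match the literal path it came from?"""
    return re.match(glob_to_regex_sketch(path), path) is not None


if __name__ == "__main__":
    for name in [
        "Tonsil 3.ome.zarr",
        "subpool-1_run-1_EXP-19-BQ3550 [Pos101].ome.zarr",
        "subpool-1_run-1_EXP-19-BQ3550 (Pos101).ome.zarr",
    ]:
        print(matches_itself(name), name)
```

In this model "[Pos101]" matches a single character from P, o, s, 1 or 0, so the path containing the literal brackets is never found, while the version with parentheses matches itself.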

Other examples that work OK:

$ omero zarr export Image:1884807 --name_by name
Exporting to Centrin_PCNT_Cep215_20110506_Fri-1545_0_SIR_PRJ.dv.ome.zarr (0.4)
...
$ omero zarr export Image:4995043 --name_by name
Exporting to ExperimentB_No05_DMSO_11_10min__010.czi.ome.zarr (0.4)
...
will-moore commented 1 year ago

So it looks like all names are OK except those containing [ and ]. I'm not sure how to stop these being interpreted as a (broken) regex without changing the name we want to write.

will-moore commented 1 year ago

Replacing [] with () in names now.

dominikl commented 12 months ago

👍 Looks good to me. I've used the build from this branch a few times already for the NGFF conversion/export work.

will-moore commented 12 months ago

@joshmoore I reduced duplication by adding a helper, def get_zarr_name(obj, args).
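A sketch of what a helper with the get_zarr_name(obj, args) signature could look like. The body is an assumption, not the code from this PR: StubObject is a hypothetical stand-in for an OMERO Image or Plate wrapper exposing getId() and getName(), args only needs a name_by attribute, and any output-directory handling is omitted.

```python
import argparse
from dataclasses import dataclass


@dataclass
class StubObject:
    """Hypothetical stand-in for an OMERO Image/Plate wrapper."""

    obj_id: int
    obj_name: str

    def getId(self) -> int:
        return self.obj_id

    def getName(self) -> str:
        return self.obj_name


def get_zarr_name(obj, args: argparse.Namespace) -> str:
    """Sketch only: pick the output name according to args.name_by."""
    if args.name_by == "name":
        # Square brackets break the fsspec glob lookup, so swap them for ( ).
        return obj.getName().translate(str.maketrans("[]", "()")) + ".ome.zarr"
    return f"{obj.getId()}.ome.zarr"


if __name__ == "__main__":
    image = StubObject(179693, "plate1_1_013 [Well 1, Field 1 (Spot 1)]")
    print(get_zarr_name(image, argparse.Namespace(name_by="id")))
    print(get_zarr_name(image, argparse.Namespace(name_by="name")))
```

Keeping the naming decision in one function means Image and Plate export share the same ID/name logic and the same bracket handling.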

I also noticed that we need the name for polygon/mask export, but supporting the --name_by name argument there could be quite a bit of work, so it's probably not worth doing until we know it's needed. I did at least fix the .zarr -> .ome.zarr name, and updated the README.

will-moore commented 8 months ago

Anything else needed here?

joshmoore commented 8 months ago

Nothing outstanding from my side.