ome / omero-py

Python project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0
20 stars 33 forks source link

Support omero download Fileset:ID #298

Closed will-moore closed 3 years ago

will-moore commented 3 years ago

See https://forum.image.sc/t/download-full-projects-with-their-datasets/56434/

This adds support for:

$ omero download Fileset:ID download_dir

Files are downloaded to the named directory. If original files have a path that is longer than the fileset template prefix, that path will be used to create additional directories inside the named directory.

The same behaviour is used for Images (this fixes the previous limitation of only a single-file for an Image):

$ omero download Image:ID download_dir

Other functionality to test for regressions:

$ omero download OriginalFile:ID download_filename
# assumes OriginalFile:
$ omero download ID download_filename
$ omero download FileAnnotation:ID download_filename
will-moore commented 3 years ago

Doing: $ omero download Image:1 --parent my-project/my-dataset/ isn't going to be very nice for the user above who "would like to download multiple (more than 100) whole projects while maintaining their dataset structure".

Is the main objection to $ omero download Project:ID that it might accidentally overwrite existing files (which is easy enough to fix) or that the logic doesn't handle edge-cases: e.g.

If the code as it is will work for the user who needs it, but we don't want to add it and maintain it here, then it could be a separate user script (gist) instead?

will-moore commented 3 years ago

As discussed this morning, added support for:

$ omero download Fileset:ID download_dir

This is now used by:

$ omero download Image:ID download_dir

PR description updated

joshmoore commented 3 years ago

Initially testing looks good on the Fileset: usage. :+1: If we end up sending more people to this tool, I imagine something like the ProgressBars from https://github.com/glencoesoftware/bioformats2raw/pull/83 might be useful, which might end up driving us to a framework like https://click.palletsprojects.com/en/8.0.x/

One other nice to have might be to clean up the file on cancellation and/or support restart, but that's getting a bit ahead of ourselves for now.

Edit: https://github.com/willmcgugan/rich was the other library for nicer CLIs (see https://github.com/ome/omero-py/issues/217)

will-moore commented 3 years ago

This is failing a bunch of tests for various reasons:

pwalczysko commented 3 years ago

Could the -h mesage of the command be updated for the new features too ?

pwalczysko commented 3 years ago

Tested images:

omero download Image:ID download_dir

Download only lif: https://nightshade.openmicroscopy.org/webclient/?show=image-4471884

ndpi https://outreach.openmicroscopy.org/webclient/?show=image-51066

Download and reimport DICOM (the below link points to the reimported image, as it is actually performing better than the original which I successfully downloaded) https://outreach.openmicroscopy.org/webclient/?show=image-59527

vsi: https://demo.openmicroscopy.org/webclient/?show=image-149042 reimported as https://demo.openmicroscopy.org/webclient/?show=image-182901 - is okay, except for thumbs still not generated after long time

All went fine

pwalczysko commented 3 years ago

Bug: the omero download Image:4409956 bread command fails because that image has no fileset, as it is coming from OMERO.figure (figure saved as a new image). Is it possible to let this fail more gracefully ?

Traceback (most recent call last):
  File "/Users/pwalczysko/miniconda3/envs/test-will/bin/omero", line 8, in <module>
    sys.exit(main())
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/main.py", line 125, in main
    rv = omero.cli.argv()
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/cli.py", line 1784, in argv
    cli.invoke(args[1:])
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/cli.py", line 1222, in invoke
    stop = self.onecmd(line, previous_args)
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/cli.py", line 1299, in onecmd
    self.execute(line, previous_args)
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/cli.py", line 1381, in execute
    args.func(args)
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/plugins/download.py", line 83, in __call__
    self.download_fileset(conn, image.getFileset(), args.filename)
  File "/Users/pwalczysko/miniconda3/envs/test-will/lib/python3.6/site-packages/omero/plugins/download.py", line 91, in download_fileset
    self.ctx.out(f"Fileset: {fileset.id}")
AttributeError: 'NoneType' object has no attribute 'id
will-moore commented 3 years ago

@pwalczysko That "bug" should be fixed by eeed4656225771 to fail as it did before, which is a bit more graceful.

pwalczysko commented 3 years ago

@pwalczysko That "bug" should be fixed by eeed465 to fail as it did before, which is a bit more graceful.

thanks, I will rebuild and retry

pwalczysko commented 3 years ago

Indeed, now the failure is much nicer

omero download Image:4409956 bread
Using session for pwalczysko@nightshade.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Swedlow Lab
Input image has no associated Fileset
pwalczysko commented 3 years ago

In summary, tested all 5 commands in the header of this PR on different file formats. All went okay. Ready to merge fmpov.

will-moore commented 3 years ago

All except 1 test fixed above (and changes to the tests at https://github.com/ome/openmicroscopy/pull/6288)

Failing test is: https://merge-ci.openmicroscopy.org/jenkins/job/OMERO-test-integration/lastCompletedBuild/testReport/OmeroPy.test.integration.clitest.test_download/TestDownload/testPolicyGroupRestriction__read__write__image__plate_/

EDIT: test fixed at https://github.com/will-moore/openmicroscopy/commit/4d50effc18d1b1f1ce211d197819c5cacf7dc3d9

will-moore commented 3 years ago

Tests passing (I added one to test download of Fileset). I assume we don't need to wait on user at https://forum.image.sc/t/download-full-projects-with-their-datasets/56434/ before merging this?

sbesson commented 3 years ago

Thanks @will-moore, overall I am very much in favor of releasing features added in this PR as they lift a long-standing limitation of the plugin when downloading original files associated with images. Allowing complete fileset download has the classical pitfalls associated large download that we have seen previously. I assume this feature respects the policy restrictions set up server-side?

As suggested by the description of this PR and is more clearly exposed in the accompanying integration tests, this PR effectively changes the semantics for a single file image download. Previously omero download Image:1 foo would create a file called foo with the content of the original file. With this PR, a foo folder will be created and contain the original file(s).

I have been pondering about the breaking nature or not of this API change and I am increasingly inclined to qualify the previous behavior as incorrect and prone to data corruption. This is related to the fact that the command gives full control over the target filename which is for instance at odds with the fact that the original file extension can be a requirement for Bio-Formats and other translational libraries. Taking the FITS format as a representative example , with the current implementation, the following round-tripping will fail with the current version of omero-py plugin but and the current implementation will fix it:

omero import test.fits
omero download Image:1 /tmp/foo
omero import /tmp/foo

Happy to see this released as a minor version increment under these terms.

joshmoore commented 3 years ago

I assume this feature respects the policy restrictions set up server-side?

It has no choice :wink:

will-moore commented 3 years ago

@sbesson There are tests for the download policy which are testing the Image download - see https://github.com/ome/omero-py/pull/298#issuecomment-902553211

imagesc-bot commented 2 years ago

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/download-full-projects-with-their-datasets/56434/16