ome / omero-py

Python project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0

conn.c.download behavior differs from java implementation #425

Open Rdornier opened 2 months ago

Rdornier commented 2 months ago

Hello,

I noticed that the behavior of the conn.c.download method is not the same as the one implemented in OMERO.insight (and omero-gateway-java). In the latest version of OMERO.insight, when an image is downloaded, a parent folder Fileset_xxxx is created and the image(s) linked to the corresponding fileset are downloaded inside it.

This is not the case in omero-py. The image is simply downloaded without creating any folder.

I was wondering if it is possible to mirror the behavior of the latest Java implementation in omero-py. It is quite annoying to deal with these two behaviors in automated tasks, especially when I have to read the images from a different script.

Thanks, Rémy.

Edit: there is currently a bug in the Java implementation for vsi files: https://github.com/ome/omero-gateway-java/issues/89

sbesson commented 2 months ago

@Rdornier you are right that the two implementations behave differently. Going further, I believe the contracts and signatures of these APIs were never meant to be compared directly.

The Python omero.client.download API effectively mirrors the Java one at omero.client.download. In both cases, the behavior is to take an OriginalFile as an input and download it to a local file (or file handle). Both APIs have no support for sets of OriginalFile like Fileset. When using them, it is effectively the responsibility of the caller to do the looping and structure the download appropriately.
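To illustrate the point about caller responsibility, here is a minimal sketch of looping over a set of files and calling download once per file. The helper name `download_files` and the `(ofile, relative_path)` pairs are hypothetical; the only call assumed from omero-py is `client.download(ofile, filename=...)`.

```python
import os


def download_files(client, files, target_dir):
    """Download each (ofile, relative_path) pair below target_dir.

    `client` is assumed to expose download(ofile, filename=...), as
    omero.client does. `files` is a hypothetical iterable prepared by
    the caller; it is not something the API provides.
    """
    written = []
    for ofile, relative_path in files:
        local_path = os.path.join(target_dir, relative_path)
        # Create intermediate folders so the relative layout is preserved.
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        client.download(ofile, filename=local_path)
        written.append(local_path)
    return written
```

How the relative paths are obtained (for example from the files of a Fileset) is left to the caller, which is exactly the looping and structuring described above.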

By contrast, the omero.gateway.facility.TransferFacilityHelper.downloadImage API operates on Image objects. For OMERO 5 data, it resolves the associated Fileset and handles the download of the set of OriginalFile objects, preserving their internal relationships so that Bio-Formats can read them. In OMERO.py, the closest existing implementation would be the omero download CLI plugin which takes an Image or a Fileset as an input and then calls omero.client.download to download individual files - see https://github.com/ome/omero-py/blob/master/src/omero/plugins/download.py.

Rdornier commented 2 months ago

Hi @sbesson

Thanks for the clarification. I wasn't comparing the right APIs. OK, so the Python and Java "core" APIs are mirrored, but the "high-level" ones are not.

> In OMERO.py, the closest existing implementation would be the omero download CLI plugin which takes an Image or a Fileset as an input and then calls omero.client.download to download individual files - see https://github.com/ome/omero-py/blob/master/src/omero/plugins/download.py.

Thanks for pointing this out! This implementation works fine but is not really usable outside the CLI, although the download_fileset method is the one that mimics the omero.gateway.facility.TransferFacilityHelper.downloadImage behavior, minus the Fileset_xxx folder.

Do you think it would be possible to implement a method Download_image at the BlitzGateway level (or any other level) that mirrors the omero.gateway.facility.TransferFacilityHelper.downloadImage behavior, including the Fileset_xxx folder, and that is usable without the CLI?

I currently duplicate the code of the download_fileset method in my project, but it's not very elegant and I would prefer to use a built-in API method.

Rémy.

will-moore commented 2 months ago

Hi @Rdornier - have a look at https://gist.github.com/will-moore/a9f90c97b5b6f1a0da277a5179d62c5a. That code iterates through Projects and Datasets and downloads Images to a new folder per Dataset (or per Image - see comments). The only thing you might need to address is that if you've got multiple Images in the Dataset that come from the same Fileset, you'd download the same files multiple times.

Rdornier commented 2 months ago

Hi @will-moore,

Thanks for pointing out this code; it works pretty well!

> The only thing you might need to address is if you've got multiple Images in the Dataset that come from the same Fileset then you'd download the same files multiple times.

OK, I filtered out the filesets that have already been downloaded, to avoid downloading the same files multiple times.
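This kind of filtering could look roughly like the sketch below. `unique_filesets` is a hypothetical helper, not part of omero-py; it assumes each image object exposes `getFileset()` returning an object with `getId()`, or `None` when the image has no fileset, as the BlitzGateway image wrappers are understood to do.

```python
def unique_filesets(images):
    """Yield each Fileset once, even when several images share it.

    Assumes image.getFileset() returns None or an object with getId().
    """
    seen = set()
    for image in images:
        fileset = image.getFileset()
        if fileset is None:
            # Images without a fileset have nothing to download this way.
            continue
        fileset_id = fileset.getId()
        if fileset_id in seen:
            continue
        seen.add(fileset_id)
        yield fileset
```

Tracking the IDs in a set keeps the check cheap and preserves the order in which filesets are first encountered.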

In the end, the only thing to add to the download_fileset() method to match the Java implementation is dir_path = os.path.join(dir_path, "Fileset_%s" % fileset.id). As it is only one line of code, it could also easily be added by the user before calling download_fileset(), but I think it is still better if both implementations give the same results.
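As a minimal sketch of that one-line change, done on the caller side: `fileset_dir` is a hypothetical helper name, and the `Fileset_<id>` folder naming is taken from the description of the Java behavior in this thread rather than checked against the Java source.

```python
import os


def fileset_dir(dir_path, fileset_id):
    """Return the per-fileset folder, e.g. <dir_path>/Fileset_123."""
    return os.path.join(dir_path, "Fileset_%s" % fileset_id)
```

The returned path would then be used as the target directory for the download, so that files from different filesets end up in separate folders.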

will-moore commented 2 months ago

@Rdornier please feel free to open a PR proposing the changes you'd like to see.