FilesetWrapper lazy loads Images and UsedFiles

will-moore commented 3 months ago

Fixes #404.

When we do conn.getObject('Fileset', fileset_id) we don't try to load all Images and Files up front. Instead, these are lazily loaded as required using the same pattern as for other Wrapper objects.

Test that fileset.copyImages() and fileset.listFiles() behaviour don't change:

E.g. test script should show no change:

python test.py 123

import argparse

from omero.cli import cli_login
from omero.gateway import BlitzGateway

def main(args):
    parser = argparse.ArgumentParser()
    parser.add_argument('fileset', type=int)
    args = parser.parse_args(args)
    fileset_id = args.fileset

    with cli_login() as cli:
        conn = BlitzGateway(client_obj=cli._client)

        fileset = conn.getObject('Fileset', fileset_id)

        for i in fileset.copyImages():
            print("Image", i.id, i.name)

        for f in fileset.listFiles():
            print("File", f.id, f.name)

if __name__ == '__main__':
    import sys
    main(sys.argv[1:])

sbesson commented 3 months ago

Maybe I am missing something obvious but shouldn't _FilesetWrapper._getQueryString() also be updated to remove the fetch commands in order to achieve proper lazy loading?

will-moore commented 3 months ago

@sbesson - Apologies, that's a copy/paste error on my part. Thanks for catching! I have had issues in the past with pip install -e for omero-py so I was editing in a different location. I certainly used the updated query string in testing. Fixed in b86cd89d5

sbesson commented 3 months ago

Thanks. Two additional questions:

arguably, what is proposed is as a breaking change as a consumer relying conn.getObject('Fileset', fileset_id) to return a loaded object would need to modify their code to call copyImage and/or listFiles. Should this consider some form of backwards-compatibility e.g. using the opts parameter to toggle lazy loading?
when using lazy loading, should _FilesetWrapper._getQueryString simply use BlitzObjectWrapper._getQueryString which fetches the owner and creation event in addition to the object?

will-moore commented 3 months ago

I'm not sure if this is considered a breaking change, at least not to the BlitzGateway API. For a consumer to use the FilesetWrapper as in the example above, nothing should change, and it shouldn't make any difference that the loading happens under the hood during the initial getObject() call or subsequent calls (except that if you're only loading images or files (and not both) then the new behaviour should be faster).

If you consider the API to extend to the underlying fileset._obj object behaviour, and you want to do this:

fileset = conn.getObject('Fileset', fileset_id)
fileset._obj.copyUsedFiles()

then this will break. This is kinda edge-case behaviour, so I'm not sure that it justifies a major release, but if so then I think it's worth doing because it's important that the default behaviour of conn.getObject('Fileset', fileset_id) is changed to avoid OOM errors.

Using opts to specify whether we want to load more of the graph up front has been useful for other objects, primarily for use in the JSON API, where we want to pass the underlying omero.model objects to the encoder. In the case where we are using the BlitzGateway API to traverse the graph, lazy loading is the preferred behaviour, particularly in this case.

I'm happy to add opts for load_images and load_files, as long as the default opts are False, but I also feel that this feature isn't really needed yet so I'm OK with leaving it till it is required.

In the meantime I'll remove the _FilesetWrapper._getQueryString as suggested is it's not needed.

ome / omero-py

FilesetWrapper lazy loads Images and UsedFiles #405