RFE: per-image rendering settings export/import

sbesson commented 4 years ago

The initial logic of the rendering plugin has been primarily driven by the high-content screening use case i.e. fairly homogeneous imaging datasets where a single rendering file can be applied to all images within a plate. In the case of IDR experiments, datasets can be much more heterogeneous with mixed dimensionalities and modalities. In many cases, a single set of rendering settings is not sufficient.

The https://github.com/IDR/idr0067-king-yeastmeiosis study was fairly representative: its datasets contained fluorescence multi-C multi-Z, fluorescence multi-C maximally projected, and single-channel brightfield images, and each image needed to be adjusted individually. During the curation, I used a branch with the following changes:

dff690033bfb693c621d2adbcadb98daa4cc1c0a adds an option to store rendering settings in a directory structure. The return value of the render_images API is expanded to include a string context in addition to the list of images. This context is then used for creating the directory tree, e.g. if the command target is a project, the layout is <project_name>/<dataset_name>/<image_name>.yml
5850f28e4bd80abf04b91603308a24cc5c839581 updates the set command to be able to consume rendering settings using the layout above.
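For illustration only, here is a minimal Python sketch of the layout described above. It is not the code from either commit: `layout_paths` and the shape of `groups` are hypothetical stand-ins for the string context and image list that the expanded render_images return value is described as providing.

```python
from pathlib import PurePosixPath


def layout_paths(groups):
    """Map (context, image names) pairs to relative .yml paths.

    `groups` is a hypothetical structure: each entry pairs a string
    context such as "<project_name>/<dataset_name>" with the names of
    the images found under that context.
    """
    paths = []
    for context, image_names in groups:
        for name in image_names:
            # One settings file per image, nested under its context.
            paths.append(PurePosixPath(context) / f"{name}.yml")
    return paths


# Example input with made-up names.
groups = [("projA/ds1", ["img1", "img2"]), ("projA/ds2", ["img3"])]
for path in layout_paths(groups):
    print(path)
```

The same mapping, read in the other direction, is what a recursive set would need in order to locate the file for each image.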

With both changes, bin/omero render info -f Project:<id> was used to recursively extract the imported rendering settings. After curation, bin/omero render set -f Project:<id> was invoked to recursively update all images in the project.

Based on recent IDR examples, my impression is that such functionality would be valuable to backport to the render plugin. Before consolidating the prototype and adding tests, I am opening this as an issue in case there are similar use cases that should also be considered, like:

the ability to handle rendering settings per dataset as in https://github.com/IDR/idr0040-aymoz-singlecell/tree/master/experimentA/rendering_settings
an API allowing more granularity in the directory structure e.g. using some expansion pattern -f %p/%d/%i.yml
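As a rough sketch of what such an expansion pattern could mean, the snippet below expands %p, %d and %i into project, dataset and image names. The meaning of each placeholder is an assumption read off the example pattern, and `expand` and `FIELDS` are hypothetical names, not part of the render plugin or of bfconvert.

```python
import re

# Assumed placeholder meanings; not taken from any existing tool.
FIELDS = {"p": "project", "d": "dataset", "i": "image"}


def expand(pattern, names):
    """Expand %p, %d and %i in `pattern` using the `names` mapping.

    "%%" yields a literal percent sign; any other "%x" sequence is
    left untouched.
    """
    def replace(match):
        key = match.group(1)
        if key == "%":
            return "%"
        return names[FIELDS[key]]

    return re.sub(r"%([pdi%])", replace, pattern)


names = {"project": "projA", "dataset": "ds1", "image": "img1"}
print(expand("%p/%d/%i.yml", names))
print(expand("%d/%i.yml", names))
```

With a pattern like this, the per-dataset case would simply be a pattern without %i.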

/cc @joshmoore @dominikl

joshmoore commented 4 years ago

Few thoughts:

any overlap with the layout of downloader? @mtbc
is there any yml-native way of building up a tree from multiple files?
should we unify expression patterns here with bfconvert?

mtbc commented 4 years ago

At the least, when downloader offers some friendly customizable directory structure (even if just as links into the weird one) it would be good to ensure that it's possible to express whatever the string contexts here can. Oddly I can't currently find a corresponding downloader issue/card but it's on the radar anyway.

sbesson commented 4 years ago

I spent a minimal amount of time on omero-downloader testing the binary retrieval of an entire dataset for comparison. At the moment, the layout produced by the utility is very focused on mimicking the state of the source database:

Fileset/<fileset_id>/Binary/<filename>
Image/<image_id>/Binary/<filename>
Repository/<repository_id>/<username_userID>/<YYYY-MM>/...

While this is probably the most unambiguous construction, two comments specifically in the context of an export/import workflow:

this is not the easiest representation for expanding with additional data types e.g. rendering settings
the metadata associated with the Project/Dataset information seems to be lost. For instance it would not be possible to reimport the same structure without additional information

Re handling multiple files, I think this is still not a feature of core YAML. Libraries could certainly have their own implementation, as we already do for bulk imports, for instance. In the medium term, this might be another data type that we would like the next-generation file format to gracefully handle.
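To make the "library-level implementation" idea concrete, here is a minimal sketch that assembles a nested tree from many files by walking a directory. It is not how bulk import does it; `load_tree` is a hypothetical helper, and the parser is passed in as a parameter so the example runs without a YAML library (with PyYAML installed, `yaml.safe_load` could be supplied as `parse`).

```python
from pathlib import Path


def load_tree(root, parse=lambda text: text, pattern="*.yml"):
    """Build a nested dict mirroring the directory tree under `root`.

    Directories become nested dicts; each matching file becomes a
    leaf keyed by its name without the extension, holding the result
    of `parse` applied to the file's text.
    """
    root = Path(root)
    tree = {}
    for path in sorted(root.rglob(pattern)):
        rel = path.relative_to(root)
        node = tree
        for part in rel.parts[:-1]:
            # Descend, creating intermediate levels as needed.
            node = node.setdefault(part, {})
        node[rel.stem] = parse(path.read_text())
    return tree
```

Calling `load_tree("experimentA")` on a <dataset_name>/<image_name>.yml layout would return one dict per dataset, each mapping image names to their settings.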

Generally, all for unifying the semantics of the patterns for our core concept (image name...) across the board wherever possible - from Bio-Formats to OMERO.

mtbc commented 4 years ago

OMERO.downloader is heavily tied to the OME data model and what OMERO provides for working with it. Container data is lost because OmeroMetadata is so very incomplete; ideally that would be code-generated. It's an even more stark omission for HCS. Also, rendering settings are not captured in the data model. If they were, they would probably end up in some subdirectory of Image/<image_id>/, but on the roadmap for a friendly UI (needs a design phase first) is to allow specifying an arbitrary layout that can copy or symlink into the server-side layout that is needed for knowing what's already downloaded and how to assemble XML from it.

mtbc commented 4 years ago

(Or a "downloader gateway" could simply know how the folder layout works and provide nice API calls for opening data and navigating the links.)

dominikl commented 3 years ago

Is there a reason why we can't just get your two commits into the render plugin @sbesson ? This would be very helpful. For most IDR datasets these days "cloning" the rendering settings from the pilot into the production system is quite a tedious task, which would be very much simplified by this.

sbesson commented 3 years ago

@dominikl Sorry for dropping the ball on this. A few thoughts to try and move forward:

the need to export/import rendering settings defined at individual images regularly comes up for IDR submissions. Do we have a rough estimate of the percentage of studies where this is required @francesw ?
the level of distribution will vary from one study to another i.e. HCS vs non-HCS, dataset-level vs image-level.
the implementation proposed for idr0067 as well as https://github.com/ome/omero-cli-render/pull/50 makes use of folders as a way to represent the hierarchy while https://github.com/ome/omero-cli-render/pull/52 makes use of JSON/YAML structures.

The proposed upcoming OME-Zarr on collections & rendering might lead us to explore a hybrid solution where the rendering settings would still be distributed in different folders, while relaxing the constraint that the subfolder path must match the project/dataset structure. Instead this hierarchy could be stored in the top-level metadata e.g.

experimentA/
    .zattrs # metadata containing project/dataset information
    image_1/ 
        .zattrs # metadata including the rendering settings
    image_2/ 
        .zattrs # metadata including the rendering settings
    ...

As the above is largely unspecified and will take several iterations, I think it makes sense to introduce an intermediate extension of the CLI render spec that supports our needs. Either way it would be useful to keep the above in mind as we agree on the layout. How many use cases do we want to support? HCS? dataset-level? image-level?
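Purely as a thought experiment on the hybrid layout, the sketch below writes and reads it back. It assumes, as in Zarr v2, that .zattrs holds JSON; the "hierarchy" and "rendering" keys and both function names are invented for this example and do not come from any OME-Zarr specification.

```python
import json
from pathlib import Path


def write_layout(root, hierarchy, settings):
    """Write a top-level .zattrs plus one .zattrs per image folder.

    `hierarchy` maps folder name -> project/dataset information and
    `settings` maps folder name -> rendering settings. Both key names
    used in the JSON are hypothetical.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / ".zattrs").write_text(json.dumps({"hierarchy": hierarchy}, indent=2))
    for folder, rendering in settings.items():
        image_dir = root / folder
        image_dir.mkdir(exist_ok=True)
        (image_dir / ".zattrs").write_text(
            json.dumps({"rendering": rendering}, indent=2)
        )


def read_layout(root):
    """Return {folder: (location, rendering)} from the layout above."""
    root = Path(root)
    hierarchy = json.loads((root / ".zattrs").read_text())["hierarchy"]
    result = {}
    for folder, location in hierarchy.items():
        attrs = json.loads((root / folder / ".zattrs").read_text())
        result[folder] = (location, attrs["rendering"])
    return result
```

Because the project/dataset placement lives only in the top-level file, the image folders can stay flat, which is the relaxation of the path constraint discussed above.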

ome / omero-cli-render

RFE: per-image rendering settings export/import #36