ome / omero-blitz

Gradle project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0
0 stars 15 forks source link

Fix ordering of OriginalFile objects registered in saveFileset() #148

Closed sbesson closed 3 months ago

sbesson commented 4 months ago

Short summary

Initially reported in https://forum.image.sc/t/in-place-import-path-from-image-id-with-omero-database-incoherence-with-originalfile-filesetentry/81509 by @Tom-TBT, the link between FilesetEntry and OriginalFile objects created at import time can be incorrect under some conditions for multi-file filesets imports.

Technical description

During the import process, the server receives a Fileset object and a matching list of paths. The saveFileset() API first registers these paths as OriginalFile then associate these files to each fileset entry in order.

The current implementation of _internalRegister groups the input checked paths by parent using a ListMultimap and iterates through each group to create or retrieve the OriginalFile objects. Under some conditions, the returned list will be in a different order than the input list. This change was originally implemented in https://github.com/ome/openmicroscopy/pull/3399.

This commit does not modify the mapping and creation logic but adds a final step to return a list of OriginalFile sorted in the same order as the input checked paths. This contract is also added to the Javadoc.

Testing

The issue should only affect multi-file filesets with multiple directories. Single-file filesets or multi-file filesets contained within a single directory should be unaffected. In the HCS domain, several formats will span multiple folders and can be used for testing. In the Digital Pathology domain, the VSI file format generated by Evident (formerly Olympus) cellSens software is a good candidate for testing.

Without this PR, importing a fileset in any of these formats using either OMERO.insight or the command-line, independently of the selected options, can result in incorrect links between FilesetEntry and OriginalFile objects. This can be visualized by looking the order of the Imported from and Paths on server lists in OMERO.web when inspecting the Fileset info e.g. in https://idr.openmicroscopy.org/webclient/?show=image-4995801

Screenshot 2024-03-22 at 10 37 49

The Paths on server list should be grouped by sub-directory in an non-deterministic order. This means 1- multiple imports of the same fileset might result in possibly different orders of the server paths 2- it is possible that an import will result in the server paths matching the client paths.

With this PR including, the same import workflow should result consistently in correct FilesetEntry / OriginalFile links. At the OMERO.web level, the order of the client paths and server paths lists should be consistent across all formats.

Next steps

This change only aims to fix the issue with new imports and won't address incorrect FilesetEntry/OriginaFile links for existing filesets. For the latter, the only viable approach will be to used an upgrade script that ideally takes advantage of the Fileset.clientPath entry, if this has not been modified. I will write up a separate issue discussing this upgrade script as well as wider cleanups and DB schema upgrade associated with this problem

dominikl commented 4 months ago

Looks good to me 👍 On demo server (without PR) I got the different ordering ('imported from' vs 'path on server'), on merge-ci (with PR) the order was the same.