Open sbesson opened 7 months ago
When looking into the usage of Pixels path/name in IDR, I think I checked a bunch of Filesets in IDR to see if the first File matched the path/name in Pixels and found that there were differences in many cases. Unfortunately I can't remember where I documented that... Also I don't know if Bio-Formats would have behaved differently if the different path/name was used in setId().
Ah - I found it: https://github.com/IDR/idr-metadata/issues/660#issuecomment-1570145684 So, testing 1 image from each study in IDR (to get a good mix of formats etc) it looks like there were 22 studies where the Pixels path/name didn't match the first OriginalFile from the Fileset.
Thanks @will-moore. Looking at your list of mismatches, all of these examples are multi-file and multi-folder file formats, primarily HCS but not only. Also, from a quick search using the IDR UI, it seems that the first `FilesetEntry.clientPath` matches `Pixels.path` and `Pixels.name`. Both of these observations are consistent with my expectations based on preliminary investigation.
> Also I don't know if Bio-Formats would have behaved differently if the different path/name was used in setId().

Unfortunately, the answer here is "it depends". In the worst case scenario, Bio-Formats would throw an `UnknownFormatException` on `setId`.
As a next step, my plan is to write a pre-check SQL script that iterates through all the `Fileset` rows in a database and tries to match the first `FilesetEntry` of each with any of the `OriginalFile` rows, using `filesetentry.clientPath` and `originalfile.{path,name}`. We should be able to run this script against the IDR database and other OMERO databases to get a feeling for whether we can fix these links in an authoritative manner.
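
To make the intended pre-check concrete, here is a minimal in-memory sketch of the matching logic in Python. It is not the planned SQL script. The `Entry` record, its field names and the `precheck` function are hypothetical, and the matching rule (counting identical trailing path components between the client path and the repository path) is an assumption about how the two paths relate, not confirmed OMERO behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """One FilesetEntry joined to its OriginalFile (hypothetical field names)."""
    index: int        # position of the entry within its Fileset
    client_path: str  # filesetentry.clientPath
    file_id: int      # originalfile.id
    file_path: str    # originalfile.path (directory part)
    file_name: str    # originalfile.name


def components(path: str) -> List[str]:
    """Split a slash-separated path into its non-empty components."""
    return [part for part in path.split("/") if part]


def trailing_overlap(a: str, b: str) -> int:
    """Count identical components shared at the end of two paths."""
    count = 0
    for x, y in zip(reversed(components(a)), reversed(components(b))):
        if x != y:
            break
        count += 1
    return count


def precheck(entries: List[Entry]) -> Tuple[str, Optional[int]]:
    """Classify one fileset as 'ok', 'relinkable' or 'unmatched'.

    Assumed rule: the OriginalFile whose path/name shares the longest
    trailing run of components with the first entry's client path is
    the file that entry should point at.
    """
    ordered = sorted(entries, key=lambda e: e.index)
    first = ordered[0]

    def score(e: Entry) -> int:
        return trailing_overlap(first.client_path, e.file_path + "/" + e.file_name)

    best = max(ordered, key=score)  # ties resolve to the earliest entry
    if score(best) == 0:
        return ("unmatched", None)
    if score(first) == score(best):
        return ("ok", first.file_id)
    return ("relinkable", best.file_id)
```

A fileset reported as `relinkable` would be a candidate for an authoritative fix, while `unmatched` filesets would need manual inspection.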
Background
The OMERO 4.2.0 release in July 2010 included an alteration to the database schema to add `name`, `path` and `repo` columns to the `Pixels` table with a similar meaning as the columns in the `OriginalFile` table. This change was part of the initial work adding native file format support in OMERO via Bio-Formats, also known as FS lite. For a subset of file formats (primarily single file and with large XY dimensions), the original file was uploaded to the binary repository and linked from the `Pixels` object. This allowed the server to perform certain operations including the generation of OMERO pyramids.

Full native file format support in OMERO, also known as OMERO.fs, was introduced in OMERO 5.0.0 in February 2014 with the introduction of the `Fileset` table linked to the `Image`. Each `Fileset` row is linked to an ordered set of `FilesetEntry` rows, each of which is itself associated with a single `OriginalFile` entry. This change effectively superseded the FS lite concept, allowing native support for single and multi-file formats as well as multi-image formats. In OMERO 5.1, the `series` column was also introduced to the `Image` table to store the mapping between an image and the underlying Bio-Formats series.

Current API
Despite OMERO 5 actually deprecating their usage, the `Pixels.name`, `Pixels.path` and `Pixels.repo` columns are still heavily used server-side as of OMERO 5.6.x:

Challenges
The current logic is problematic for several reasons:
- […] `Pixels` object as well as the `OriginalFile` linked to the `FilesetEntry`
- […] `Pixels` metadata in the importer can cause substantial DB operations, especially in the case of high-content screening
- […] `Fileset/FilesetEntry`, the `Pixels` `name`, `repo` and `path` attributes are still hidden from an API perspective.
- […] `omero.db` configurations properly set in its configuration in order to execute the PostgresSQLAction
- […] `Pixels` object via SQL script - see https://github.com/IDR/idr-metadata/issues/656

As an additional related complication, a historical bug has been reported in the image.sc forum where the `OriginalFile` rows are incorrectly linked to `FilesetEntry` for some multi-file filesets.

Proposal
[…] `IFormatReader.getUsedFiles`, the output of `ImportCandidates` and the first `FilesetEntry` is the file that should be passed to `IFormatReader.setId`
d. Optionally, create an upgrade script to convert FS lite imports into `Fileset`
[…] `Pixels.name`, `Pixels.path` and `Pixels.repo` columns
b. Update all the server APIs to use the `OriginalFile` from the first `FilesetEntry` as the source of truth

/cc @joshmoore @jburel @kkoz @chris-allan @will-moore @dominikl @Tom-TBT
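
As an illustration of the last point of the proposal (using the `OriginalFile` from the first `FilesetEntry` as the source of truth), the lookup could be sketched as below. This is only a sketch under stated assumptions: `FileRef` and `resolve_set_id_target` are hypothetical names rather than OMERO APIs, and the fallback to the `Pixels` columns for FS lite imports is an assumption about how unconverted legacy data might be handled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FileRef:
    """Location of a file in a repository (hypothetical record)."""
    repo: Optional[str]
    path: str
    name: str


def resolve_set_id_target(
    fileset_files: Sequence[FileRef],
    pixels_ref: Optional[FileRef],
) -> Optional[FileRef]:
    """Pick the file that would be handed to the reader's setId.

    fileset_files: OriginalFile locations in FilesetEntry order.
    pixels_ref: legacy Pixels.{repo,path,name} values, if any.
    """
    if fileset_files:
        # Proposed source of truth: the first FilesetEntry's OriginalFile.
        return fileset_files[0]
    # Assumed fallback for FS lite imports that have no Fileset yet.
    return pixels_ref
```

Under this sketch the `Pixels` columns are consulted only when no `Fileset` exists, which is the case the optional upgrade script would remove.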