Open lahr-ul opened 3 years ago
Hi @lahr-ul we have been facing a similar issue while trying to load CIF files in the context of an IDR submission.
My suspicion is that the problem is related to the number of objects in the file, which can easily reach several 10K in this cytometry format. Do you know how many individual images (Bio-Formats series) are contained in the file?
About 70K. Sometimes smaller files with about 60MB also fail.
Sorry for dropping the ball. Understood. The large number of images (>10K) is most likely the reason the metadata import hangs: a huge number of objects has to be inserted into the database (typically ~10 per image, so we are talking about on the order of 1M row insertions).
We dealt with very similar scalability issues in the case of high-content screening datasets, which have a similar number of images, in the 1-100K range. The database bottlenecks have been mitigated by a series of optimizations such as collapsing some of the elements, e.g. https://github.com/ome/openmicroscopy/pull/3261.
The only immediate workaround I can think of would be to export the CIF series as individual images, e.g. using bfconvert or bioformats2raw, and import the images individually. To be able to import these filesets natively, I suspect we need to identify the elements that are duplicated and could be reduced if possible.
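For reference, a minimal sketch of the bfconvert variant of this workaround. It assumes bfconvert is on the PATH and relies on the `%s` placeholder in the output name, which bfconvert expands to the series index so that each series lands in its own file. The helper name, the file names and the OME-TIFF output format are illustrative choices, not something prescribed in this thread.

```python
import shlex
from pathlib import Path


def build_bfconvert_command(cif_path, out_dir, bfconvert="bfconvert"):
    """Build a bfconvert call that writes one file per series.

    Hypothetical helper: the "%s" in the output pattern is the
    bfconvert placeholder for the series index.
    """
    src = Path(cif_path)
    pattern = Path(out_dir) / f"{src.stem}_s%s.ome.tiff"
    return [bfconvert, str(src), str(pattern)]


if __name__ == "__main__":
    # "sample.cif" and "export" are placeholder names.
    cmd = build_bfconvert_command("sample.cif", "export")
    # Only print the command here; run it with
    # subprocess.run(cmd, check=True) once the paths are right.
    print(shlex.join(cmd))
```

The exported files can then be imported one by one (or in bulk) with the regular importer. With ~70K series this produces ~70K files, so it is worth trying on a small file first.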
We want to import CIF files of varying size into OMERO with the importer. This process works seamlessly for small files (e.g. 16MB) but not for large files (e.g. 192MB). The import of a small file takes about 20 seconds, while the import of a large file times out after several hours (user session and/or Hibernate session). The issue can be reproduced on a production system and on a local Docker Compose setup.
Here is an excerpt of the log:
There are multiple "Starting referenceBatch #" statements; the importer hangs at "importing metadata" and after several hours there is a timeout. We also tried to change some configuration values without success: