Open tstoeter opened 1 week ago
Thanks, @tstoeter. Could you give a sense of the performance improvement you observe with and without this change?
I'm also curious what the use case is for running showinf
on many files at once. I can understand where this feature would be useful for conversion (i.e. in bfconvert
), but less clear on how it applies to just displaying metadata/images. Are you parsing the output of showinf
in some way?
Quick tests showed a 5x speedup for processing and printing metadata of 200 DICOM files (without displaying the images).
Our use-case is reading out metadata of many files in order to build relationships between them, based on their metadata. So yes, the metadata output will then be process further. We do not display the images.
Certainly, the code can also be quickly adopted by bfconvert
for speeding up batch processing converting images, too.
I do see a noticeable difference in a simple test without this change:
$ bfconvert "test&sizeX=10&sizeY=10&sizeZ=1000.fake" test_z%z.tiff
$ time for i in `ls *.tiff`; do showinf -no-upgrade -nopix $i; done
and with this change (on the same data):
$ time ls *.tiff | showinf -no-upgrade -nopix --
I'd be fine with including this in CI builds, but defer to @ome/formats on timing.
Note that before merging this pull request, we will need to have a signed Contributor License Agreement. @tstoeter, please submit the CLA according to the instructions here: https://ome-contributing.readthedocs.io/en/latest/cla.html
Quick tests showed a 5x speedup for processing and printing metadata of 200 DICOM files (without displaying the images).
Our use-case is reading out metadata of many files in order to build relationships between them, based on their metadata. So yes, the metadata output will then be process further. We do not display the images.
Interesting. Outside the scope of this pull request, note that showinf
may not print every DICOM tag in a file. All tags can be accessed via the initialized DICOM reader, though, which may be a more reliable way to compare metadata across files.
Thanks a lot @melissalinkert for the additional tests and pointing out the limitations of showinf
regarding DICOM tags. I have signed and submitted the CLA and would be happy if my PR got merged.
All tags can be accessed via the initialized DICOM reader, though, which may be a more reliable way to compare metadata across files.
Thanks, @melissalinkert. Outside the scope of this PR, but I wonder if as a parallel to the ability to add metadata from #4016 there could be a flag to include the additional metadata in an OME-XML annotation.
Thank you @sbesson. I intended to contrast the --
functionality from the other options, but using -
instead works just as well.
This reduces JVM startup overhead when processing thousands of files. If -- is passed as file argument then file names are read line by line from stdin and processed sequentially. Usage information for reading file names from stdin are added with the second commit.
Code compiles with
ant clean jars tools
, compilation with Maven not tested. All tests usingant test
have passed.