melissalinkert closed this issue 6 years ago.
Could I propose another approach for your consideration? Using SubIFDs to store low-resolution pyramid levels would require no changes to the OME-TIFF schema, keep images fully backward compatible, and allow supporting software to access pyramids. In libbioimage I've chosen the Photoshop SubIFD approach as the default, and so far it has been working great.
Yes, use of the SubIFD tag could be considered (also suggested in https://github.com/openmicroscopy/bioformats/pull/2747#issuecomment-278966356) instead of point (1) in the OME-TIFF section. This still requires a specification change, and points (2), (3) and part of (4) above may still be relevant in either case.
Coming a bit late to the discussion. Thanks @dimin for your comments and the SubIFD suggestion. We completely agree that backwards compatibility should be considered a requirement for this proposal.
One comment regarding the semantics: we effectively maintain and discuss two related specifications: the OME Data Model and the OME-TIFF format.
The initial idea is certainly to explore extensions of the OME-TIFF format specification without requiring any Data Model change (i.e. no new schema release). In that sense, using annotations and/or implicit conventions for prototyping should be sufficient while we work on enforcing rules for the number and order of the sub-resolutions.
While we are still brainstorming/exploring alternatives, a completely different approach would be to make use of the Folder element introduced in the 2016-06 schema as the container for the various pyramidal resolutions. In this case:
- each resolution would be stored as an Image element in the OME-XML, using the current semantics for mapping IFDs with TiffData
- Folder elements would group all sub-resolution images belonging to the same pyramid
The major impacts of this approach would be the following:
a. model level: this would require a namespace in the long term, while an annotation could be used in the short term to distinguish pyramid folders from regular image folders
b. API level: the largest impact would be at the sub-resolution API level, as this approach completely changes the representation of non-flattened resolutions: they would become a folder of images rather than a single image. This would likely have an impact on the entire stack, including OMERO.
Note also https://www.openmicroscopy.org/community/viewtopic.php?f=15&t=8433 from @dsudar.
Thanks @mtbc for the alert and pointing me to this existing discussion.
Looks like most of what I wrote was already being considered by @melissalinkert and @sbesson above. I'm hoping this is a good time to pick this topic up again. We specifically need something like this for our highly multiplexed IF imaging workflow, where we cyclically (4-5 channels per cycle) acquire many (>30) channels from tissues or TMAs. Our pre-processing performs the image registration between the "cycles" and ideally would output a single pyramidal OME-TIFF image, or a channel-stack of pyramidal single-channel (OME-)TIFFs, which can then be imported into OMERO for downstream analysis, visualization with PathViewer or iViewer, etc.
I'm about to start our programmers on implementing my proposed changes to libtiff and, hopefully with @rleigh-codelibre's help, into OME Files. Any work on Bio-Formats and possible model changes is of course fully in your wheelhouse.
@dsudar To keep you up to date, I've spent most of last week looking at how to implement SubIFD properly in TiffWriter, so we can create proper TIFF images with sub-resolutions, and I'm writing it up as I go. I'll follow up in a few days once I have something concrete to show and discuss.
Thanks @rleigh-codelibre. So is it the team's preference to use the SubIFD structure for the sub-resolution images? I'm perfectly fine with that approach, even though most existing ad-hoc implementations use a sequence of top-level IFDs, so it might be harder to get the wider community on board. I look forward to your findings.
@dsudar Either approach should work, and there are tradeoffs for both. Using top-level IFDs is better for interoperability with software lacking SubIFD support, and also for being able to reference these IFDs from OME-XML. However, libtiff supports writing out SubIFDs properly as part of its basic functionality, and it's likely difficult to change that behaviour to write out top-level IFDs (though I still need to test this further). If we wish to care about Bio-Formats and OME Files C++ being able to create and read the same data, something which works for both is likely a prerequisite.
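To make the interoperability tradeoff concrete: a reader that only follows the chain of top-level IFDs never sees images that are reachable solely through the SubIFDs tag (330). The following is a minimal pure-Python sketch, not how libtiff, Bio-Formats or OME Files implement it, assuming a little-endian classic TIFF (not BigTIFF) and a SubIFDs entry of type LONG or IFD. It lists each top-level IFD offset together with the SubIFD offsets it points to.

```python
import struct

SUBIFDS_TAG = 330  # SubIFDs tag; its values are offsets of child IFDs


def read_ifd(data, offset):
    """Parse one classic little-endian IFD; return ({tag: (type, count, value)}, next_offset)."""
    (count,) = struct.unpack_from("<H", data, offset)
    entries = {}
    for i in range(count):
        tag, typ, n, value = struct.unpack_from("<HHII", data, offset + 2 + 12 * i)
        entries[tag] = (typ, n, value)
    (next_offset,) = struct.unpack_from("<I", data, offset + 2 + 12 * count)
    return entries, next_offset


def subifd_offsets(data, entries):
    """Return the child IFD offsets listed by the SubIFDs tag, if any."""
    if SUBIFDS_TAG not in entries:
        return []
    _typ, n, value = entries[SUBIFDS_TAG]
    if n == 1:
        # A single 4-byte offset is stored directly in the entry's value field.
        return [value]
    # Otherwise the value field holds the offset of an array of n 4-byte offsets.
    return list(struct.unpack_from("<%dI" % n, data, value))


def walk_top_level(data):
    """Return [(top_level_ifd_offset, [subifd_offsets...]), ...] in file order."""
    if data[:2] != b"II":
        raise ValueError("this sketch only handles little-endian TIFF")
    magic, offset = struct.unpack_from("<HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    result, seen = [], set()
    while offset and offset not in seen:  # guard against IFD cycles
        seen.add(offset)
        entries, next_offset = read_ifd(data, offset)
        result.append((offset, subifd_offsets(data, entries)))
        offset = next_offset
    return result
```

A reader without SubIFD support effectively uses only the first element of each tuple. With sub-resolutions stored as top-level IFDs, every resolution would instead appear as its own entry in that list, which is why that layout is friendlier to older software.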
Could I clarify a few points from your forum post?
4) Compression is supported on a per-tile basis and multiple compression methods are optionally allowed
6) In either case the reduced resolution sequence should follow a dyadic reduction in both X and Y until one of the dimensions reaches 256. Other reduction schemes are allowed and can be encoded in a higher level specification such as OME-TIFF or alternatively the reader code can deduce it from the sizes of the reduced resolution images in the sequence.
7) NewSubFileType and SubFileType.
8) Each IFD not containing a reduced resolution image can optionally store a thumbnail in its SubIFD 1. Thumbnail must be JPEG or PNG, strip or raster (no tiles), and no larger than 4096x4096; recommended size is 1024 pixels on largest side. In addition, to support digital pathology applications, the first IFD in the file can optionally store a slide label image in its SubIFD 2 with the same specifications as the thumbnail image. And optionally a slide overview image with the same specifications can be stored as SubIFD 3.
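Point 6 can be illustrated with a small calculation. The sketch below is one reading of the rule, assuming that halving stops once the smaller dimension is at or below 256 and that odd sizes are rounded up; both are assumptions, since the post does not specify the stopping test or the rounding. The second helper shows how a reader could recover the reduction factors from the stored image sizes alone, as the post suggests.

```python
def dyadic_levels(width, height, min_size=256):
    """Sizes of a dyadic pyramid, halving X and Y until one dimension is <= min_size."""
    levels = [(width, height)]
    while min(width, height) > min_size:
        # Round up so no edge pixels are dropped (assumption; rounding is unspecified).
        width, height = (width + 1) // 2, (height + 1) // 2
        levels.append((width, height))
    return levels


def reduction_factors(levels):
    """Per-level (X, Y) downsampling factors relative to the full-resolution image."""
    full_w, full_h = levels[0]
    return [(full_w / w, full_h / h) for w, h in levels]


print(dyadic_levels(4096, 3072))
# [(4096, 3072), (2048, 1536), (1024, 768), (512, 384), (256, 192)]
```

For odd sizes the recovered factors are only approximately 2 per level, so a reader deducing the scheme from sizes would likely need some tolerance.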
Thanks, Roger
Hi @rleigh-codelibre. Thanks for the detailed comments.
using top-level IFDs vs. SubIFDs for the subresolutions: Indeed, I was going back and forth between the two approaches as well. I was hoping that within OME-TIFF, and ideally also in libtiff (assuming we can get them on board), we could standardize on only one of the two approaches and only write using that approach. But indeed, being able to read both approaches would be best.
compression per tile: Sorry, that was indeed poorly worded. I just wanted to confirm what TIFF already supports: one compression method per IFD, but with the compression done on a per-tile basis. It would be good to be flexible (within reason) as to which compression methods are supported. Just one thought: there is an argument to be made for storing the full-res image uncompressed or losslessly compressed, so the "real" pixels are available for image analysis that needs them. But for fast visualization it is better if the sub-res versions are more aggressively compressed. Such a hybrid compression scheme would be difficult with the "subresolutions in SubIFDs" approach. What do you think?
reduction strategy: I completely agree with your statement.
NewSubFileType and SubFileType: Yes, agreed. I got a little into the weeds there. Will remove.
thumbnails etc.: The only precedent I know of is Kemp Watson's ZIF format (http://zif.photo/), and it seemed to strike a nice balance with the approach of storing the subresolutions in top-level IFDs. If, however, we decide to go with SubIFDs for the subresolutions, then indeed the label image, overview image, etc. should be in top-level IFDs (and the OME-XML can properly annotate that). And indeed the thumbnail should always just go into the full-res IFD. I'll check with Kemp why he went with the thumbnail-in-a-SubIFD concept.
Cheers, Damir
On the last item re: thumbnails: what I said is probably not correct. In regular TIFFs, aren't thumbnail images (if present) always the first SubIFD? However, if we end up using SubIFDs for the subresolutions, then the thumbnail could simply be the last of the SubIFDs (or maybe better, keep it as the first?).
@dsudar If using an EXIF thumbnail, then I think you would use an ExifIFD on the main image, which points to a sub-IFD containing the thumbnail and any additional EXIF metadata; this would be completely independent of the sub-IFDs pointed to by SubIFDs. There may be multiple ways of doing this, though; we would have to investigate in more detail. However, I think we can do this independently of supporting sub-resolutions.
@rleigh-codelibre Ah yes, thanks. I had not realized that the thumbnail is typically stuffed into the Exif IFD structure. Yes, indeed, that would make such a "general purpose" thumbnail fully independent from what we are discussing here. So do you think it's appropriate to support the optional "slide label image" and "slide overview image" in separate top-level IFDs that are defined by the OME-XML metadata? And if we do that, what do you think of also providing support for an optional "thumbnail image" the same way? That thumbnail image could be more flexible than the limited Exif thumbnail (e.g. could be larger, could be different image format, could even be an animated GIF, etc.).
Based on the discussion so far, I have changed my original write-up so that it specifies use of the SubIFD structure for the subresolutions, among other changes. The new draft is attached here: PyramidTIFF_SubIFD_proposal_20180205.docx
And I just noticed @rleigh-codelibre's write-up in the Code section. Looks great and addresses pretty much everything in detail. For clarity, strategy C looks best to me, but strategy B has the hidden benefit that it maintains some compatibility with the "subresolutions in top-level IFDs" approaches.
@dsudar The design proposal for implementing sub-resolutions is now available at http://openmicroscopy.github.io/design/OME005/ (no changes since you looked at it under the code section; it's just been moved and made available via GitHub Pages). If you have any additional suggestions or comments, we would be very interested. Following any further public feedback, we should be in a good position to start implementation and testing of sub-resolutions with TIFF.
Closing this issue given this has moved into a more formal proposal https://github.com/openmicroscopy/design/issues/74#issuecomment-364128424. As mentioned above, looking forward to hearing additional comments and feedback about http://openmicroscopy.github.io/design/OME005.
Hi all,
I'd like to add a few comments to @rleigh-codelibre's excellent design doc OME005.
1) As I mentioned above, considering the pros and cons of the various strategies, I would like to register my vote for C. It provides the most clarity and stays closest to the TIFF spec and intent.
2) While I agree that my suggestions to formalize the storage of "utility images" such as the slide label, overview, etc. are a bit orthogonal to the storage of sub-resolution pyramids, in the domain we all work in, i.e. digital pathology, it is important to have a formal way to handle those, and this might be the time to also specify that. If not in OME005, would there be support for opening a parallel design proposal to specify those?
3) Similarly, and this may be going too far and again is orthogonal, would it make sense to also provide a formal way to handle multiple scenes (or regions) in a single file?
4) In my write-up that is attached to this discussion, I inserted a suggestion in point 4) to formalize the option of storing the full-size image using a different compression (lossless or uncompressed) than the sub-resolution pyramid. This would give downstream analysis software access to the full data while still providing good visualisation performance when showing the sub-resolution images, which could be compressed.
5) I tried to understand how the pyramid is currently stored in the OMERO internal format. Is that compatible with the OME005 proposal?
6) I like the considerations for handling reductions in Z and other advanced concepts in strategy E. However, I'm wondering whether such advanced concepts are still reasonable within TIFF's limitations, so the thought of starting to look at a container such as HDF5 sounds very reasonable to me. That may also open the door to storing data in arbitrary chunks that can be retrieved much faster than from a fairly linear, flat container structure such as TIFF. I'm thinking of the considerations that went into Keller's KLB format.
Dear Damir,
Thanks for raising all these interesting points. To respond to them:
Kind regards, Roger
Dear Roger,
Yes, understood and acknowledged that my points 2 and 3 are separate and mostly orthogonal to the pyramid design issue. Since point 2 especially is so integral to the digital pathology application area, maybe Glencoe would have some interest in tackling that (@chris-allan)?
On 4: That's great, and I'm glad it's working out that way. So I guess any future developer documentation should just alert the developer to that valuable option.
On 5: Yes indeed. It's just a little confusing that there will be two fairly different pyramid implementations in an OMERO environment. Or is it likely that your new design will also be used for the internal format eventually? I actually had another reason to ask: the internal pyramid format uses JPEG2000, which is capable of built-in lossless compression in addition to a broader range of good lossy compression options. Rather than having to rely on the current TIFF compression options (lossless: LZW and ZIP; lossy: JPEG), it would be nice to have JP2K compression available as part of the Bio-Formats tool set. And of course, most of the code for that already exists in your hands... just a thought.
On 6: Great to hear your thinking on that. I completely agree, and I'd be happy to discuss further. How about a topic (again) for the third day of the Users' Meeting?
Cheers, Damir
Regarding (4), we can certainly extend the documentation to make it clear that this is possible.
For (5) it is a possibility that TIFF pyramids could be used in the future to replace OMERO pyramids, but we will have to wait until we have a working pyramid implementation before we can think about testing it. Bio-Formats already has JPEG2000 support, so we could certainly make use of that to create JPEG2000-compressed pyramids.
We can certainly discuss HDF5 at the user’s meeting. It might be best suited as an ad-hoc discussion with interested participants on Friday, but I can also look into running a workshop on HDF5-related matters if there is wider interest.
Thanks, Roger
This issue has been mentioned on Image.sc Forum. There might be relevant details there:
https://forum.image.sc/t/next-generation-file-formats-for-bioimaging/31361/1
Current status
With http://github.com/openmicroscopy/bioformats/pull/2747, pyramid data can be stored in a hybrid format that combines the Faas lab pyramid TIFF format (cf. "the fish") with OME-TIFF. The storage of pixel data is consistent with Faas pyramid TIFFs: there is one IFD per plane per pyramid resolution, with planes ordered like so:
IFD 0 = plane 0, largest resolution
...
IFD n = plane n, largest resolution
...
IFD (r * plane_count + n) = plane n, resolution r
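The ordering above, combined with the SizeZ, SizeC, SizeT and DimensionOrder values that the hybrid format takes from the OME-XML comment, is enough to locate any plane. A minimal sketch, assuming every resolution stores the same plane_count = SizeZ * SizeC * SizeT planes and that DimensionOrder lists axes from fastest- to slowest-varying, as in OME-XML. The function names are illustrative only, not Bio-Formats API.

```python
def ifd_index(plane, resolution, plane_count):
    """IFD holding `plane` at `resolution` (0 = largest): r * plane_count + n."""
    return resolution * plane_count + plane


def locate_ifd(ifd, plane_count):
    """Inverse mapping: IFD index -> (resolution, plane)."""
    return divmod(ifd, plane_count)


def plane_to_zct(plane, size_z, size_c, size_t, dimension_order):
    """Map a plane index to (z, c, t) using an OME DimensionOrder such as "XYZCT"."""
    sizes = {"Z": size_z, "C": size_c, "T": size_t}
    coords = {}
    for axis in dimension_order[2:]:  # skip X and Y; remaining axes fastest first
        plane, coords[axis] = divmod(plane, sizes[axis])
    return coords["Z"], coords["C"], coords["T"]


# 3 Z-sections x 2 channels x 1 timepoint = 6 planes per resolution.
plane_count = 3 * 2 * 1
resolution, plane = locate_ifd(13, plane_count)            # -> (2, 1)
print(resolution, plane_to_zct(plane, 3, 2, 1, "XYZCT"))  # prints: 2 (1, 0, 0)
```

Because TiffData is ignored in the hybrid format, this arithmetic is the whole mapping; any other IFD ordering would need explicit metadata.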
An OME-XML comment is included in the first IFD, consistent with OME-TIFF. This provides any additional metadata, including the SizeZ, SizeC, SizeT, and DimensionOrder necessary for determining how IFDs should be mapped to a specific Image and ZCT index. TiffData elements in the OME-XML are ignored; IFD orderings other than the natural order implied by DimensionOrder are not supported. Multiple pyramids and extra Images outside the pyramid are also not supported.
Detection of hybrid pyramid TIFF files is done via the Software IFD tag. If PyramidTiffReader is missing from the list of valid readers, then the files would be detected as OME-TIFF.
This was chosen as a temporary solution for storing pyramid data as it required minimal changes to the existing PyramidTiffReader and did not incur the extensive testing burden of modifying OMETiffReader. A complete solution for storing pyramid data would require changes to the OME data model, the OME-TIFF specification, OMETiffReader, and OMETiffWriter. Proposed model and specification changes follow.
Proposed changes to support pyramids in OME-TIFF
1) All IFDs containing a pyramid sub-resolution must set the NewSubfileType tag (254) with a value of 1 (if the image count is 1) or 2 (if the image count is greater than 1) (see page 36 of data_repo/curated/specs/tiff.pdf).
2) The number of sub-resolutions for each full resolution image must be specified in the OME-XML. The specific use case for which http://github.com/openmicroscopy/bioformats/pull/2747 was intended only requires a single pyramid per file, but in general it is possible to have multiple pyramids stored together (cf. CZI, VSI), each of which can potentially contain a different number of sub-resolutions. The sub-resolution count could be specified as either a new attribute or an annotation on Image.
3) The specification should explicitly define that the number of Images must match the number of pyramids, and not the sum of all resolution counts. This is consistent with the OME-XML generated by Bio-Formats when resolution flattening is disabled.
4) Ordering and mapping of sub-resolution IFDs must be defined. In the long term, a FirstResolution attribute could be added to TiffData, which would allow for explicit mapping such that the IFDs can be stored in any order. In the medium term, annotations similar to TiffData could be added to Image (similar to Modulo). In the short term, all resolutions could be assumed to be stored in order from largest to smallest, with the plane ordering within a resolution being defined by existing TiffData elements and assumed to be constant across all resolutions.
Possible changes to hybrid pyramid format
If we decide to reject http://github.com/openmicroscopy/bioformats/pull/2747 outright, this section is not relevant. Otherwise, these changes could be implemented easily to address some of the concerns there before any OME-TIFF model/specification changes are complete.
If the type detection of hybrid pyramid TIFF vs. OME-TIFF is too problematic, an option is to move the OME-XML from the comment (ImageDescription tag) to some other tag that would not be checked by OMETiffReader. This should prevent OMETiffReader from picking up pyramid TIFFs in all cases, and would allow the changes to readers.txt to be reverted.
A new extension could be used for pyramid TIFFs, which makes it clear that this is a new custom format.
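The detection problem can be modelled in a few lines. This is a simplified, hypothetical model and not Bio-Formats code: readers are tried in list order, the hybrid format is recognized by a Software (305) value (the marker string here is made up), and OME-TIFF by OME-XML in ImageDescription (270). It shows why dropping PyramidTiffReader from the reader list makes hybrid files look like OME-TIFF, and why moving the OME-XML to another tag (private tag 65000 is assumed purely for illustration) removes the overlap.

```python
IMAGE_DESCRIPTION = 270  # TIFF ImageDescription tag (the "comment")
SOFTWARE = 305           # TIFF Software tag
PYRAMID_MARKER = "hypothetical-pyramid-writer"  # made-up Software value


def is_hybrid_pyramid(tags):
    return tags.get(SOFTWARE) == PYRAMID_MARKER


def is_ome_tiff(tags):
    # Simplified check: OME-XML present in ImageDescription.
    return "<OME" in tags.get(IMAGE_DESCRIPTION, "")


CHECKS = {"PyramidTiffReader": is_hybrid_pyramid, "OMETiffReader": is_ome_tiff}


def detect(tags, readers):
    """Return the first reader (in list order) whose check accepts the tags."""
    for name in readers:
        if CHECKS[name](tags):
            return name
    return None


ome_xml = '<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06"/>'
current = {SOFTWARE: PYRAMID_MARKER, IMAGE_DESCRIPTION: ome_xml}
moved = {SOFTWARE: PYRAMID_MARKER, 65000: ome_xml}  # hypothetical private tag

print(detect(current, ["PyramidTiffReader", "OMETiffReader"]))  # PyramidTiffReader
print(detect(current, ["OMETiffReader"]))                        # OMETiffReader
print(detect(moved, ["OMETiffReader"]))                          # None
```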
/cc @sbesson, @chris-allan, @rleigh-codelibre, @emilroz, @dgault