Storage of pyramid data in OME-TIFF

melissalinkert commented 7 years ago

Current status

With http://github.com/openmicroscopy/bioformats/pull/2747, pyramid data can be stored in a hybrid format that combines the Faas lab pyramid TIFF format (cf. "the fish") with OME-TIFF. The storage of pixel data is consistent with Faas pyramid TIFFs: there is one IFD per plane per pyramid resolution, with planes ordered like so:

IFD 0 = plane 0, largest resolution
...
IFD n = plane n, largest resolution
...
IFD (r * plane_count + n) = plane n, resolution r
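The ordering above can be written as a pair of index functions. This is a minimal Python sketch. It assumes planes are numbered 0..plane_count-1 and that resolution 0 is the largest. The function names are hypothetical and are not part of PyramidTiffReader.

```python
from typing import Tuple


def ifd_index(plane: int, resolution: int, plane_count: int) -> int:
    """Return the IFD index holding `plane` at `resolution` (0 = largest)."""
    if not 0 <= plane < plane_count:
        raise ValueError("plane out of range")
    if resolution < 0:
        raise ValueError("resolution must be non-negative")
    # Every plane of resolution 0 comes first, then every plane of
    # resolution 1, and so on.
    return resolution * plane_count + plane


def plane_and_resolution(ifd: int, plane_count: int) -> Tuple[int, int]:
    """Inverse mapping: recover (plane, resolution) from an IFD index."""
    resolution, plane = divmod(ifd, plane_count)
    return plane, resolution


# With 3 planes, IFD 4 holds plane 1 at resolution 1.
print(ifd_index(1, 1, 3), plane_and_resolution(4, 3))
```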

An OME-XML comment is included in the first IFD, consistent with OME-TIFF. This provides any additional metadata, including the SizeZ, SizeC, SizeT, and DimensionOrder necessary for determining how IFDs should be mapped to a specific Image and ZCT index. TiffData elements in the OME-XML are ignored; IFD orderings other than the natural order implied by DimensionOrder are not supported. Multiple pyramids and extra Images outside the pyramid are also not supported.
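To map an IFD to a ZCT index, a reader needs the plane ordering implied by DimensionOrder, in which the first letter after XY varies fastest. The Python sketch below covers one direction: it converts a plane index to (Z, C, T). It assumes a single pyramid stored in the natural order described here. The function is hypothetical and is not the Bio-Formats API. Combining it with `r * plane_count + n` from the layout above locates any plane at any resolution.

```python
from typing import Tuple


def zct_from_plane(plane: int, dimension_order: str,
                   size_z: int, size_c: int, size_t: int) -> Tuple[int, int, int]:
    """Map a plane index to (z, c, t) for an OME DimensionOrder such as "XYZCT"."""
    axes = dimension_order[2:]  # e.g. "ZCT": fastest-varying axis first
    if not dimension_order.startswith("XY") or sorted(axes) != ["C", "T", "Z"]:
        raise ValueError(f"unsupported DimensionOrder: {dimension_order}")
    sizes = {"Z": size_z, "C": size_c, "T": size_t}
    if not 0 <= plane < size_z * size_c * size_t:
        raise ValueError("plane out of range")
    coords = {}
    rest = plane
    for axis in axes:
        # Peel off the fastest-varying axis first.
        rest, coords[axis] = divmod(rest, sizes[axis])
    return coords["Z"], coords["C"], coords["T"]


print(zct_from_plane(6, "XYZCT", 2, 3, 4))  # (0, 0, 1)
```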

Detection of hybrid pyramid TIFF files is done via the Software IFD tag. If PyramidTiffReader is missing from the list of valid readers, then the files would be detected as OME-TIFF.

This was chosen as a temporary solution for storing pyramid data as it required minimal changes to the existing PyramidTiffReader and did not incur the extensive testing burden of modifying OMETiffReader. A complete solution for storing pyramid data would require changes to the OME data model, OME-TIFF specification, OMETiffReader, and OMETiffWriter. Proposed model and specification changes follow.

Proposed changes to support pyramids in OME-TIFF

1) All IFDs containing a pyramid sub-resolution must set the NewSubfileType tag (254) with a value of 1 (if the image count is 1) or 2 (if the image count is greater than 1). See page 36 of data_repo/curated/specs/tiff.pdf.
2) The number of sub-resolutions for each full resolution image must be specified in the OME-XML. The specific use case for which http://github.com/openmicroscopy/bioformats/pull/2747 was intended only requires a single pyramid per file, but in general it is possible to have multiple pyramids stored together (cf. CZI, VSI) which can each potentially contain a different number of sub-resolutions. The sub-resolution count could be specified as either a new attribute or an annotation on Image.
3) The specification should explicitly define that the number of Images must match the number of pyramids, and not the sum of all resolution counts. This is consistent with the OME-XML generated by Bio-Formats when resolution flattening is disabled.
4) Ordering and mapping of sub-resolution IFDs must be defined. In the long term, a FirstResolution attribute could be added to TiffData, which would allow for explicit mapping such that the IFDs can be stored in any order. In the medium term, annotations similar to TiffData could be added to Image (similar to Modulo). In the short term, all resolutions could be assumed to be stored in order from largest to smallest, with the plane ordering within a resolution being defined by existing TiffData elements and assumed to be constant across all resolutions.
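This is a reference for setting tag 254. TIFF 6.0 defines NewSubfileType as a bit field, not an enumeration:

- Bit 0 marks a reduced-resolution version of another image.
- Bit 1 marks a single page of a multi-page image.
- Bit 2 marks a transparency mask.

A reader detecting sub-resolutions would therefore test bit 0. An IFD that is both a reduced resolution and a page of a multi-page file would carry the value 3. The Python helpers below are a hypothetical sketch and are not part of any TIFF library.

```python
# TIFF tag 254 (NewSubfileType) is a 32-bit bit field (TIFF 6.0).
NEW_SUBFILE_TYPE_TAG = 254
REDUCED_RESOLUTION = 0x1  # bit 0: reduced-resolution version of another image
SINGLE_PAGE = 0x2         # bit 1: single page of a multi-page image
TRANSPARENCY_MASK = 0x4   # bit 2: transparency mask for another image


def make_new_subfile_type(reduced: bool, multi_page: bool) -> int:
    """Combine the flags into a NewSubfileType value."""
    value = 0
    if reduced:
        value |= REDUCED_RESOLUTION
    if multi_page:
        value |= SINGLE_PAGE
    return value


def is_reduced_resolution(new_subfile_type: int) -> bool:
    """A reader only needs bit 0 to recognise a sub-resolution IFD."""
    return bool(new_subfile_type & REDUCED_RESOLUTION)


print(make_new_subfile_type(True, False), make_new_subfile_type(True, True))
```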

Possible changes to hybrid pyramid format

If we decide to reject http://github.com/openmicroscopy/bioformats/pull/2747 outright, this section is not relevant. Otherwise, these changes could be implemented easily to address some of the concerns there before any OME-TIFF model/specification changes are complete.

If the type detection of hybrid pyramid TIFF vs. OME-TIFF is too problematic, an option is to move the OME-XML from the comment (ImageDescription tag) to some other tag that would not be checked by OMETiffReader. This should prevent OMETiffReader from picking up pyramid TIFFs in all cases, and would allow changes to readers.txt to be reverted.
A new extension could be used for pyramid TIFFs which makes it clear that this is a new custom format.

/cc @sbesson, @chris-allan, @rleigh-codelibre, @emilroz, @dgault

dimin commented 7 years ago

Could I propose another way for your consideration? Using Sub-IFDs to store low-res pyramid levels would require no changes to the OME-TIFF schema, keep images fully backward compatible, and allow supporting software to access pyramids. In libbioimage I've chosen the Photoshop sub-IFD approach as the default one, and so far it has been working great.

melissalinkert commented 7 years ago

Yes, use of the SubIFD tag could be considered (also suggested in https://github.com/openmicroscopy/bioformats/pull/2747#issuecomment-278966356) instead of point (1) in the OME-TIFF section. This still requires a specification change, and points (2), (3) and part of (4) above may still be relevant in either case.
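The model below shows why the SubIFD approach keeps files backward compatible. It is a minimal in-memory Python model with hypothetical names and no file I/O. Each full-resolution plane stays in the main IFD chain, and its reduced resolutions hang off the SubIFDs tag (330). As an illustrative assumption, the SubIFDs are listed from largest to smallest.

```python
from dataclasses import dataclass, field
from typing import List

SUBIFDS_TAG = 330  # TIFF tag holding the offsets of child IFDs


@dataclass
class Ifd:
    width: int
    height: int
    # On disk these would be offsets stored in tag 330; here they are
    # modelled as direct references for clarity.
    sub_ifds: List["Ifd"] = field(default_factory=list)


def resolution_count(top_level: List[Ifd], plane: int) -> int:
    return 1 + len(top_level[plane].sub_ifds)


def resolution_ifd(top_level: List[Ifd], plane: int, resolution: int) -> Ifd:
    """Resolution 0 is the top-level IFD; lower resolutions are its SubIFDs."""
    if not 0 <= resolution < resolution_count(top_level, plane):
        raise IndexError("resolution out of range")
    full = top_level[plane]
    return full if resolution == 0 else full.sub_ifds[resolution - 1]


# Two planes, each with two sub-resolutions. A reader that ignores tag 330
# only walks the top-level chain and sees two full-resolution planes.
planes = [Ifd(1024, 1024, [Ifd(512, 512), Ifd(256, 256)]) for _ in range(2)]
print(len(planes), resolution_count(planes, 0), resolution_ifd(planes, 1, 2).width)
```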

sbesson commented 7 years ago

Coming a bit late to the discussion. Thanks @dimin for your comments and the Sub-IFD suggestion. We completely agree that backwards compatibility should be considered as a requirement for this proposal.

One comment regarding the semantics: we effectively maintain and discuss two related specifications: the OME Data Model and the OME-TIFF format.

The initial idea is certainly to explore extensions of the OME-TIFF format specification without requiring any Data Model change (i.e. no new schema release). In that sense, using annotations and/or implicit conventions for the prototyping should be sufficient while working on enforcing rules for the numbers and order of the sub-resolutions.

While we are still brainstorming/exploring alternatives, a completely different approach would be to make use of the Folder element introduced in the 2016-06 schema as the container for the various pyramidal resolutions. In this case,

each pyramidal image would be defined as a separate Image element in the OME-XML using the current semantics for mapping IFDs with TiffData
one (or multiple) top level Folder elements would group all sub-resolution images belonging to the same pyramid
the major impacts of this approach would be the following:

a. model level: in the long term this would require a namespace, while in the short term an annotation could be used to distinguish pyramid folders from regular image folders.
b. API level: the largest impact would be on the sub-resolution API, as this approach completely breaks the current representation of non-flattened resolutions, replacing a single image with a folder of images. This would likely have an impact on the entire stack, including OMERO.
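The Folder-based layout can be illustrated with a schematic built using Python's standard ElementTree. The element names Image, Folder and ImageRef follow the 2016-06 model. The sketch omits several things, so its output is not schema-valid:

- namespaces;
- required Pixels/TiffData content;
- the (still undefined) annotation or namespace that would distinguish pyramid folders.

The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def pyramid_folder_sketch(resolution_sizes: List[Tuple[int, int]]) -> ET.Element:
    """Build a schematic OME-XML tree: one Image per resolution, grouped by a Folder.

    Not schema-valid: namespaces, Pixels/TiffData and the annotation that would
    mark the folder as a pyramid are deliberately omitted.
    """
    root = ET.Element("OME")
    image_ids = []
    for index, (width, height) in enumerate(resolution_sizes):
        image_id = f"Image:{index}"
        ET.SubElement(root, "Image", ID=image_id,
                      Name=f"resolution {index} ({width}x{height})")
        image_ids.append(image_id)
    folder = ET.SubElement(root, "Folder", ID="Folder:0", Name="pyramid")
    for image_id in image_ids:
        ET.SubElement(folder, "ImageRef", ID=image_id)
    return root


tree = pyramid_folder_sketch([(4096, 4096), (2048, 2048), (1024, 1024)])
print(ET.tostring(tree, encoding="unicode"))
```

Because every resolution becomes its own Image, existing TiffData mapping works unchanged. The trade-off is the API impact described in point b.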

mtbc commented 6 years ago

Note also https://www.openmicroscopy.org/community/viewtopic.php?f=15&t=8433 from @dsudar.

dsudar commented 6 years ago

Thanks @mtbc for the alert and pointing me to this existing discussion.

Looks like most of what I wrote was already being considered by @melissalinkert and @sbesson in the above. I'm hoping this is a good time to pick this topic up again. We specifically need something like this for our highly multiplexed IF imaging workflow, where we cyclically (4-5 channels per cycle) acquire many (>30) channels from tissues or TMAs. Our pre-processing performs the image registration between the "cycles" and ideally would output a single pyramidal OME-TIFF image or a channel stack of pyramidal single-channel (OME-)TIFFs, which can then be imported into OMERO for downstream analysis, visualization with PathViewer or iViewer, etc.

I'm about to start our programmers on implementing my proposed changes to libtiff and hopefully with @rleigh-codelibre 's help into OME Files. Any work on Bio-Formats and possible model changes is of course fully in your wheelhouse.

rleigh-codelibre commented 6 years ago

@dsudar To keep you up to date, I've spent most of last week looking at how to implement SubIFD properly in TiffWriter, so we can create proper TIFF images with sub-resolutions, and I'm writing it up as I go. I'll follow up in a few days once I have something concrete to show and discuss.

dsudar commented 6 years ago

Thanks @rleigh-codelibre. So is it the team's preference to use the SubIFD structure for the sub-resolution images? I'm perfectly fine with that approach, even though most existing ad-hoc implementations use a sequence of top-level IFDs, so it might be harder to get the wider community on board. I look forward to your findings.

rleigh-codelibre commented 6 years ago

@dsudar Either approach should work, and there are tradeoffs for both. Using top-level IFDs is better for interoperability with software lacking SubIFD support, and also for being able to reference these IFDs from OME-XML. However, libtiff supports writing out SubIFDs properly as part of its basic functionality, and it's likely difficult to change that behaviour to write out top-level IFDs (though I still need to test this further). If we want Bio-Formats and OME Files C++ to be able to create and read the same data, something which works for both is likely a prerequisite.

Could I clarify a few points from your forum post?

4) Compression is supported on a per-tile basis and multiple compression methods are optionally allowed
- My understanding of TIFF is that while tiles can be compressed, the compression algorithm is specified in the IFD which would preclude changing the compression on a per-tile basis. Is this what was meant or is this just me reading the wording in a way other than intended?
6) In either case the reduced resolution sequence should follow a dyadic reduction in both X and Y until one of the dimensions reaches 256. Other reduction schemes are allowed and can be encoded in a higher level specification such as OME-TIFF or alternatively the reader code can deduce it from the sizes of the reduced resolution images in the sequence.
- I think here the "should" is fine. We certainly need to accommodate data which has non-standard reductions, so I think we should recommend that all readers obtain this by introspecting the SubIFDs rather than assuming the size of the reduction.
7) NewSubFileType and SubFileType.
- I'm unsure if it's worth using SubFileType for compatibility; it's been deprecated for so long. Could we get away with using only NewSubFileType?
8) Each IFD not containing a reduced resolution image can optionally store a thumbnail in its SubIFD 1. Thumbnail must be JPEG or PNG, strip or raster (no tiles), and no larger than 4096x4096; recommended size is 1024 pixels on largest side. In addition, to support digital pathology applications, the first IFD in the file can optionally store a slide label image in its SubIFD 2 with the same specifications as the thumbnail image. And optionally a slide overview image with the same specifications can be stored as SubIFD 3.
- Is there any established precedent for this pattern? Putting the slide label and overview as thumbnails seems like it could significantly restrict the pixel types and image sizes we could store here. Would top-level IFDs not be more appropriate? Could OME-XML annotations to link an image with its label and overview be a workable alternative?
- Edit: Another question here is why store the thumbnail in SubIFD1 rather than on the full-resolution IFD?
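Point 6 and the reply to it can be sketched together in Python. The writer follows the recommended dyadic scheme. The reader, as suggested, introspects the stored sizes instead of assuming a factor of 2. The function names are hypothetical. Rounding odd sizes up is an assumption, because the proposal does not specify a rounding rule.

```python
from typing import List, Tuple


def dyadic_levels(width: int, height: int, stop: int = 256) -> List[Tuple[int, int]]:
    """Halve X and Y until one dimension reaches (or drops below) `stop`."""
    levels = [(width, height)]
    while min(width, height) > stop:
        # Rounding up keeps edge pixels; the rounding rule is an assumption.
        width = (width + 1) // 2
        height = (height + 1) // 2
        levels.append((width, height))
    return levels


def inferred_factors(levels: List[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Derive per-level downsampling factors from the stored sizes alone."""
    full_w, full_h = levels[0]
    return [(full_w / w, full_h / h) for w, h in levels]


print(dyadic_levels(2048, 1024))  # [(2048, 1024), (1024, 512), (512, 256)]
print(inferred_factors([(3000, 3000), (1000, 1000)]))  # [(1.0, 1.0), (3.0, 3.0)]
```

The second call shows the benefit of introspection: a non-dyadic reduction by a factor of 3 is still recovered correctly.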

Thanks, Roger

dsudar commented 6 years ago

Hi @rleigh-codelibre. Thanks for the detailed comments.

using top-level IFDs vs. SubIFDs for the sub-resolutions: Indeed, I was going back and forth between the 2 approaches as well. I was hoping that within OME-TIFF, and ideally also in libtiff (assuming we can get them on board), we could standardize on only one of the 2 approaches and only write using that approach. But indeed, being able to read both approaches would be best.
compression per tile: Sorry, that was indeed poorly worded. I just wanted to confirm what TIFF already supports: one compression method per IFD, with the compression applied on a per-tile basis. It would be good to be flexible (within reason) as to which compression methods are supported. Just one thought: there is an argument to be made for storing the full-res image uncompressed or with lossless compression, so the "real" pixels are available for image analysis that needs them. But for fast visualization it is better if the sub-res versions are more aggressively compressed. Such a hybrid compression scheme would be difficult with the "subresolutions in SubIFDs" approach. What do you think?
reduction strategy: I completely agree with your statement.
NewSubFileType and SubFileType: Yes, agreed. I got a little into the weeds there. Will remove.
thumbnails etc.: The only precedent I know of is Kemp Watson's ZIF format (http://zif.photo/), and it seemed to strike a nice balance with the approach of storing the sub-resolutions in top-level IFDs. If, however, we decide to go with SubIFDs for the sub-resolutions, then indeed the label image, overview image, etc. should be in top-level IFDs (and the OME-XML can properly annotate that). And indeed the thumbnail should always just go into the full-res IFD. I'll check with Kemp why he went with the thumbnail-in-a-SubIFD concept.

Cheers, Damir

dsudar commented 6 years ago

On the last item re: thumbnails: what I said is probably not correct. In regular TIFFs, aren't thumbnail images (if present) always the first SubIFD? However, if we end up using SubIFDs for the sub-resolutions, then the thumbnail could simply be the last of the SubIFDs (or, maybe better, keep it as the first?).

rleigh-codelibre commented 6 years ago

@dsudar If using an EXIF thumbnail, then I think you would use an ExifIFD on the main image, which points to a sub-IFD containing the thumbnail and any additional EXIF metadata; this would be completely independent of the sub-IFDs pointed to by SubIFDs. There may be multiple ways of doing this though; we would have to investigate in more detail. However, I think we can do this independently of supporting sub-resolutions.

dsudar commented 6 years ago

@rleigh-codelibre Ah yes, thanks. I had not realized that the thumbnail is typically stuffed into the Exif IFD structure. Yes, indeed, that would make such a "general purpose" thumbnail fully independent from what we are discussing here. So do you think it's appropriate to support the optional "slide label image" and "slide overview image" in separate top-level IFDs that are defined by the OME-XML metadata? And if we do that, what do you think of also providing support for an optional "thumbnail image" the same way? That thumbnail image could be more flexible than the limited Exif thumbnail (e.g. could be larger, could be different image format, could even be an animated GIF, etc.).

dsudar commented 6 years ago

Based on the discussion so far, I have changed my original write-up so it specifies use of the SubIFD structure for the subresolutions and more. The new draft is enclosed here. PyramidTIFF_SubIFD_proposal_20180205.docx

And I just noticed @rleigh-codelibre's write-up in the Code section. It looks great and addresses pretty much everything in detail. For clarity, strategy C looks best to me, but strategy B has the hidden benefit that it maintains some compatibility with the "subresolutions in top-level IFDs" approaches.

rleigh-codelibre commented 6 years ago

@dsudar The design proposal for implementing sub-resolutions is now available at http://openmicroscopy.github.io/design/OME005/ (no changes since you looked at it under the code section; it's just been moved and made available via github pages). If you have any additional suggestions or comments, we would be very interested. Following any further public feedback, we should be in a good position to start implementation and testing of sub-resolutions with TIFF.

sbesson commented 6 years ago

Closing this issue given this has moved into a more formal proposal https://github.com/openmicroscopy/design/issues/74#issuecomment-364128424. As mentioned above, looking forward to hearing additional comments and feedback about http://openmicroscopy.github.io/design/OME005.

dsudar commented 6 years ago

Hi all,

I'd like to add a few comments to @rleigh-codelibre 's excellent design doc OME005.

1) As I mentioned above, considering the pros and cons of the various strategies, I would like to register my vote for C. It provides the most clarity and stays closest to the TIFF spec and its intent.
2) I agree that my suggestions to formalize the storage of "utility images" such as the slide label, overview, etc. are a bit orthogonal to the storage of sub-resolution pyramids. However, in the domain we all work in, i.e. digital pathology, it is important to have a formal way to handle those, and this might be the time to also specify that. If not in OME005, would there be support for opening a parallel design proposal to specify those?
3) Similarly, and this may be going too far and again is orthogonal, would it make sense to also provide a formal way to handle multiple scenes (or regions) in a single file?
4) In my write-up that is attached to this discussion, I inserted a suggestion in point 4) to formalize the option of storing the full-size image using a different compression (lossless or uncompressed) than the sub-resolution pyramid. This would give downstream analysis software access to the full data while still providing good visualisation performance when showing the sub-res images, which could be compressed.
5) I tried to understand how the pyramid is currently stored in the OMERO internal format. Is that compatible with the OME005 proposal?
6) I like the considerations for handling reductions in Z and other advanced concepts in strategy E. However, I'm wondering whether such advanced concepts are still reasonable within TIFF's limitations. So the thought of starting to look at a container such as HDF5 sounds very reasonable to me. That may also open the door to storing data in arbitrary chunks that can be retrieved much faster than a fairly linear, flat container structure such as TIFF allows. I'm thinking of the considerations that went into Keller's KLB format.

rleigh-codelibre commented 6 years ago

Dear Damir,

Thanks for raising all these interesting points. To respond to them:

2) Additional design proposals can certainly be considered to add features such as slide labels, overviews, etc. However, we will need to take the time to fully implement TIFF pyramids before we can work on additional proposals. I have begun work on the implementation.
3) Support for multiple scenes would be valuable. David is currently working on adding support for the BigDataViewer XML format, and this includes work on a multi-dimensional model. It may well be the case that the “scene” position can be a higher dimension, and be supported fairly transparently. However, further additions to the model might be required to link regions of the overview to individual images with sufficient flexibility.
4) This is permitted by the TIFF format, since every IFD specifies the compression algorithm to use, and this includes SubIFDs as a matter of course. The compression type should be selectable via the standard Bio-Formats reader and writer API methods for getting and setting the compression type in use. From this perspective, no changes are required to support this, since it’s already available by default.
5) The OMERO pyramid format (https://docs.openmicroscopy.org/bio-formats/5.8.0/developers/wsi.html#omero-pyramid) uses JPEG2000. As noted on that page, it has limitations that restrict its usefulness, ranging from a restricted set of pixel types to support for only a single image series (like Strategy A, it has an implicit IFD ordering). However, when TIFF files containing pyramids are imported, the OMERO pyramid format will not be used; it is only for large flat files which don’t have pyramids.
6) I don’t think strategy E is easily implementable with TIFF without TIFF extensions, such as libtiff adding support for Z sizes so that it can store volumes rather than planes within each directory. For this advanced usage, a more flexible container format like HDF5 would be more appropriate for future development work. I’ve recently added HDF5 to the OME-Files C++ build for this reason, and JHDF5 is already usable from Java, so we can explore writing HDF5 images with these features.
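The per-IFD compression described above can be made concrete with a small Python planning sketch. The names are hypothetical and the sketch performs no file I/O. Because Compression (tag 259) is stored in each IFD, a full-resolution IFD can use LZW or Deflate while its SubIFDs use JPEG. The numeric codes are the standard TIFF Compression values. JPEG in TIFF is commonly limited to 8-bit samples, so whether a lossy choice is possible depends on the pixel type.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

COMPRESSION_TAG = 259  # Compression is set independently in each IFD
NONE, LZW, JPEG, DEFLATE = 1, 5, 7, 8  # TIFF Compression tag values


@dataclass
class IfdPlan:
    width: int
    height: int
    compression: int
    sub_ifds: List["IfdPlan"] = field(default_factory=list)


def plan_pyramid(levels: List[Tuple[int, int]],
                 full_compression: int = LZW,
                 reduced_compression: int = JPEG) -> IfdPlan:
    """Lossless full resolution, lossy SubIFDs: each IFD carries its own tag 259."""
    (full_w, full_h), *reduced = levels
    return IfdPlan(full_w, full_h, full_compression,
                   [IfdPlan(w, h, reduced_compression) for w, h in reduced])


plan = plan_pyramid([(2048, 2048), (1024, 1024), (512, 512)])
print(plan.compression, [s.compression for s in plan.sub_ifds])  # 5 [7, 7]
```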

Kind regards, Roger

dsudar commented 6 years ago

Dear Roger,

Yes, understood and acknowledged that my points 2 and 3 are separate from, and mostly orthogonal to, the pyramid design issue. Since point 2 in particular is so integral to the digital pathology application area, maybe Glencoe would have some interest in tackling that (@chris-allan)?

On 4: That's great, and I'm glad it's working out that way. So I guess any future developer documentation should simply alert developers to that valuable option.

On 5: Yes indeed. It's just a little confusing that there will be 2 fairly different pyramid implementations in an OMERO environment. Or is it likely that your new design will also be used for the internal format eventually? I actually had another reason to ask: the internal pyramid format uses JPEG2000, which is capable of built-in lossless compression in addition to a broader range of good lossy compression options. Rather than having to rely on the current TIFF compression options (lossless: LZW and ZIP; lossy: JPEG), it would be nice to have JP2K compression available as part of the Bio-Formats tool set. And of course, most of the code for that is already in your hands... just a thought.

On 6: Great to hear your thinking on that. I completely agree, and I'd be happy to discuss further. How about making it a topic (again) for the third day of the Users' Meeting?

Cheers, Damir

rleigh-codelibre commented 6 years ago

Regarding (4), we can certainly extend the documentation to make it clear that this is possible.

For (5) it is a possibility that TIFF pyramids could be used in the future to replace OMERO pyramids, but we will have to wait until we have a working pyramid implementation before we can think about testing it. Bio-Formats already has JPEG2000 support, so we could certainly make use of that to create JPEG2000-compressed pyramids.

We can certainly discuss HDF5 at the users’ meeting. It might be best suited to an ad-hoc discussion with interested participants on Friday, but I can also look into running a workshop on HDF5-related matters if there is wider interest.

Thanks, Roger

imagesc-bot commented 3 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/next-generation-file-formats-for-bioimaging/31361/1
