bfconvert: crop, downsampling and multi-resolution issues

sbesson commented 4 years ago

See https://forum.image.sc/t/cropping-large-pyramidal-tiffs-with-bfconvert-crashes/44857

The issue can be reproduced using a pyramidal fake image and trying to convert and crop it

printf "sixeX=40000\nsizeY=100000\nresolutions=10\n" test.fake.ini
bfconvert -crop 30000,0,10000,10000 -tilex 512 -tiley 512 -noflat test.fake out.tif

With Bio-Formats 6.5.1, this will result in a FormatException when checking for the tile size as soon as a resolution of size smaller than the crop size is reached. One of the underlying issue is that the cropped region is never adjusted at each resolution scale. This is not that a simple scenario which involves computing the downsampling scale as well as handling rounding errors. The current workaround is to first crop the largest resolution "only", then convert it into a pyramidal image by recomputing the resolution levels.

A variation of this issue had been reported in https://forum.image.sc/t/bfconvert-throwing-loci-formats-formatexception-for-a-czi-file/37478 where the number of requested resolutions is larger than the number of resolutions in the image but the downsampling is inconsistent.

As a general rule, it feels like like for multi-resolution images, we should only support one of the two cases:

convert the original image exactly as it is including all pyramidal level
take the largest resolution of each multi-resolution image and perform computations (cropping, downscaling...) from this resolution level The latter is also the approach taken by the Glencoe bioformats2raw pipeline.

Two additional issues specifically related to cropping is the modification of any positional metadata or region of interests as well as the absence of recording of the crop region in the metadata.

For upcoming release, we should probably define a set of recommended or disallowed scenarios, and minimally fail fast with an informative message for some combination of options and image metadata.

imagesc-bot commented 4 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/cropping-large-pyramidal-tiffs-with-bfconvert-crashes/44857/2

rleigh-codelibre commented 3 years ago

This is a situation where I think the best solution is at the model level, as detailed below.

You don't want to change any of the raw tile data. Particularly when lossy compression is used. In order to correctly preserve the data, you could treat the crop region as a bounding box, and select all tiles which intersect the region for retention. All non-intersecting tiles can be discarded. This will handle overlapping tiles at all resolution levels. In order to correctly set the crop region for rendering and the image size, you will need a separate bit of metadata in Pixels (or attached to Pixels via a reference) which describes the bounding box, for example an x+y offset and an x+y size. It would also generalise to z and t, and potentially c as well.

With this metadata, you would then have two sizes: the render/viewport size and the underlying data size. Tools will need to select the region to work with. bfconvert would be able to use this when cropping. It will also permit repeated cropping and each time will downscale the region and drop any tiles no longer intersecting.

One complication to consider is that you will need to also store the original size (best) / scale factor (lossy) of each resolution level, so that the bounding region can be correctly computed at each level. We currently infer this from e.g. TIFF SUBIFDs and this will no longer be sufficient: the subresolution sizes will depend upon the tile size at each level (which can vary), so it will be impossible to accurately infer anything. We previously discussed adding full modelling of subresolutions into the data model; you'll need it for this to work properly. Might be worth considering having multiple Pixels elements or SubResolutionPixels within Pixels which will inherit all of the settings but have a different resolution and tile size. Might also be worth modelling the tile size/strip size as well.

Note: this would also be useful for bounding deconvolution artefacts; though you might want a different name to describe that. Perhaps a general annotation which can be used for multiple purposes would fit these needs?

One other consideration. You might want to think about whether the bounding region should be in pixels at the full resolution, or in physical units.

Another consideration. You might also want to think about having multiple bounding regions. For example, this would be useful for modelling WSI overview scans with regions linking to detailed scans.

imagesc-bot commented 2 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/inconsistent-formatexception-errors-when-cropping-large-pyramidal-ome-tiff/73176/3

ome / bioformats

bfconvert: crop, downsampling and multi-resolution issues #3630