Could not decode image: 0: Unsupported feature: Unsupported color conversion

xiongqiangcs commented 3 years ago

libheif v1.11.0

command

heif-convert test.heic test.png

result

File contains 1 images
Could not decode image: 0: Unsupported feature: Unsupported color conversion

test.heic.zip

chris0e3 commented 3 years ago

Here’s my analysis:

The file contains a 320 x 693 YUV 4:2:0 image and a 320 x 693 Gray/Alpha image. [YUV 4:2:0 has U & V (Cb & Cr) planes at half the image size (vertically & horizontally). These get doubled in size producing a 320 x 694 image.] This gets converted to a 320 x 694 RGB image + 320 x 693 Alpha plane. This size mismatch between the RGB planes & the Alpha plane causes heif::convert_colorspace to fail.
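The size arithmetic described above can be sketched as follows. This is only an illustration of the rounding, assuming the chroma planes are rounded up to a whole number of samples and then doubled. The function names are hypothetical and are not part of libheif or libde265.

```python
def chroma_plane_size(width, height):
    """Size of each chroma (Cb/Cr) plane for 4:2:0, rounding odd sizes up."""
    return ((width + 1) // 2, (height + 1) // 2)


def upsampled_size(width, height):
    """Size obtained by doubling the 4:2:0 chroma planes back to full resolution."""
    chroma_w, chroma_h = chroma_plane_size(width, height)
    return (2 * chroma_w, 2 * chroma_h)


print(chroma_plane_size(320, 693))  # (160, 347)
print(upsampled_size(320, 693))     # (320, 694)
```

With an even height such as 694 the round trip is exact, which is why the mismatch only shows up against the 320 x 693 alpha plane.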

convert_libde265_image_to_heif_image calls de265_get_image_height which returns 694 for the YUV image. Fixing it here would require delving into libde265.

HeifContext::decode_image_planar calls HeifPixelImage::transfer_plane_from_image_as to merge the RGB & Alpha images. I was able to work around the problem by effectively modifying transfer_plane_from_image_as to adjust the target image size (and extant plane sizes) to match the added plane (when its size is off by 1).
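A minimal sketch of the idea behind that workaround, not the actual patch: planes are modelled as plain lists of rows, and the helper names (crop_plane, add_plane_with_size_fixup) are hypothetical. It assumes the existing planes are at most one sample larger than the plane being added, and it trims them to match.

```python
def crop_plane(plane, width, height):
    """Keep the top-left width x height region of a plane (a list of rows)."""
    return [row[:width] for row in plane[:height]]


def add_plane_with_size_fixup(planes, name, new_plane):
    """Add new_plane under `name`, trimming existing planes that are off by one."""
    new_h, new_w = len(new_plane), len(new_plane[0])
    fixed = {}
    for key, plane in planes.items():
        h, w = len(plane), len(plane[0])
        # Only tolerate existing planes being larger by at most one sample.
        if not (0 <= h - new_h <= 1 and 0 <= w - new_w <= 1):
            raise ValueError(f"plane {key!r} is {w}x{h}, cannot match {new_w}x{new_h}")
        fixed[key] = crop_plane(plane, new_w, new_h)
    fixed[name] = new_plane
    return fixed


# Tiny stand-in for the 320x694 RGB + 320x693 alpha case: 2x4 RGB, 2x3 alpha.
rgb = {c: [[0, 0] for _ in range(4)] for c in "RGB"}
alpha = [[255, 255] for _ in range(3)]
merged = add_plane_with_size_fixup(rgb, "A", alpha)
print({k: (len(v[0]), len(v)) for k, v in merged.items()})
```

Any mismatch larger than one sample is rejected rather than silently cropped, since that would point to a different problem.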

Are YUV 4:2:0 images with odd widths or heights allowed by the standard?

silverbacknet commented 3 years ago

My parser shows the base image as 320x694 with a crop box of 320x693, which is definitely legal. An actual 320x693 4:2:0 image certainly wouldn't be. How an implementation does cropping isn't specified, but since libheif already always converts to RGB for any colorspace conversion the cropping should be applied after conversion and before alpha merging. (It already does for composites, this must've just been an oversight.)

baumanj commented 3 years ago

> Are YUV 4:2:0 images with odd widths or heights allowed by the standard?

Generally yes, I think. Though I admit I'm always finding new things sprinkled across the hundreds of pages of BMFF, HEIF, MIAF, the amendments and other standards they depend on.

> My parser shows the base image as 320x694 with a crop box of 320x693, which is definitely legal

As an example of something I just found the other day, there's this bit from MIAF (ISO/IEC 23000-22:2019) § 7.3.6.7 Clean aperture, rotation and mirror:

> The clean aperture property is restricted according to the chroma sampling format of the input image (4:4:4, 4:2:2, 4:2:0, or 4:0:0); the cropping shall select an integer number of samples for all planes. In effect, this means that:
> - when the image is 4:0:0 (monochrome) or 4:4:4, there is no restriction;
> - when the image is 4:2:2, the horizontal cropped offset and width shall be even numbers;
> - when the image is 4:2:0, both the horizontal and vertical cropped offsets, and heights and widths, shall be even numbers.

So if I'm reading that right, I think a 4:2:0 image with an odd crop height (693) would violate this requirement, no?
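The rule quoted above can be written as a small check. This is a sketch of one reading of the quoted text, assuming the offsets and sizes are already integer pixel values for the cropped region. The function name and the string labels for the chroma formats are hypothetical.

```python
def crop_satisfies_miaf_rule(chroma, horiz_offset, vert_offset, width, height):
    """Return True if an integer crop meets the quoted MIAF evenness rule."""
    def even(n):
        return n % 2 == 0

    if chroma in ("4:0:0", "4:4:4"):
        return True  # no restriction
    if chroma == "4:2:2":
        return even(horiz_offset) and even(width)
    if chroma == "4:2:0":
        return all(even(v) for v in (horiz_offset, vert_offset, width, height))
    raise ValueError(f"unknown chroma format: {chroma!r}")


print(crop_satisfies_miaf_rule("4:2:0", 0, 0, 320, 693))  # False
print(crop_satisfies_miaf_rule("4:2:0", 0, 0, 320, 694))  # True
print(crop_satisfies_miaf_rule("4:4:4", 0, 0, 320, 693))  # True
```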

silverbacknet commented 3 years ago

Good to know, it's legal in HEVC but I don't have access to MIAF so I didn't realize that. Probably not unexpected that some implementations are creating such malformed images.

baumanj commented 3 years ago

Yeah, the fact that the MIAF spec is not freely available and contains a lot of important nuggets like that is a real source of confusion/frustration.

farindk commented 3 years ago

h265 only allows even image sizes, and the problems with cropping and rotating chroma-subsampled images were the reason that libheif was switched to internal RGB processing. Now MIAF again forbids odd image sizes, which makes sense from a technical standpoint, but is completely unexpected and impractical for users who just want to save their images. On the other hand, MIAF allows for rotation, which again introduces technical problems.

This is all unfortunate and does not increase my trust in the standardization process :-|

I guess libheif should simply behave in the way that is most reasonable. Maybe it should remove the miaf brand when saving odd-sized images. I'm also not sure how this MIAF requirement should play together with AVIF, which allows odd-sized images right in the codec.

chris0e3 commented 3 years ago

@silverbacknet

> My parser shows the base image as 320x694 with a crop box of 320x693, which is definitely legal.

OK, but dumping the raw boxes says that it is 320 x 693 and that it has no 'clap' item.

@baumanj

> So if I'm reading that right, I think a 4:2:0 image with an odd crop height (693) would violate this requirement, no?

After reading it 3 times … I agree with your interpretation of the text.

However, this can’t be right:

> the cropping shall select an integer number of samples for all planes.

as clap specifies clean aperture width & height as fractions (<int32-num>/<int32-den>). Also, of the random set of docs I have found, ISO 23008-12 states that there is ‘at most one’ 'clap' item. So it presumably also applies to any thumbnail image. It would be rather restrictive if the 'clap' was required to make both have even widths/heights. Also, changing the image encoding could require changing the 'clap' bounds, as could rotating an image. Not very intuitive.

Also AFAICS, ISO 23008-12 contains no restrictions for 'ispe' (image spatial extents) values.

farindk commented 3 years ago

ispe is the image size before transformations are applied. See 23008-12:

> The ImageSpatialExtentsProperty documents the width and height of the associated image item. Every image item shall be associated with one property of this type, prior to the association of all transformative properties.

ispe is also descriptive only and does not modify the image. It does not actively crop. Hence, it provides no workaround for odd-sized images.

farindk commented 3 years ago

@chris0e3 Ok, I finally had a look at the file (sorry for the delay). The file appears to be invalid to me, but the libheif error message is misleading.

The base image has size 320x694 (hevc only allows for even sizes). There is no crop transform. Only an ispe that says that the image is 320x693, which is incorrect. ispe is only descriptive and not actively cropping the image.

The alpha image is indeed 320x693, which is possible because it is a greyscale image without color planes.

The file does not claim to be miaf compatible, so the discussion above about MIAF does not apply. However, it is still invalid as it should at least include a crop transform to 320x693.

Do you know which software was used to write the image?

baumanj commented 3 years ago

> However, this can’t be right:
>
> > the cropping shall select an integer number of samples for all planes.
>
> as clap specifies clean aperture width & height as fractions (<int32-num>/<int32-den>).

The way clap is specified is definitely a bit unintuitive if one is expecting it to be used as a general cropping operation. But as I understand it, the original purpose was more about trimming the edges of a video stream to do compositing, and therefore it's defined in terms of an offset relative to the image center, which for an even-numbered image dimension will be at a half-pixel value, and require a half-pixel offset to resolve to an integer number of pixels in the derived (that is, post-clap) image. The situation is similar if the base image has an odd-numbered dimension and the corresponding clap dimension is even.

I scratched my head at this for a long time, and eventually visual examples helped me to grok it. Consider a 4x4 image, where you want the result of clap to be just the pixel at (1, 1):

[Image: basic-2]

To get that, one way the clap box could be specified (simplifying the fooN, fooD notation to foo: fooN / fooD):

    CleanApertureBox {
        cleanApertureWidth: 1/1,
        cleanApertureHeight: 1/1,
        horizOff: -1/2,
        vertOff: -1/2,
    }

(It could be specified in many other ways, since there are many fractions which reduce to -1/2, but I find this the simplest.)

Applying the first formula from ISO/IEC 14496-12:2020 § 12.1.4.1, we calculate the new picture center (I'll just do X since Y is similar):

pcX = horizOff + (width  - 1)/2
    = ( -1/2 ) + ( 4     - 1)/2
    = ( -1/2 ) + (    3     )/2
    =          1

Which makes sense: the derived image is centered at (1, 1) of the original image. Then applying the second formula to give us leftmost/rightmost pixel and the topmost/bottommost line of the clean aperture:

pcX ± (cleanApertureWidth - 1)/2
 1  ± (        1          - 1)/2
 1  ± (           0          )/2
               1

Which is kind of boring since the leftmost and rightmost pixels are the same (1), but do the calculation with a cleanApertureWidth > 1 and it makes a bit more sense.

I hope this explains why fractions are necessary in the clap specification, but result in integer values when applied.
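The same calculation can be run with exact fractions. This is a small sketch of the two formulas as written out above, applied to one axis at a time. The function name clap_extent and its parameter names are made up for illustration and do not come from any library.

```python
from fractions import Fraction


def clap_extent(image_size, clean_aperture_size, offset):
    """Return (first, last) pixel index of the clean aperture along one axis.

    image_size is the width or height in pixels; clean_aperture_size and
    offset are the clap fractions for that axis.
    """
    # Picture center: pc = offset + (size - 1) / 2
    pc = Fraction(offset) + Fraction(image_size - 1, 2)
    # Edges: pc -/+ (cleanApertureSize - 1) / 2
    half = (Fraction(clean_aperture_size) - 1) / 2
    return pc - half, pc + half


# The 4x4 example: a 1-pixel aperture offset by -1/2 selects pixel 1.
print(*clap_extent(4, Fraction(1, 1), Fraction(-1, 2)))  # 1 1

# A 2-pixel aperture centered in the same image selects pixels 1 and 2.
print(*clap_extent(4, Fraction(2, 1), Fraction(0, 1)))   # 1 2
```

In both cases the fractional center and the fractional half-width cancel, so the resulting edges are whole pixel indices.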

strukturag / libheif

Could not decode image: 0: Unsupported feature: Unsupported color conversion #481