strukturag / libheif

libheif is an HEIF and AVIF file format decoder and encoder.
Other
1.64k stars 294 forks source link

Chroma subsampling does not lowpass filter (take average) before discarding samples #521

Open wantehchang opened 3 years ago

wantehchang commented 3 years ago

The RGB-to-YCbCr conversion code in libheif/heif_colorconversion.cc performs chroma subsampling by discarding samples only, without first lowpass filtering (taking a weighted average of) the samples. A representative example is the second nested for loop in Op_RGB24_32_to_YCbCr::convert_colorspace():

std::shared_ptr<HeifPixelImage>
Op_RGB24_32_to_YCbCr::convert_colorspace(const std::shared_ptr<const HeifPixelImage>& input,
                                         ColorState target_state,
                                         ColorConversionOptions options)
{
  ...
  for (int y = 0; y < height; y += chromaSubV) {
    const uint8_t* p = &in_p[y * in_stride];

    for (int x = 0; x < width; x += chromaSubH) {
      uint8_t r = p[0];
      uint8_t g = p[1];
      uint8_t b = p[2];
      p += bytes_per_pixel * chromaSubH;

      float cb = r * coeffs.c[1][0] + g * coeffs.c[1][1] + b * coeffs.c[1][2];
      float cr = r * coeffs.c[2][0] + g * coeffs.c[2][1] + b * coeffs.c[2][2];

      if (full_range_flag) {
        out_cb[(y / chromaSubV) * out_cb_stride + (x / chromaSubH)] = clip_f_u8(cb + 128);
        out_cr[(y / chromaSubV) * out_cr_stride + (x / chromaSubH)] = clip_f_u8(cr + 128);
      }
      else {
        out_cb[(y / chromaSubV) * out_cb_stride + (x / chromaSubH)] = (uint8_t) clip_f_u8(cb * 0.875f + 128.0f);
        out_cr[(y / chromaSubV) * out_cr_stride + (x / chromaSubH)] = (uint8_t) clip_f_u8(cr * 0.875f + 128.0f);
      }
    }
  }
  ...
}

Consider 4:2:0. This nested for loop subsamples each 2x2 block of chroma samples by taking the top-left sample and discarding the rest. This has two problems:

  1. The lack of lowpass filtering introduces distortion due to aliasing.
  2. The subsampled chroma is positioned at the top-left corner of the 2x2 block. But most YCbCr-to-RGB conversion functions in image decoders assume the chroma sample is positioned at the center of the 2x2 block. So the chroma samples are misaligned.

I will upload a pull request as a proof of concept for a fix.

farindk commented 3 years ago

I understand your point and know that the color-conversion functions are ad-hoc solutions that still need to be improved.

In the case above (RGB -> YCbCr) the conversion takes place when encoding the image. According to the AV1 spec (https://aomediacodec.github.io/av1-spec/#color-config-semantics), the chroma samples in AV1 can either be positioned co-located with the top-left sample of the 2x2 area (which is approx. what is done here without the low-pass filtering), or it can be centered vertically between the two left pixels.

What we should do:

When encoding, the chroma sample position should be signaled to the AV1 encoder.

When decoding, we should read the chroma sample position from the AV1 bitstream and select the appropriate filter.

farindk commented 3 years ago

BTW: this is for 4:2:0 only. For 4:2:2 I did not find any information about chroma sample position in the AV1 spec. My guess is that it is colocated with the left sample as this is consistent with the two modes in 4:2:0. (Any references would be appreciated.)

farindk commented 3 years ago

As HEVC uses vertically centered for 4:2:0 and colocated on left sample for 4:2:2, let's use the same when encoding AV1.

wantehchang commented 3 years ago

Hi Dirk,

The CSP_VERTICAL and CSP_COLOCATED chroma sample positions for 4:2:0 in the AV1 spec are commonly used for videos. The chroma sample position commonly used for 4:2:0 still images is at the center of a 2x2 block. The YCbCr-to-RGB conversion functions in both libavif and Chrome only support the center chroma sample position for 4:2:0. Since the AV1 spec does not have a CSP_CENTER chroma sample position for 4:2:0, the libavif encoder uses CSP_UNKNOWN as a substitute. We have proposed to the AOM Codec Working Group to assign the reserved value 3 to CSP_CENTER. See https://github.com/AOMediaCodec/av1-avif/issues/88 https://github.com/AOMediaCodec/av1-spec/pull/308

In practice, the center chroma sample position is the most important to AVIF still images. Please support the center chroma sample position and signal CSP_UNKNOWN in the bitstream.

For 4:2:2, the AV1 bitstream does not signal the chroma sample position and the chroma sample position is not documented in the AV1 spec. I guess the implicitly assumed chroma sample position for 4:2:2 for videos is CSP_COLOCATED. It's not clear what the chroma sample position for 4:2:2 still images should be.

wantehchang commented 3 years ago

Note that the YCbCr-to-RGB conversion functions in libheif/heif_colorconversion.cc also assume the center chroma sample position. For example, in Op_YCbCr420_to_RGB24::convert_colorspace((), we have:

  int x, y;
  for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
      int yv = (in_y[y * in_y_stride + x]);
      int cb = (in_cb[y / 2 * in_cb_stride + x / 2] - 128);
      int cr = (in_cr[y / 2 * in_cr_stride + x / 2] - 128);

      out_p[y * out_p_stride + 3 * x + 0] = clip_int_u8(yv + ((r_cr * cr + 128) >> 8));
      out_p[y * out_p_stride + 3 * x + 1] = clip_int_u8(yv + ((g_cb * cb + g_cr * cr + 128) >> 8));
      out_p[y * out_p_stride + 3 * x + 2] = clip_int_u8(yv + ((b_cb * cb + 128) >> 8));
    }
  }

This further supports my suggestion that the center chroma sample position is the most important to implement for AVIF because the assumption of the center position is widespread, even though it is not specified in the AV1 spec.

farindk commented 3 years ago

Thank you, @wantehchang for the background information on AV1.

Ok, this is fine with me. If AV1 supports (is going to support) CSP_CENTER, I can make this the default. This also has the advantage that it fits the JPEG chroma positions.

I've looked up in the JPEG (JFIF) standard how chroma subsampling is carried out. See the excerpt below. This says that JPEG uses the horizontal center position between the two pixels in the 4:2:2 case.

How could we handle 4:2:2 in AVIF? Since chroma position is not signaled in AV1 for 4:2:2, could we simply define this to be horizontally centered (a la CSP_CENTER), or would that conflict with the typical video interpretation (CSP_COLOCATED)?

Screenshot from 2021-06-15 11-04-57

wantehchang commented 3 years ago

Hi Dirk,

Thank you for looking up the 4:2:2 chroma sample position in JPEG. Leo Barnes told me the same thing.

How we should handle 4:2:2 in AVIF is a difficult question. I would suggest we take the first option: simply define this to be horizontally centered (a la CSP_CENTER). In my experience, many people will intuitively take an average if we ask them to subsample. I believe this is another reason why the center chroma sample position is commonly used, in addition to it being used in JPEG.

Note: libavif's avifImageRGBToYUV() function uses the center chroma sample position for both 4:2:0 and 4:2:2 (I omitted the 4:2:0 case for brevity):

            // Populate any subsampled channels with averages from the 2x2 block
            if (image->yuvFormat == AVIF_PIXEL_FORMAT_YUV420) {
                // YUV420, average 4 samples (2x2)

                ...
            } else if (image->yuvFormat == AVIF_PIXEL_FORMAT_YUV422) {
                // YUV422, average 2 samples (1x2), twice

                for (int bJ = 0; bJ < blockH; ++bJ) {
                    float sumU = 0.0f;
                    float sumV = 0.0f;
                    for (int bI = 0; bI < blockW; ++bI) {
                        sumU += yuvBlock[bI][bJ].u;
                        sumV += yuvBlock[bI][bJ].v;
                    }
                    float totalSamples = (float)blockW;
                    float avgU = sumU / totalSamples;
                    float avgV = sumV / totalSamples;

                    const int chromaShiftX = 1;
                    int uvI = outerI >> chromaShiftX;
                    int uvJ = outerJ + bJ;
                    if (state.yuvChannelBytes > 1) {
                        uint16_t * pU = (uint16_t *)&yuvPlanes[AVIF_CHAN_U][(uvI * 2) + (uvJ * yuvRowBytes[AVIF_CHAN_U])];
                        *pU = (uint16_t)avifReformatStateUVToUNorm(&state, avgU);
                        uint16_t * pV = (uint16_t *)&yuvPlanes[AVIF_CHAN_V][(uvI * 2) + (uvJ * yuvRowBytes[AVIF_CHAN_V])];
                        *pV = (uint16_t)avifReformatStateUVToUNorm(&state, avgV);
                    } else {
                        yuvPlanes[AVIF_CHAN_U][uvI + (uvJ * yuvRowBytes[AVIF_CHAN_U])] =
                            (uint8_t)avifReformatStateUVToUNorm(&state, avgU);
                        yuvPlanes[AVIF_CHAN_V][uvI + (uvJ * yuvRowBytes[AVIF_CHAN_V])] =
                            (uint8_t)avifReformatStateUVToUNorm(&state, avgV);
                    }
                }
            }
kmilos commented 3 years ago

How could we handle 4:2:2 in AVIF? Since chroma position is not signaled in AV1 for 4:2:2, could we simply define this to be horizontally centered (a la CSP_CENTER), or would that conflict with the typical video interpretation (CSP_COLOCATED)?

How we should handle 4:2:2 in AVIF is a difficult question. I would suggest we take the first option: simply define this to be horizontally centered (a la CSP_CENTER).

AFAICT, this goes against most current implementations (focusing on video content though), where 4:2:2 is assumed to be co-located (and seems to be the only option and thus also not signalled/ignored in H.265).

Otherwise (chroma_format_idc is not equal to 1), the values of the syntax elements chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field shall be ignored. When chroma_format_idc is equal to 2 (4:2:2 chroma format) or 3 (4:4:4 chroma format), the location of chroma samples is specified in clause 6.2.

AV1 and AVIF are of course not bound by this (nor JFIF), but would just feel a bit "weird" in this case, that's all...

Also from the JFIF spec:

The 4:2:2 form of sub-sampling is used primarily for video-related applications (especially with the use of interlaced-scan systems). For JFIF usage, the use of sub-sampling formats other than 4:2:0 is therefore discouraged, as such other formats may not be supported in some applications.

silverbacknet commented 3 years ago

Although the H.265 spec says to ignore it because it INSISTS you use MPEG2 chroma for 4:2:2 and not JPEG, if the plan is to support JPEG anyway for ease of use then signaling it and treating the info as valid anyway is the lesser evil than just assuming. Not sure whether continuing to signal type 1 (centered) or using type 3 (top row centered) is better, since they're kind of equivalent at 4:2:2.

For AV1, I don't know. Stick it in Exif? Use location 3 despite it being reserved? Very annoying they didn't even consider that use case.

farindk commented 3 years ago

Thanks @kmilos for pointing out the note about 4:2:2 in the JFIF spec. As there is no consistent way to specify the sample location with the 4:2:2 video codecs, I decided to use colocated chroma samples for 4:2:2. At least for the moment.

Since 4:2:2 is probably never used for images in practice, JPEG compatibility is not a big issue, and this choice is likely irrelevant anyways.