watery01 / libyuv

Automatically exported from code.google.com/p/libyuv

BT.709 support #159

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The I420 format that libyuv supports is BT.601 as used by most video codecs.
The main characteristic is the Y is a compressed range of 16 to 235.

Consider adding BT.709, which in practice is used by JPEG.
The main characteristic is the Y is full range of 0 to 255.

This especially affects conversion to/from RGB, but could also affect YUV to 
YUV for converting BT.709 from JPG to BT.601 for video.

Original issue reported on code.google.com by fbarch...@google.com on 19 Nov 2012 at 11:13

GoogleCodeExporter commented 9 years ago
ffmpeg supports the following BT.709 color spaces:
yuvj420p
yuvj422p
yuvj444p
yuvj440p

whereas the rest are BT.601:
yuv420p
yuv422p
yuv444p
yuv410p
yuv411p
yuv440p

Original comment by fbarch...@google.com on 19 Nov 2012 at 11:38

GoogleCodeExporter commented 9 years ago
/usr/include/linux/videodev2.h has a v4l2_colorspace enum. You can get the 
colorspace value a v4l2 device is using as part of the info returned from the 
VIDIOC_G_FMT ioctl.

http://www.linuxtv.org/downloads/legacy/video4linux/API/V4L2_API/spec-single/v4l2.html#colorspaces

Original comment by fbarch...@google.com on 20 Nov 2012 at 5:29

GoogleCodeExporter commented 9 years ago
libjpeg does not use BT.709, it uses BT.601, but it does indeed use full range 
values. Y Cb and Cr all use the full range of 0-255, with 128 being "0" for Cb 
and Cr.
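
To make the range difference concrete, here is a rough sketch of rescaling a 
luma sample between the 16..235 range and the 0..255 range. The function 
names are hypothetical and this is not libyuv code; it only illustrates the 
255/219 scale factor between the two ranges.

```c
#include <stdint.h>

/* Expand a limited-range (16..235) luma sample to full range (0..255).
 * Inputs outside 16..235 are clamped before scaling. */
uint8_t LimitedToFullY(uint8_t y) {
  int v = y;
  if (v < 16) v = 16;
  if (v > 235) v = 235;
  /* Scale by 255/219 with rounding (109 is roughly half of 219). */
  return (uint8_t)(((v - 16) * 255 + 109) / 219);
}

/* Compress a full-range (0..255) luma sample to limited range (16..235). */
uint8_t FullToLimitedY(uint8_t y) {
  /* Scale by 219/255 with rounding, then add the 16 offset. */
  return (uint8_t)(16 + (y * 219 + 127) / 255);
}
```

Note that the two ranges differ by a scale factor as well as by the offset of 
16, so moving between them takes a multiply, not just an add or subtract.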

Original comment by frea...@gmail.com on 21 Nov 2012 at 9:10

GoogleCodeExporter commented 9 years ago
Any suggestions on API?

Original comment by fbarch...@google.com on 12 Jan 2013 at 9:22

GoogleCodeExporter commented 9 years ago
Closing for now, due to lack of response.  If you'd still like this feature, 
suggest a change, api and/or patch.

Original comment by fbarch...@chromium.org on 19 Mar 2013 at 6:56

GoogleCodeExporter commented 9 years ago
I'd like to convert rgb to yuvj420p. Can I do this with libyuv now?

Original comment by Peter.Ko...@gmail.com on 20 Mar 2013 at 4:53

GoogleCodeExporter commented 9 years ago
ARGBToI420 converts to yuv420p.
The I420 format that libyuv converts to/from is BT.601, which compresses the Y 
range to 16..235.

According to this page
http://en.wikipedia.org/wiki/YCbCr
The coefficients for yuvj420p are the same, and UV is identical, but in BT.601 
yuv420p, Y has 16 added, and in JPEG it doesn't.
Y is supposed to be clamped to 235 for 'headroom', but libyuv doesn't clamp it.

A workaround for JPEG would be to use ARGBToI420, and then subtract 16 from Y.
A proper solution would be a new function ARGBToJ420.
Or a compromise would be ARGBToYJ for just the Y channel in JPEG full range.

BT.709 has different coefficients for both Y and UV.  The code is the same, but 
with a different matrix.

If you look at the code, it starts with
void ARGBToYRow_SSSE3(const uint8* src_argb, uint8* dst_y, int pix) {
  __asm {
    mov        eax, [esp + 4]   /* src_argb */
    mov        edx, [esp + 8]   /* dst_y */
    mov        ecx, [esp + 12]  /* pix */
    movdqa     xmm5, kAddY16
    movdqa     xmm4, kARGBToY
...

This gives me a thought.  For 2 recent functions with similar small 
differences, I passed the SIMD constants.

Advantages
1. One function can handle any color matrix.
2. One function can handle reordered coefficients for different BGRA, RGBA, 
ARGB and ABGR channel orders.
3. Less code allows more variations to be done for different CPUs, alignment, 
number of pixels.

Disadvantages
1. More calling overhead/slower.
2. The matrix is tuned for SSSE3, and not ideal for C code, NEON or AVX2.
3. Implementation time.

As the code for yuvj420p is well known, I'll start with this.
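
To illustrate the "pass the constants" idea in plain C, here is a sketch of a 
single luma function that takes its coefficients as a parameter. The YMatrix 
struct, the function name and the two constant tables are hypothetical, not 
libyuv's actual API; the numbers are common fixed-point roundings of the 
BT.601 luma weights, used here only as an example.

```c
#include <stdint.h>

/* Hypothetical container for fixed-point luma coefficients. */
typedef struct {
  int r_coeff;
  int g_coeff;
  int b_coeff;
  int bias;  /* rounding term plus (offset << shift) */
  int shift; /* number of fractional bits */
} YMatrix;

/* Limited range: 8-bit coefficients, +16 offset folded into the bias. */
static const YMatrix kLimitedRange601 = {66, 129, 25, 128 + (16 << 8), 8};
/* Full range: 7-bit coefficients summing to 128, rounding only. */
static const YMatrix kFullRange601 = {38, 75, 15, 64, 7};

/* One function serves any matrix; only the constants change. */
uint8_t RGBToYWithMatrix(uint8_t r, uint8_t g, uint8_t b, const YMatrix* m) {
  int y = (m->r_coeff * r + m->g_coeff * g + m->b_coeff * b + m->bias) >>
          m->shift;
  if (y < 0) y = 0;
  if (y > 255) y = 255;
  return (uint8_t)y;
}
```

A different channel order (BGRA, RGBA, ABGR) would only need the coefficients 
reordered, and a BT.709 variant would only need another table.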

Original comment by fbarch...@chromium.org on 20 Mar 2013 at 10:53

GoogleCodeExporter commented 9 years ago
Do you mean as a workaround I should do something like this:
void RGB2YCbCr420_Libyuv(const BYTE* iSrcBuffer, UINT32 iSrcWidth, UINT32 
iSrcHeight, const Rect& iSrcRect, VP8::vpx_image* iDestImage, const Rect& 
iDestRect)
{
    int srcStride = iSrcWidth * 3;
    int src_x = iSrcRect.left;
    int src_y = iSrcRect.top;

    int dest_x = iDestRect.left;
    int yuv_top = iDestRect.top;
    libyuv::RAWToI420((BYTE*)iSrcBuffer + src_y * srcStride + src_x * 3, srcStride,
        iDestImage->planes[0] + yuv_top * iDestImage->stride[0] + dest_x, iDestImage->stride[0],
        iDestImage->planes[1] + (yuv_top>>1)* iDestImage->stride[1] + (dest_x>>1), iDestImage->stride[1],
        iDestImage->planes[2] + (yuv_top>>1)* iDestImage->stride[2] + (dest_x>>1), iDestImage->stride[2],
        iDestRect.Width(), iDestRect.Height());

    // Workaround
    for (int y = 0; y < iDestRect.Height(); ++y)
    {
        for (int x = 0; x < iDestRect.Width(); ++x)
        {
            BYTE* y_ptr = iDestImage->planes[0] + (yuv_top + y) * iDestImage->stride[0] + x;
            if (*y_ptr < 16)
                *y_ptr = 0;
            else
                *y_ptr -= 16;
        }
    }
}

But it doesn't work :(

Original comment by Peter.Ko...@gmail.com on 24 Mar 2013 at 7:38

GoogleCodeExporter commented 9 years ago
For the workaround you can skip the if, but yes.  That doesn't work?
Are you sure the source you have is 'RAW'?  That's R, G, B in memory: 3 bytes 
with the first byte 'R'.  It's not very common.  On Windows, RGB24 is more 
common, which is 3 bytes starting with 'B' in memory (little endian).
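
As a quick illustration of the two byte orders, the sketch below reads one 
3-byte pixel under each convention. The Rgb struct and both helper names are 
made up for this example.

```c
#include <stdint.h>

/* Hypothetical unpacked pixel. */
typedef struct {
  uint8_t r;
  uint8_t g;
  uint8_t b;
} Rgb;

/* 'RAW': bytes in memory are R, G, B. */
Rgb ReadRawPixel(const uint8_t* p) {
  Rgb c = {p[0], p[1], p[2]};
  return c;
}

/* RGB24: bytes in memory are B, G, R. */
Rgb ReadRgb24Pixel(const uint8_t* p) {
  Rgb c = {p[2], p[1], p[0]};
  return c;
}
```

If the buffer turns out to be B, G, R, then RGB24ToI420 rather than RAWToI420 
is the matching libyuv entry point; using the wrong one swaps red and blue.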

I have started the function.  This is the work in progress:
https://webrtc-codereview.appspot.com/1243004
That would do the rows of the Y plane.

Original comment by fbarch...@chromium.org on 24 Mar 2013 at 8:04

GoogleCodeExporter commented 9 years ago
r622 adds full range BT.601 as ARGBToJ420 and ARGBToJ400.
Same as ARGBToI420 and ARGBToI400 but without the +16 on Y, so slightly faster.

Original comment by fbarch...@chromium.org on 26 Mar 2013 at 9:21

GoogleCodeExporter commented 9 years ago
Win32
ARGBToJ420_Unaligned (389 ms)
ARGBToJ420_Any (387 ms)
ARGBToJ420_Invert (351 ms)
ARGBToJ420_Opt (344 ms)
ARGBToJ400_Random (271 ms)
ARGBToJ400_Unaligned (263 ms)
ARGBToJ400_Invert (243 ms)
ARGBToJ400_Opt (241 ms)
ARGBToJ400_Any (239 ms)

Linux
ARGBToJ420_Any (374 ms)
ARGBToJ420_Unaligned (358 ms)
ARGBToJ420_Invert (318 ms)
ARGBToJ420_Opt (316 ms)
ARGBToJ400_Unaligned (248 ms)
ARGBToJ400_Opt (225 ms)
ARGBToJ400_Any (225 ms)
ARGBToJ400_Invert (223 ms)

Original comment by fbarch...@chromium.org on 26 Mar 2013 at 9:36

GoogleCodeExporter commented 9 years ago
The Y channel of full range BT.601 needs new coefficients, roughly scaled up by 
255/219.
Does Chroma also need a wider range?

Original comment by fbarch...@google.com on 27 Mar 2013 at 1:51

GoogleCodeExporter commented 9 years ago
r624 reimplements full range Y channel of BT.601 ARGBToJ420 and ARGBToJ400.
7 bit coefficients are used for C, so it will match the SSSE3 and Neon.
SSSE3 uses 7 bit due to pmaddubsw being signed bytes.
Neon uses 7 bit due to shift by 7 being maximum for vqrshrun.s16
Rounding is free on Neon.  It takes an additional add for C and SSSE3 on full 
range yuv, but on mpeg style there was an add 16 and rounding is free.

There are several problems:
UV needs to be full range as well.  The 112 constant needs to be 128 and the 
other 2 add up to 128.
The coefficients add up to 127.  It should be 128; for I420 it should be 256.
I420 should round and use 7 bit consistently.
The code is duplicated for BGRA etc, cpus and full range.  It would be better 
to use a single function with matrix passed in.
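
For the full range UV point, a rough C sketch follows. It assumes the JFIF 
chroma weights (0.168736, 0.331264, 0.5 for Cb and 0.5, 0.418688, 0.081312 for 
Cr) rounded to 8-bit fixed point, so the largest constant is 128 and the other 
two add up to 128. The function names are hypothetical and this is not the 
libyuv implementation, which uses 7-bit coefficients for SIMD.

```c
#include <stdint.h>

static uint8_t ClampToByte(int v) {
  if (v < 0) return 0;
  if (v > 255) return 255;
  return (uint8_t)v;
}

/* Full-range Cb. The (128 << 8) term is the 128 chroma zero point, added
 * before the shift so the shifted value is never negative. */
uint8_t RGBToUFull(uint8_t r, uint8_t g, uint8_t b) {
  int u = (-43 * r - 85 * g + 128 * b + 128 + (128 << 8)) >> 8;
  return ClampToByte(u); /* pure blue computes 256, so clamping is needed */
}

/* Full-range Cr, same structure with the Cr weights. */
uint8_t RGBToVFull(uint8_t r, uint8_t g, uint8_t b) {
  int v = (128 * r - 107 * g - 21 * b + 128 + (128 << 8)) >> 8;
  return ClampToByte(v);
}
```

With a maximum weight of exactly 128, saturated blue or red overflows by one 
and relies on the clamp; gray input maps to 128 on both channels.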

BT.709 comes in both full range and MPEG constrained range, and tunes the 
matrix for the LCDs commonly used for HD (720p) today.
There is also REC.2020 proposed for uhdtv.
Code wise these would be the same, with a different matrix.
Conversion from ARGB to I420 and back to ARGB are mainly affected.  And other 
subsampling (411/422/444).

How to expose this at a high level remains unclear.  But with 'jpeg', a fourcc 
was contrived.  The low level is more clear - a matrix parameter.
Next small step fixes rounding and makes all CPUs equal.
Next big step refactors all functions to use matrix parameter.

Original comment by fbarch...@google.com on 28 Mar 2013 at 9:23

GoogleCodeExporter commented 9 years ago
Full range BT.601 (J420) not ready for use yet.
The Y channel is nearly right, but is scaled to '253' instead of 255.  Will fix.
The UV channels are currently scaled to +-112 and should be +-127.

Original comment by fbarch...@google.com on 29 Mar 2013 at 5:24

GoogleCodeExporter commented 9 years ago
r628 will improve Y channel jpeg colorspace.
The BT.601-1 full range is the same as JPEG on Y, but differs from JPEG on UV 
in its normalization.
For J400/J420, the goal should be yuvj420: the JPEG colorspace, as defined by 
JPEG.
I've renormalized the Y channel using 7 bit coefficients that sum to 128.  See 
row_common for formulas used.
static __inline int RGBToYJ(uint8 r, uint8 g, uint8 b) {
  return (38 * r + 75 * g +  15 * b + 64) >> 7;
}
r628 also reimplements ARGBGray to use full range.
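
As a cross-check on those constants, the sketch below derives 38, 75 and 15 by 
rounding the BT.601 luma weights 0.299, 0.587 and 0.114 to 7 bits. The helper 
names are hypothetical, and the weights are passed in thousandths to keep the 
arithmetic in integers.

```c
#include <stdint.h>

/* Round a weight given in thousandths (299 means 0.299) to 7-bit fixed
 * point, i.e. multiply by 128 and round to the nearest integer. */
int ToSevenBit(int thousandths) {
  return (thousandths * 128 + 500) / 1000;
}

/* Full-range luma built from the derived coefficients; +64 rounds the
 * result before the shift by 7. */
int RGBToYJFromWeights(uint8_t r, uint8_t g, uint8_t b) {
  return (ToSevenBit(299) * r + ToSevenBit(587) * g + ToSevenBit(114) * b +
          64) >> 7;
}
```

Because the rounded coefficients sum to exactly 128, white (255, 255, 255) 
maps to 255 and black maps to 0, which is the full range behaviour wanted for 
JPEG.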

Original comment by fbarch...@chromium.org on 31 Mar 2013 at 6:06

GoogleCodeExporter commented 9 years ago
r629 implements ARGBToJ420 for SSSE3/Neon with corrected coefficients and 
better rounding.  The matrix should match jccolor.c for jpeg.
The C code matches SSSE3 for subsampling on UV, but Neon is done differently.

Original comment by fbarch...@google.com on 1 Apr 2013 at 8:33

GoogleCodeExporter commented 9 years ago
Peter, yuvj420p should work now and be fully optimized using ARGBToJ420

If there are no further requests for 709 or yuvj420 I'll close this as fixed.
If there is demand for 709, I would start with a similar implementation.

Original comment by fbarch...@google.com on 2 Apr 2013 at 10:52

GoogleCodeExporter commented 9 years ago
Full range BT.601 for JPEG added.  Closing as fixed.
Will do additional color spaces in future if there is demand.

Original comment by fbarch...@chromium.org on 5 Apr 2013 at 4:29