The previous implementation had to convert both the main and overlay frames to BGRA textures and then convert the result back to YUV. This round trip is bandwidth-heavy.
Add a faster shader for the case where the overlay is in BGRA format; it computes the YUV values directly in the shader. This removes the need to convert the main frame and requires no extra copy of the overlay frame, more than doubling throughput when overlaying onto 10-bit 1080p HEVC inputs on an M1 Max (190 fps -> 407 fps).
The RGB-to-YUV formula is currently hard-coded to a premultiplied BT.709 matrix.
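For illustration only, the per-pixel math such a shader performs can be sketched in plain C. This is not the shader from this patch: the names `yuv_t`, `rgba_premul_t` and `blend_premul_rgba_over_yuv` are hypothetical, and the sketch assumes limited-range BT.709 with the 8-bit offsets 16/255 and 128/255 and all samples normalized to [0, 1]. The actual coefficients and range handling in the patch may differ.

```c
/* Normalized [0,1] YUV sample of the main frame (hypothetical type). */
typedef struct { float y, u, v; } yuv_t;

/* Overlay sample with premultiplied alpha, channels in [0,1] (hypothetical type). */
typedef struct { float r, g, b, a; } rgba_premul_t;

/*
 * Blend one premultiplied RGBA overlay pixel over one YUV main pixel
 * without ever converting the main pixel to RGB.
 *
 * Assumption: limited-range BT.709 (Kr = 0.2126, Kb = 0.0722).
 * Because the overlay color is premultiplied by alpha, the range
 * offsets must be scaled by alpha as well before the "over" blend.
 */
static yuv_t blend_premul_rgba_over_yuv(yuv_t main_px, rgba_premul_t ov)
{
    const float y_scale = 219.0f / 255.0f, c_scale = 224.0f / 255.0f;
    const float y_off   =  16.0f / 255.0f, c_off   = 128.0f / 255.0f;

    /* BT.709 luma and color-difference signals of the premultiplied color. */
    float luma = 0.2126f * ov.r + 0.7152f * ov.g + 0.0722f * ov.b;
    float cb   = (ov.b - luma) / 1.8556f;
    float cr   = (ov.r - luma) / 1.5748f;

    /* Premultiplied "over" operator: out = overlay + (1 - alpha) * main. */
    float inv_a = 1.0f - ov.a;
    yuv_t out;
    out.y = y_scale * luma + ov.a * y_off + inv_a * main_px.y;
    out.u = c_scale * cb   + ov.a * c_off + inv_a * main_px.u;
    out.v = c_scale * cr   + ov.a * c_off + inv_a * main_px.v;
    return out;
}
```

Under these assumptions a fully transparent overlay pixel leaves the main pixel unchanged, and an opaque white overlay pixel yields limited-range white (Y = 235/255, U = V = 128/255).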