xiph / rav1e

The fastest and safest AV1 encoder.
BSD 2-Clause "Simplified" License

Support for Bayer pattern videos #1274

Open newpavlov opened 5 years ago

newpavlov commented 5 years ago

It's a bit of a strange question/request, but I would like to know if it's possible in theory to use AV1 for compressing raw Bayer videos (e.g. with RGGB or RGBW patterns). Most color cameras use a Bayer pattern, and frames are usually demosaiced before compression, which adds redundant information to them. This is usually compensated for by chroma subsampling, but I wonder whether AV1 could work directly with the raw data. That could result in better compression ratios, especially if you target lossless or near-lossless compression.

I've seen papers which introduce similar compression based on H.264; if you're interested, I could reference them here.

Feel free to close this issue if you think this is the wrong place to discuss this question!

tdaede commented 5 years ago

Short answer: AV1 doesn't have any native support for Bayer pattern pixel formats, and you will probably get better compression converting to YUV 4:2:0 anyway (or 4:4:4 at very high bitrates).

Long answer: Using YUV instead of RGB is useful not only for subsampling, but also for removing redundant information across planes: if you compare the compression of 4:4:4 YUV and RGB, the YUV will be superior. AV1 has CfL (chroma-from-luma prediction), which offsets that somewhat but not completely.

Assuming you take the hit of using RGB, you now need to format your Bayer pattern into something suitable for compression. For green, you could shift every other row left one pixel and then subsample by 2, but some of your pixels are now a half sample off and it will compress poorly. You could resample by a half pixel every other line, though that's going to blur every other line. You could also split even/odd lines into top and bottom halves of a frame, or separate frames, though now you've also removed a lot of context and made it compress worse. You could also just double up pixels for green, but now you're at 6 coefficients per Bayer block, the same as 4:2:0.
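For illustration, the plane-splitting idea and the per-block sample counts mentioned above can be sketched in Python. This is only a sketch, assuming an RGGB layout with even frame dimensions; the function name `split_rggb` is hypothetical and nothing here is part of rav1e or AV1.

```python
def split_rggb(mosaic):
    """Split an RGGB mosaic (a list of rows) into four half-resolution planes.

    Assumes even width and height, with R at (0, 0), G1 at (0, 1),
    G2 at (1, 0) and B at (1, 1) inside every 2x2 block.
    """
    r = [row[0::2] for row in mosaic[0::2]]
    g1 = [row[1::2] for row in mosaic[0::2]]
    g2 = [row[0::2] for row in mosaic[1::2]]
    b = [row[1::2] for row in mosaic[1::2]]
    return r, g1, g2, b


# A 4x4 test mosaic whose values encode position as row * 10 + column.
mosaic = [[y * 10 + x for x in range(4)] for y in range(4)]
r, g1, g2, b = split_rggb(mosaic)

# Samples stored per 2x2 Bayer block for the layouts discussed above.
samples_split = 4            # R, G1, G2, B kept as they are
samples_doubled_green = 6    # 4 green (doubled up) + 1 red + 1 blue
samples_yuv420 = 4 + 1 + 1   # 4 luma + 1 Cb + 1 Cr
```

Splitting keeps the sample count at 4 per block, but each plane loses its neighbours from the other planes, which is the loss of context described above.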

It's possible one of these techniques could pay off, especially if there is some other use case (like running a non-realtime demosaicer after recording), but it would require some experimentation to see.

If you have papers handy I can look at them.

newpavlov commented 5 years ago

I am interested in lossless compression of RGGB videos, so YUV is probably a non-starter here.

My experiments in compressing frames separately with FLIF showed that a good compression ratio is achieved by interpreting an RGGB image as an RGBA one, where the alpha channel is computed as G1 - G2 + 0x80 and the green channel is set to G1. After that, the FLIF algorithm applies a YCoCg transformation to the pixels. Can AV1 process videos with an alpha channel?
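A minimal sketch of that RGGB-to-RGBA mapping, assuming 8-bit samples. The wraparound modulo 256 is an added assumption (the comment only gives `G1 - G2 + 0x80`); without it the difference would not fit in 8 bits and the mapping could not be inverted losslessly. The function names are hypothetical.

```python
def pack_block(r, g1, g2, b):
    """Map one 8-bit RGGB block to an (R, G, B, A) tuple."""
    # A = G1 - G2 + 0x80, wrapped to 8 bits (the wrap is an assumption).
    a = (g1 - g2 + 0x80) % 256
    return r, g1, b, a


def unpack_block(r, g, b, a):
    """Invert pack_block and recover (R, G1, G2, B)."""
    g2 = (g - a + 0x80) % 256
    return r, g, g2, b


packed = pack_block(10, 200, 190, 30)
restored = unpack_block(*packed)
```

Because the two green samples of a block are usually close, the alpha values cluster around 0x80, which is what makes the extra channel cheap to code.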

Some relevant papers: https://ieeexplore.ieee.org/abstract/document/4455567 https://www.sciencedirect.com/science/article/pii/S1047320314001266

tdaede commented 5 years ago

Yeah, lossless vs near-lossless are totally different in their requirements.

AV1 doesn't have a native alpha channel. There is container-level support for one, but it's basically a separate mono frame.

The most obvious similar approach would be to duplicate pixels into a 4:4:4 RGB frame and then do a YCoCg transformation to 4:4:4. Then you can drop some of the redundant equations, and you'll get a transform to 4:2:0.
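For a lossless pipeline the color transform has to be exactly invertible in integer arithmetic. The sketch below uses the reversible YCoCg-R lifting form; that is a swapped-in choice for illustration, not something the comment above or rav1e specifies, and it does not attempt the reduction to 4:2:0.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R: undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


y, co, cg = rgb_to_ycocg_r(255, 0, 0)
roundtrip = ycocg_r_to_rgb(y, co, cg)
```

Note that Co and Cg need one more bit of range than the input samples, so 8-bit RGB produces 9-bit signed chroma.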

Having duplicate pixel values in the output isn't the worst thing, because AV1 uses the Walsh-Hadamard transform for lossless, which compresses those quite efficiently.
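To see why duplicated samples are cheap under a Hadamard-style transform, here is a small sketch using a plain unnormalized 4-point Hadamard. It is not AV1's exact integer transform, only an illustration of the effect; `hadamard4` is a hypothetical helper.

```python
def hadamard4(x):
    """Unnormalized 4-point Hadamard transform (sequency order)."""
    a, b, c, d = x
    return [
        a + b + c + d,
        a + b - c - d,
        a - b - c + d,
        a - b + c - d,
    ]


# Each source sample appears twice, as it would after doubling up pixels.
coeffs = hadamard4([5, 5, 9, 9])
```

With pairwise-duplicated input, the last two coefficients are always zero, so only half of the coefficients carry information.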

Are you interested in inter frame compression or intra only?

tdaede commented 5 years ago

https://ieeexplore.ieee.org/abstract/document/4455567

This paper only applies to lossy, not lossless, and has a couple of flaws that make its results not really usable even for lossy:

  1. It uses PSNR averaged over RGB, instead of actual color metrics like CIEDE2000.
  2. The bitrates are insane, e.g. they measure up to 30Mbps... for QCIF video (176x144@30fps). Their method only becomes better even for PSNR-RGB at 15Mbps+.
  3. They don't test 4:4:4 full-range YUV or YCgCo, only 4:2:2. Unsurprisingly, the limit to chroma resolution (good demosaicing can recover sharp chroma features) and limited-range quantization make the curve flatten out at those really high bitrates.
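To put the bitrates from item 2 in perspective, the arithmetic can be checked as below. The sketch assumes 8-bit samples for the uncompressed comparison, which is an assumption and not something stated about the paper here.

```python
width, height, fps = 176, 144, 30          # QCIF at 30 fps
pixels_per_second = width * height * fps   # 760320


def bits_per_pixel(bitrate_bps):
    """Average coded bits spent per pixel at the given bitrate."""
    return bitrate_bps / pixels_per_second


bpp_30 = bits_per_pixel(30_000_000)   # about 39.5 bits per pixel
bpp_15 = bits_per_pixel(15_000_000)   # about 19.7 bits per pixel

raw_rgb_bpp = 24    # uncompressed 8-bit RGB (assumed bit depth)
raw_bayer_bpp = 8   # uncompressed 8-bit Bayer mosaic (assumed bit depth)
```

Under that assumption, 30 Mbps spends more bits per pixel than uncompressed RGB, and even 15 Mbps is more than twice the size of the raw mosaic.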

ycho commented 5 years ago

During the development of the AV1 standard, I remember there being zero discussion of use cases for compressing CFA (Color Filter Array) data, which includes the Bayer pattern. Most available video compression schemes take YCbCr as input, which is decorrelated from the RGB signals coming from image sensors. Inside an image sensor, the Bayer pattern or other CFA is converted to RGB via a demosaicing step by the ISP (Image Signal Processor). As far as I know, ISPs are generally tightly coupled with the image sensor and are not accessible except to developers tasked with writing or updating firmware. Importantly, when the ISP does demosaicing, it also performs other processing such as photometric and geometric corrections, and even dead-pixel correction. So I am uncertain how useful the raw Bayer pattern values are. Hence, I think we can close this.

newpavlov commented 5 years ago

> Are you interested in inter frame compression or intra only?

I am interested in replacing my existing setup, in which I store videos as separate FLIF-encoded frames, so ideally we need both for better results. I guess the most straightforward approach would be to somehow add support for videos with 4 (or even more) channels. It may be possible to keep the 4th channel independent from the first three (as is currently done for RGBA images), but that will result in somewhat worse compression, as you will not be able to exploit the obvious correlations.

> As far as I know, ISPs are generally tightly coupled with the image sensor and are not accessible except to developers tasked with writing or updating firmware.

It's quite common for industrial cameras to provide raw Bayer data (e.g. see The Imaging Source line of products). One advantage of retrieving raw data is that it effectively allows a higher camera FPS, as you have to transfer one third as much data.
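The factor of three follows from one sample per pixel for the mosaic against three for demosaiced RGB. A quick check, using a hypothetical 1920x1080 sensor and 8-bit samples (both are assumptions for illustration):

```python
def frame_bytes(width, height, samples_per_pixel, bytes_per_sample=1):
    """Uncompressed size of one frame in bytes."""
    return width * height * samples_per_pixel * bytes_per_sample


w, h = 1920, 1080                # hypothetical sensor resolution
raw_bytes = frame_bytes(w, h, 1)  # Bayer mosaic: one sample per pixel
rgb_bytes = frame_bytes(w, h, 3)  # demosaiced RGB: three samples per pixel
ratio = rgb_bytes // raw_bytes
```

On a link with fixed bandwidth, the same ratio bounds how much higher the frame rate can go when sending the mosaic instead of RGB.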