k4yt3x / video2x

A lossless video/GIF/image upscaler achieved with waifu2x, Anime4K, SRMD and RealSR. Started in Hack the Valley II, 2018.
https://video2x.org
GNU Affero General Public License v3.0

RGB -> YUV color difference #403

Open 28598519a opened 3 years ago

28598519a commented 3 years ago

Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709

Usually these three metadata tags are unknown, but sometimes they are set.

Take BT.709 as an example: when ffmpeg converts PNG frames to video (RGB -> YUV) and you don't set out_color_matrix to bt709, the colors will differ from the source after conversion.

-vf scale=out_color_matrix=bt709
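To see why the matrix matters, here is a minimal sketch (not ffmpeg's code) that converts a single 8-bit R'G'B' pixel to limited-range 8-bit Y'CbCr with the BT.601 and BT.709 luma coefficients (Kr, Kb). The helper name rgb_to_ycbcr is hypothetical; it assumes gamma-encoded input and ignores chroma subsampling.

```python
# Luma coefficients (Kr, Kb) for the two matrices discussed in this issue.
MATRICES = {
    "bt601": (0.299, 0.114),
    "bt709": (0.2126, 0.0722),
}


def rgb_to_ycbcr(r, g, b, matrix):
    """Convert one 8-bit full-range R'G'B' pixel to 8-bit limited-range Y'CbCr.

    Hypothetical helper for illustration; chroma subsampling is not modeled.
    """
    kr, kb = MATRICES[matrix]
    kg = 1.0 - kr - kb
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    y = kr * rn + kg * gn + kb * bn      # luma in [0, 1]
    pb = (bn - y) / (2.0 * (1.0 - kb))   # blue difference in [-0.5, 0.5]
    pr = (rn - y) / (2.0 * (1.0 - kr))   # red difference in [-0.5, 0.5]
    # Limited ("tv") range: Y in [16, 235], Cb/Cr in [16, 240].
    return (round(16 + 219 * y), round(128 + 224 * pb), round(128 + 224 * pr))


if __name__ == "__main__":
    for name in MATRICES:
        print(name, rgb_to_ycbcr(255, 0, 0, name))
```

The same pure red ends up with different Y and Cb values depending on the matrix, so a decoder that assumes the other matrix reconstructs a different color.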

You only need to add this when the three tags are set (not unknown), so the decision has to be made by changing the code; it can't be solved simply by editing video2x.yaml.

Note: I'm not sure which of the three tags is the one that matters.

References:

  1. FFmpeg Filters Documentation. You cannot specify auto, because it has no effect.
  2. Video Example. You can use this video to make a simple direct cut:
     ffmpeg -ss 00:00:** -t 00:00:01 -i input.mp4 -vcodec copy -acodec copy output.mp4
     Output video_1 with Video2x. Then, in video2x.yaml, change assemble_video.output_options from
     '-vf': 'pad=ceil(iw/2)*2:ceil(ih/2)*2'
     to
     '-vf': 'scale=out_color_matrix=bt709,pad=ceil(iw/2)*2:ceil(ih/2)*2'
     and output video_2 with Video2x. Compare the two videos to the source. (Please make sure there is no one else nearby.)
k4yt3x commented 3 years ago

Thanks for the issue. I'll review this once I'm done with midterm week.

28598519a commented 3 years ago

Video example (smpte170m)

Please check this one too. I'm not sure why, but this may mean my original inference needs revising (960x540 -> 1920x1080; ffmpeg -ss 00:00:50 -t 00:00:01):
I specify smpte170m -> color difference
I specify bt709 -> correct

mirh commented 2 years ago

I think I've figured out what's happening here, after checking the temporary frames too and trying every input combination (thanks for your videos). EDIT: original context

FFmpeg is fine and dandy in just about everything: it acknowledges the h264 colorspace bits when they are there, and somehow it can even write color profiles to PNG if you want (too bad the upscalers are dumb as a rock and wipe out every extra chunk/field, just like they do with physical pixel dimensions). There's just one problem: every time you do yuv2rgb it uses a BT.601 color space. It's not just the fallback matrix; it really is the only one supported (funnily enough, they even left a FIXME that should have served as a reminder of this... sigh).

This isn't incorrect per se; these conversions are still (more or less) within spec. But the final file no longer has any "tag" to tell the video player the right details, so the player has to take a blind guess. Players then follow an old-ass broadcast convention: if the video is above a certain arbitrary "HD" resolution it is rendered as BT.709, otherwise it is assumed to have been mastered in BT.601.
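The player-side guess described above can be sketched like this. The cutoff (width of at least 1280 or height above 576) is an assumption modeled on a common heuristic; individual players differ, so treat it as illustrative only. guess_matrix is a hypothetical name.

```python
def guess_matrix(width: int, height: int) -> str:
    """Guess the YCbCr matrix for an untagged video from its dimensions.

    Assumed heuristic: at least 1280 pixels wide or more than 576 lines
    tall counts as HD (BT.709); everything else as SD (BT.601).
    """
    if width >= 1280 or height > 576:
        return "bt709"
    return "bt601"


if __name__ == "__main__":
    # The 960x540 -> 1920x1080 case from this thread: the source is read as
    # BT.601, but the untagged 2x upscale is read as BT.709.
    print(guess_matrix(960, 540), "->", guess_matrix(1920, 1080))
```

Under this assumed rule, a 2x upscale moves almost any SD source across the cutoff, which is why untagged output gets reinterpreted.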

This is also why your second case gets screwed: regardless of what the source was, an FHD video with no color information attached will always be read as BT.709. smpte170m just wastes CPU cycles converting to the same Rec.601 color space, which players still won't be aware of, while bt709 finally matches their implicit expectations.
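A minimal numeric sketch of that mismatch, in normalized floating point (no quantization or range scaling): a pixel encoded with BT.601 coefficients, which per this thread is what ffmpeg produces here, decodes back correctly with BT.601 but shifts when a player decodes it as BT.709. The encode/decode names are hypothetical.

```python
COEFFS = {"bt601": (0.299, 0.114), "bt709": (0.2126, 0.0722)}


def encode(rgb, matrix):
    """R'G'B' in [0, 1] -> (Y', Pb, Pr) using the given luma coefficients."""
    kr, kb = COEFFS[matrix]
    kg = 1.0 - kr - kb
    r, g, b = rgb
    y = kr * r + kg * g + kb * b
    return y, (b - y) / (2 * (1 - kb)), (r - y) / (2 * (1 - kr))


def decode(ypbpr, matrix):
    """(Y', Pb, Pr) -> R'G'B'; the exact inverse of encode for the same matrix."""
    kr, kb = COEFFS[matrix]
    kg = 1.0 - kr - kb
    y, pb, pr = ypbpr
    r = y + 2 * (1 - kr) * pr
    b = y + 2 * (1 - kb) * pb
    g = (y - kr * r - kb * b) / kg
    return r, g, b


if __name__ == "__main__":
    red = (1.0, 0.0, 0.0)
    stored = encode(red, "bt601")  # what the untagged file actually contains
    print("decoded as bt601:", [round(c, 3) for c in decode(stored, "bt601")])
    print("decoded as bt709:", [round(c, 3) for c in decode(stored, "bt709")])
```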

Most people here are in something like the latter scenario (you would need a very small source resolution to stay within SD dimensions after even the tiniest 2x scaling), so I'm quite astonished it wasn't noticed earlier.

What I found to work:

But each of these is a chore in its own right, if only to think through. Now that we understand the issue, what fixes everything easily and plainly, 100% of the time, is simply adding '-bsf:v': 'h264_metadata=matrix_coefficients=6' to the assemble_video output_options. This tags the stream's matrix coefficients as 6 (SMPTE 170M, i.e. the BT.601 matrix), which is what ffmpeg actually used for the conversion, so players no longer have to guess. Maybe extracting images to some lossless YUV format could also do the trick (whatever the latest hip JPEG flavour is, or TIFF?), but I didn't try.
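For reference, the number in matrix_coefficients=6 is a code point from the table shared by the H.264 and H.265 VUI (also published as ITU-T H.273). The dictionary below is hand-written for illustration and covers only a subset of the codes; 5 and 6 both use the BT.601 coefficients, while 1 would claim BT.709.

```python
# Subset of matrix_coefficients code points (H.264/H.265 VUI, ITU-T H.273).
MATRIX_COEFFICIENTS = {
    0: "Identity (GBR)",
    1: "BT.709",
    2: "Unspecified",
    4: "FCC",
    5: "BT.470 System B/G (BT.601, 625-line)",
    6: "SMPTE 170M (BT.601, 525-line)",
    7: "SMPTE 240M",
    8: "YCgCo",
    9: "BT.2020 non-constant luminance",
    10: "BT.2020 constant luminance",
}


def describe(code: int) -> str:
    """Return a readable name for a matrix_coefficients value."""
    return MATRIX_COEFFICIENTS.get(code, f"other/reserved ({code})")


if __name__ == "__main__":
    print(6, "->", describe(6))
```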

P.S. On top of all of this, also take note of https://github.com/nihui/realsr-ncnn-vulkan/issues/16

mirh commented 2 years ago

The default is always the 601 limited matrix if input and output are untagged.

Too bad it is also converted to 601 even when the input is tagged 709? (see the very first example in this thread)

There is indeed an open bug about scale not tagging the matrix. RGB does work, though.

This one? And if there is a bug, how can it work, lol?

It does not; this is internal stuff not meant to be used, and it does not convert anything, it just tags.

I guess sample_aspect_ratio could also be set through the -sar switch, but what's the problem?

You must use -colorspace bt709 instead, together with the correct out_color_matrix.

The problem isn't that the video colorspace is different (I mean, every useless conversion is a problem if we want to be anal, but then here you are doing yet another one). It's that playback will show the wrong colors because there's no formal agreement on what the rules are. Short of ffmpeg tagging the output itself, forcing the tags really seems like the second-best solution. As I said, it's the only "fire and forget" fix, not least because it really matches what's happening under the hood.

It is wrong anyway, like 709 is tagged as gamma 2.2, but it is actually 2.4 on OLED.

Because they follow BT.1886? That sounds like their own business to care about, tbh.

mirh commented 2 years ago

Idk about those issues; the colours here aren't wildly off, and nobody is touching 4:4:4 YCbCr or complaining about colour levels. In fact, I would argue there's nothing really wrong with how inputs are read (except perhaps the wish that PNGs could be color managed if necessary). Bug 9167 (which in turn traces its origins back to two decades of hand-waving along the lines of "users can eventually work around it somehow, so there's nothing else to do here") is the real issue AFAICT.

It does not if you have all input primaries, transfer and matrix set. Just checked. Maybe you have an old version.

I just tested all three meaningful combinations again, with the latest git.

ffmpeg -i ..\xxx.mkv -pix_fmt rgb24 extracted_%0d.png

There's no difference in the resulting images between 601 (untagged), 601 (tagged, only color_space and color_range tbh tho) and 709 (tagged, ditto). The source is yuv420p, if that matters.

mirh commented 2 years ago

I was just spitballing there: in an ideal world you'd never colour convert, as far as the output formats technically allow. Anyhow, putting aside this aspirational goal (and since I think we already agree on the actual issue), just for the record I wanted to make clear that tagged 709 is properly parsed and converted to the same expected rgb24 results as 601.

CuteZombie7 commented 2 years ago

I tried this option: -vf scale=out_color_matrix=bt709, but sometimes it works and sometimes it doesn't (the colors are screwed up), which is really confusing. So I switched to the zscale filter, and it always works:
-vf "zscale=m=709:min=709:r=limited,format=yuv420p10le" -colorspace bt709 -color_primaries bt709 -color_trc bt709 -color_range tv
Note also that some SD movies use the bt601 colorspace. So when you convert a YUV movie to RGB pictures, you should specify the correct colorspace, or the colors will be screwed up at the very first step.

CuteZombie7 commented 2 years ago

Yes, that's what I mean. If you specify bt709 for a bt601 movie, or vice versa, you'll get a color difference. For YUV -> RGB, this page may be helpful: https://tieba.baidu.com/p/6597249102

mirh commented 2 years ago

Sigh... you write long, thorough explanations and then people just skip over them. Video2x already uses PNG (well, at least by default in version 4.8.1), so ffmpeg should already be good enough to properly understand and process the colorspace.

I'm very skeptical that something that forces the input matrix (min) to be bt709 could always work. If it works with your samples, it must be because not only are they untagged, but they also implicitly expect bt709.

And alas, there's no escape when the source forces you to play the guessing game (maybe ffmpeg could implement some resolution-based heuristics, but I'm sure there would still be plenty of edge cases). The only sensible idea, if any, would be to just throw a warning when tags are missing.
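The warning idea could be sketched as follows. It assumes the stream dictionary has the shape of one entry in ffprobe's JSON output (for example from ffprobe -v error -show_streams -of json input.mp4), where the color_space, color_primaries and color_transfer keys are absent or "unknown" when the input is untagged. check_color_tags is a hypothetical helper, not part of video2x.

```python
import warnings

COLOR_KEYS = ("color_space", "color_primaries", "color_transfer")


def check_color_tags(stream: dict) -> list:
    """Return the color tags missing from an ffprobe-style stream dict.

    A tag counts as missing when the key is absent or set to "unknown".
    Emits a UserWarning when anything is missing.
    """
    missing = [k for k in COLOR_KEYS if stream.get(k, "unknown") == "unknown"]
    if missing:
        warnings.warn(
            "untagged input (" + ", ".join(missing) + "); "
            "the player will have to guess the color matrix"
        )
    return missing


if __name__ == "__main__":
    # Hand-written sample shaped like one ffprobe "streams" entry.
    sample = {"codec_type": "video", "width": 960, "height": 540}
    print(check_color_tags(sample))
```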

But putting aside the worst-case scenario of an ignorant encoder, this is what you need in your video2x.yaml:

assemble_video:
  output_options:
    ....
    '-bsf:v': 'h264_metadata=matrix_coefficients=6'

If you know the source input, you could instead tinker with adding an extra option to extract_frames.

CuteZombie7 commented 2 years ago

The author of that article is not me LOL

mirh commented 2 years ago

I had already started typing the comment when you posted...

The article may have a point, though: as I had half-hinted above (but then left out, because it wasn't the night-and-day difference OP was noticing), the conversion isn't 100% accurate. Is there some way to force zscale without specifying any extra non-default operation?

The second point raised isn't a problem with ffmpeg, then. The input is 1080p; I refuse to believe no color information could be attached to it. If that is missing, you should blame the ripper and complain to them.

The third makes no sense to me, then. I don't see why anybody in their right mind would use "packed RGB 16:16:16, 48bpp". Just force the yuv420p10le pixel format and you are done, if you want 10-bit video.

CuteZombie7 commented 2 years ago

Is there some way to force zscale without specifying any other extra non-default operation to it?

Which extra non-default operation?

I don't see why anybody in their right mind would use "packed RGB 16:16:16, 48bpp".

That's because converting yuv420p directly to RGB24 reduces color quality, and converting 10-bit YUV directly to RGB24 causes banding. So: yuv420p -> yuv444p10le -> RGB48be -> RGB24. That way you get RGB pictures that are very close to the original video in color (not sure about 100% accuracy).

CuteZombie7 commented 2 years ago

I'm very skeptical that something that forces the input matrix (min) to be bt709 could always work.

For RGB pictures -> YUV video, min=709 is actually of no use at all. FFmpeg ignores it, since RGB has no bt709 colorspace; the input format will be gbr(pc). But if you drop "min=709", you get an error. :(

CuteZombie7 commented 2 years ago

min=input is a good idea; I never thought of it, lol. But min=709 seems all right, though. Does zscale automatically convert RGB to YUV? The method I mentioned only applies to bt601/bt709. As for bt2020, rgb24 is obviously not enough.

mirh commented 2 years ago

Which extra non-default operation?

I mean just using zscale in place of scale. Could -vf zscale already do it? I'm not sure. Of course, if you want to manually add some other option you are free to do so, but my concern was what the program should ship with out of the box.

That's because directly yuv420p->RGB24 will result in reduced color quality. And directly yuv10bit->RGB24 will result in banding.

Right, although I still haven't seen any great examples of the former downsides. ON THE OTHER HAND, after picking up some academic curiosity about this academic concern... I'm afraid all this care is placebo:
https://github.com/nihui/waifu2x-ncnn-vulkan/blob/20220419/src/webp_image.h#L27
https://github.com/nihui/waifu2x-ncnn-vulkan/blob/20220419/src/stb_image.h#L1369
https://github.com/nihui/waifu2x-ncnn-vulkan/blob/20220419/src/wic_image.h#L59
It's RGB24 all the way down.
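To put a number on "RGB24 all the way down": whatever intermediate format is used, once a frame is stored as RGB24 each channel has at most 256 levels. The sketch below rescales a full 10-bit ramp to 8 bits, once directly and once through a 16-bit intermediate, using integer round-half-up. It only models bit depth, not chroma upsampling or matrix rounding, and the rescale helper is hypothetical.

```python
def rescale(value: int, src_max: int, dst_max: int) -> int:
    """Rescale an integer code from [0, src_max] to [0, dst_max], rounding half up."""
    return (2 * value * dst_max + src_max) // (2 * src_max)


ramp10 = range(1024)  # every 10-bit code value

# Straight 10-bit -> 8-bit, and 10-bit -> 16-bit -> 8-bit.
direct = [rescale(v, 1023, 255) for v in ramp10]
via16 = [rescale(rescale(v, 1023, 65535), 65535, 255) for v in ramp10]

if __name__ == "__main__":
    # Both paths end at 8 bits, so neither can keep more than 256 levels.
    print("direct:", len(set(direct)), "levels")
    print("via 16-bit:", len(set(via16)), "levels")
```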

Maybe waifu2x-caffe (which may use OpenCV internally; the source isn't a walk in the park to parse) could still preserve more information, but I really wouldn't bet on it.

thekryz commented 2 years ago

But putting aside the worst case scenario of an ignorant encoder, this is what you need in your video2x.yaml

assemble_video:
  output_options:
    ....
    '-bsf:v': 'h264_metadata=matrix_coefficients=6'

Does this work the same way for x265, using '-bsf:v': 'hevc_metadata=matrix_coefficients=6'? Judging by Table E.5 of the H.265 specification (p. 451), I guess it should, right?

mirh commented 2 years ago

Yes, Table E.5 is identical to H.264.
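Since the code points are shared, the option differs only in the name of the bitstream filter. A hypothetical helper that picks h264_metadata or hevc_metadata from the output codec might look like this; it only covers the two filters discussed in this thread, and the '-bsf:v' key mirrors the video2x.yaml snippet quoted earlier.

```python
# Bitstream filters that can rewrite VUI color metadata, keyed by codec.
METADATA_BSF = {"h264": "h264_metadata", "hevc": "hevc_metadata"}


def matrix_bsf_option(codec: str, matrix_coefficients: int = 6) -> dict:
    """Build an output option that tags the stream's matrix_coefficients.

    The default 6 = SMPTE 170M (BT.601), which per this thread matches what
    ffmpeg's RGB -> YUV step actually used.
    """
    if codec not in METADATA_BSF:
        raise ValueError(f"no metadata bitstream filter known for {codec!r}")
    return {"-bsf:v": f"{METADATA_BSF[codec]}=matrix_coefficients={matrix_coefficients}"}


if __name__ == "__main__":
    print(matrix_bsf_option("h264"))
    print(matrix_bsf_option("hevc"))
```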