k4yt3x / video2x

A machine learning-based lossless video super resolution framework. Est. Hack the Valley II, 2018.
https://video2x.org
GNU Affero General Public License v3.0
10.2k stars, 982 forks

Upscaling Video with a 10-bit Color Depth Codec #117

Closed JohnTravolski closed 5 years ago

JohnTravolski commented 5 years ago

Environment Information

Module                 Version
video2x                video2x-2.7.1-win32-full.zip
ffmpeg                 ffmpeg-20190330-5282cba-win64-static
waifu2x-caffe          ver 1.2.0.2
waifu2x-converter-cpp  Whatever was included in the zip

Symptom

When attempting to upscale a video with a color depth of 10 bits per channel (10 bpc), the colors in the output file appear more muted (less colorful) than in the original. It's as if the contrast was decreased significantly.

When I run:

./video2x -i testinput5.mxf -o out5.mp4 -m cudnn -r 2

I get the following output (these are just screenshots of the first frame of each video file; don't worry about the resolution):

Original:

Output:

I am not sure if this is a limitation of ffmpeg, or if something in Video2X isn't handling higher bit depths correctly, or if I'm simply not using the tool correctly. Here are the example files I'm referring to:

https://drive.google.com/drive/folders/1Ibyx_wtb_9T5G_eyC9NpKcVFCU9h8wLO?usp=sharing

Are there some settings I can change so that it handles the 10 bit color correctly? Or is there some way I can change the output format to also use 10 bit color? I'm not sure.

If it is relevant, here is some more information about my input file: [screenshot: VirtualBox_2019-06-30_23-03-34]

k4yt3x commented 5 years ago

Theoretically, ffmpeg copies the exact original settings over to the extracted frames/stitched videos. Therefore, it might be the problem of waifu2x? What happens if you take a single extracted frame and upscale it manually with waifu2x?

JohnTravolski commented 5 years ago

> Theoretically, ffmpeg copies the exact original settings over to the extracted frames/stitched videos. Therefore, it might be the problem of waifu2x? What happens if you take a single extracted frame and upscale it manually with waifu2x?

Honestly, I'm not sure what image file formats support 10 bits per channel, so I don't exactly know how to do that. However, I just took a look at the output video: [screenshot: VirtualBox_2019-07-01_18-52-11]. It appears that the output video is 8 bits per channel, which makes me believe that the color information is lost somewhere during the transformation.

Is it possible to change the codec settings used for the final video output? If I could change the codec used for the final output so that it's also using 10 bpc, maybe this issue would vanish.
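As a rough illustration of what an 8 bpc output can hold compared with a 10 bpc source, the sketch below reduces every 10-bit code value to 8 bits by dropping the two lowest bits. The function name `to_8bit` is hypothetical, and a plain bit shift is only an assumption about how the reduction happens; a real converter may round or dither instead.

```python
def to_8bit(value_10bit: int) -> int:
    """Reduce a 10-bit code value (0..1023) to 8 bits by dropping the two low bits.

    Assumption: a plain right shift stands in for whatever the real
    conversion does (it may round or dither instead).
    """
    if not 0 <= value_10bit <= 1023:
        raise ValueError("expected a 10-bit value in 0..1023")
    return value_10bit >> 2


levels_10bit = range(1024)
levels_8bit = {to_8bit(v) for v in levels_10bit}

# Four neighbouring 10-bit codes collapse into one 8-bit code.
print(len(levels_10bit), "input levels ->", len(levels_8bit), "output levels")
# The ends of the range still map to the ends of the 8-bit range.
print("black:", to_8bit(0), "white:", to_8bit(1023))
```

In this simplified model, black and white still land on 0 and 255, so a depth reduction on its own would be expected to cost precision (possible banding) rather than contrast.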

I attempted running:

./video2x -i testinput5.mxf -o out5.mxf -m cudnn -r 2

But the output file I get is still 8 bpc and has the same faded color problem as before: https://i.imgur.com/8K5QX5m.png

Additionally, Premiere Pro isn't able to decode it properly, and many frames are completely black.

JohnTravolski commented 5 years ago

I dove into the problem a bit more. I imported the 10 bpc MXF into Adobe After Effects, exported the frames as 16 bpc PNG images to disk, and then used waifu2x-caffe to upscale those PNG images with these options selected: [screenshot: waifu2x-caffe_2019-07-02_21-10-58]. There is no difference in color when doing it this way (I get the results I would expect to get). This leads me to believe that it must be an issue with either ffmpeg, video2x, or the way I'm using video2x.

sat3ll commented 5 years ago

Try to substitute all instances of yuv420p with yuv422p in video2x.json (the ffmpeg sections).

cr08 commented 5 years ago

I think you will want to verify that each component maintains the bit depth at every step of the process. In the video-to-frames stage, which involves ffmpeg, ensure ffmpeg is writing out frames at 10 bits or higher and maintaining the color you expect. You'll definitely want to adjust waifu2x: the json file has an output_depth option that defaults to 8 bit and needs to be adjusted. Then, in the frame-to-video stage, ensure it is outputting a 10-bit or higher color depth from the source frames. Unfortunately, I'm not familiar enough with ffmpeg to know whether it is smart enough to see the higher color depth and maintain it, so extra options may need to be added at every step of the way. One of those is likely the yuv422p option @sat3ll mentioned.
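One way to check the intermediate frames at each stage is to read the bit depth straight out of the PNG header. This is a minimal sketch using only the Python standard library; the function name `png_bit_depth` is hypothetical, and it only inspects the IHDR chunk that the PNG format requires as the first chunk in the file.

```python
import struct
import sys

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Colour type codes defined by the PNG format.
COLOR_TYPES = {
    0: "Grayscale",
    2: "RGB",
    3: "Palette",
    4: "Grayscale with Alpha",
    6: "RGB with Alpha",
}


def png_bit_depth(data: bytes):
    """Return (width, height, bit_depth, color_type_name) from the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # First chunk: 4-byte big-endian length, 4-byte type, then the chunk body.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    # IHDR body starts with width, height, bit depth, colour type.
    width, height, depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, depth, COLOR_TYPES.get(color_type, "Unknown")


if __name__ == "__main__":
    # Usage: python check_depth.py frame1.png frame2.png ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, png_bit_depth(f.read(32)))
```

Running this on an extracted frame and on the matching upscaled frame would show whether the depth drops between stages. It says nothing about whether the color values themselves are correct.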

JohnTravolski commented 5 years ago

Upon further inspection, it seems to be a problem with ffmpeg. When I attempt the following:

ffmpeg -y -i testinput5.mxf -pix_fmt rgb48be -ss 00:00:00.000 -vframes 1 thumb.png

I get this image, which has the same problem I observed with Video2X.

Even though the output image is 16 bpc, the colors still appear to be washed out, just like when using Video2X. It seems that ffmpeg is not decoding the file correctly, or I am not passing the correct parameters in my example above. If I can figure out what's going wrong with ffmpeg, I could probably fix the problem for Video2X. Does anybody have any ideas?

I have tried pretty much everything under the sun when it comes to the -pix_fmt argument, and nothing seems to fix the bad colors, so I'm really not sure what's going on. By the way, I have this problem even with the latest version of ffmpeg, ffmpeg version N-94150-g231d0c819f.

sat3ll commented 5 years ago
JohnTravolski commented 5 years ago

If I leave the -pix_fmt flag out, the output images look identical (still washed out), but they are 8 bits per channel. I obtained the first two images by importing the MXF and the output file from Video2X into a 16 bpc Adobe After Effects composition and taking a screenshot of each.

sat3ll commented 5 years ago

I did some digging. Since the amdgpu driver doesn't support 10-bit color (yet, on Linux), I can't really spot the differences visually (there's also placebo to account for!).

Your original (correct) picture has this info:

$ exiftool -all ../60409376-f896ca00-9b88-11e9-9141-c3a3100e06d3.png 
ExifTool Version Number         : 11.50
File Name                       : 60409376-f896ca00-9b88-11e9-9141-c3a3100e06d3.png
Directory                       : ..
File Size                       : 1172 kB
File Modification Date/Time     : 2019:07:05 15:49:16+01:00
File Access Date/Time           : 2019:07:05 15:49:16+01:00
File Inode Change Date/Time     : 2019:07:05 15:49:16+01:00
File Permissions                : rw-rw-r--
File Type                       : PNG
File Type Extension             : png
MIME Type                       : image/png
Image Width                     : 958
Image Height                    : 538
Bit Depth                       : 8
Color Type                      : RGB with Alpha
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
SRGB Rendering                  : Perceptual
Gamma                           : 2.2
Pixels Per Unit X               : 3779
Pixels Per Unit Y               : 3779
Pixel Units                     : meters
Image Size                      : 958x538
Megapixels                      : 0.515
$ ffprobe -i ../60409376-f896ca00-9b88-11e9-9141-c3a3100e06d3.png 
Input #0, png_pipe, from '../60409376-f896ca00-9b88-11e9-9141-c3a3100e06d3.png':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: png, rgba(pc), 958x538 [SAR 3779:3779 DAR 479:269], 25 tbr, 25 tbn, 25 tbc

Bit depth: 8. Format: RGBA (4 components).

FFMPEG default format:

$ ffprobe -i ../extract_1.png
[png_pipe @ 0x55f6b02e4100] Stream #0: not enough frames to estimate rate; consider increasing probesize
Input #0, png_pipe, from '../tess/extract_1.png':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: png, rgb48be(pc), 1920x1080 [SAR 1:1 DAR 16:9], 25 tbr, 25 tbn, 25 tbc
$ exiftool -all ../extract_1.png
ExifTool Version Number         : 11.50
File Name                       : extract_1.png
Directory                       : ../tess
File Size                       : 8.6 MB
File Modification Date/Time     : 2019:07:05 15:54:44+01:00
File Access Date/Time           : 2019:07:05 15:54:44+01:00
File Inode Change Date/Time     : 2019:07:05 15:54:44+01:00
File Permissions                : rw-rw-r--
File Type                       : PNG
File Type Extension             : png
MIME Type                       : image/png
Image Width                     : 1920
Image Height                    : 1080
Bit Depth                       : 16
Color Type                      : RGB
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
Pixels Per Unit X               : 1
Pixels Per Unit Y               : 1
Pixel Units                     : Unknown
White Point X                   : 0.3127
White Point Y                   : 0.329
Red X                           : 0.64
Red Y                           : 0.33
Green X                         : 0.3
Green Y                         : 0.6
Blue X                          : 0.15
Blue Y                          : 0.06
Gamma                           : 1.961
Image Size                      : 1920x1080
Megapixels                      : 2.1

Bit depth: 16. Format: RGB48be (3 components).

Therefore, -pix_fmt rgba should do what you're looking for (replicating the 8-bit depth of your original picture). For 16 bits per component, use -pix_fmt rgba64be or -pix_fmt rgba64le.
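To make the choice of pixel format explicit when extracting a frame, here is a small sketch that builds the same ffmpeg invocation used earlier in the thread from a bit depth and an alpha flag. The function name `extract_frame_cmd` and the lookup table are hypothetical; `rgb24` is not mentioned in the thread and is included only as the 8-bit counterpart without alpha. The sketch only assembles the argument list and does not run ffmpeg.

```python
import shlex

# Map (bits per component, has alpha) to an ffmpeg pixel format name.
# rgba, rgb48be and rgba64be appear in this thread; rgb24 is added here
# as an assumption for the 8-bit, no-alpha case.
PIX_FMTS = {
    (8, False): "rgb24",
    (8, True): "rgba",
    (16, False): "rgb48be",
    (16, True): "rgba64be",
}


def extract_frame_cmd(src, dst, bits=16, alpha=False, timestamp="00:00:00.000"):
    """Build an ffmpeg argument list that writes one frame of src to dst."""
    pix_fmt = PIX_FMTS[(bits, alpha)]
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-pix_fmt", pix_fmt,
        "-ss", timestamp,
        "-vframes", "1",
        dst,
    ]


cmd = extract_frame_cmd("testinput5.mxf", "thumb.png")
# Print a shell-ready command line; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting problems with file names that contain spaces.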

k4yt3x commented 5 years ago

@JohnTravolski With the newest commits the color depth issue has been fixed, but I'm not sure why your frames are not being put back into a video correctly.

k4yt3x commented 5 years ago

This should be implemented already

mirh commented 2 years ago

ca90c5be02e2fc1981be43eaef57bf7dfa3016dc was good and all, but you'd also need the yuv420p10le pixel format to encode it at the end. This shouldn't just be a win if your source was originally 10-bit; even regular cheap-ass 8-bit content should look better this way (especially given the "smooth colour palette" that upscalers give to textures).
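For the final encode, a sketch of what requesting yuv420p10le could look like is below. Everything apart from the pixel format is an assumption for illustration: the function name `encode_cmd`, the frame pattern `upscaled/%d.png`, the frame rate, and the choice of libx264 as the encoder (the thread does not say which encoder is used). Whether a given ffmpeg build can encode 10-bit with that encoder can be checked with `ffmpeg -h encoder=libx264`, which lists the supported pixel formats.

```python
import shlex


def encode_cmd(frames_pattern, dst, fps="30", pix_fmt="yuv420p10le", codec="libx264"):
    """Build an ffmpeg argument list that encodes an image sequence to a video.

    The defaults are illustrative assumptions, not video2x's actual settings.
    """
    return [
        "ffmpeg", "-y",
        "-framerate", fps,      # input frame rate of the image sequence
        "-i", frames_pattern,   # e.g. a printf-style pattern such as frames/%d.png
        "-c:v", codec,
        "-pix_fmt", pix_fmt,    # 10-bit 4:2:0 output when set to yuv420p10le
        dst,
    ]


encode = encode_cmd("upscaled/%d.png", "out5.mp4")
# Print a shell-ready command line; nothing is executed here.
print(shlex.join(encode))
```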