k4yt3x / video2x

A lossless video/GIF/image upscaler achieved with waifu2x, Anime4K, SRMD and RealSR. Started in Hack the Valley II, 2018.
https://video2x.org
GNU Affero General Public License v3.0

after I upscaled a video, the video turns into 3x or 4x speed. #382

Open dive2bass01 opened 3 years ago

dive2bass01 commented 3 years ago

Component Versions

Please at least fill in the release version and GUI or CLI version.

Symptom

After I upscaled a video, the video plays at 3x or 4x speed. The audio is unaffected and still plays at normal speed.

1. Original size of the video: 720*480

2. My settings: Express Settings: caffe, process 1, scale ratio 2, png, mp4

Driver settings (caffe): width height 1920 1080, noise 3, cudnn (properly installed), cunet crop 128, quality -1, depth 8, batch 1, gpu 0, TTA not selected

FFmpeg settings: force_format image2, video_codec libx264, pixel yuv420p, crf 17, tune animation, average_bitrate not filled, ensure output~by2 was selected, hardware_acceleration was selected

Error Log or Screenshots

Please upload or paste the error log here. You may also include screenshots. It is highly recommended to include your error log.

k4yt3x commented 3 years ago

If you read the video file's info with FFprobe, what's the value of r_frame_rate?

dive2bass01 commented 3 years ago

The original is "120000/1001" and the upscaled one is "120/1".

k4yt3x commented 3 years ago

Those two fractions evaluate to roughly the same value. What about avg_frame_rate, codec_time_base and time_base?

dive2bass01 commented 3 years ago

The original is "avg_frame_rate": "91773750/3760757", "codec_time_base": "3760757/183547500", "time_base": "1/120000".

The upscaled one is "avg_frame_rate": "480896000/4011467", "codec_time_base": "4011467/961792000", "time_base": "1/16000".
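For readers following along, the fractions reported so far are easier to compare as decimals. The snippet below is only an illustrative sketch: the helper name `to_fps` is made up for this example, and the values are copied from the FFprobe output quoted above.

```python
from fractions import Fraction

# Values copied from the FFprobe output quoted in this thread.
rates = {
    "original r_frame_rate": "120000/1001",
    "original avg_frame_rate": "91773750/3760757",
    "upscaled r_frame_rate": "120/1",
    "upscaled avg_frame_rate": "480896000/4011467",
}


def to_fps(ratio):
    """Convert an FFprobe 'num/den' string to frames per second (hypothetical helper)."""
    return float(Fraction(ratio))


for name, ratio in rates.items():
    print(f"{name}: {to_fps(ratio):.3f} fps")
```

The original file's average rate comes out far below its r_frame_rate, while the upscaled file's average rate sits close to 120 fps.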

k4yt3x commented 3 years ago

v2x reads the video's frame rate from r_frame_rate, which is accurate and equal to the value of avg_frame_rate in most of the cases I've seen.

In your case, that value is not in line with your avg_frame_rate. I still need to figure out how to reliably read the video's frame rate and how VFR (variable frame rate) should be handled properly. For the time being, you should be able to re-encode the video with CFR (constant frame rate, -vsync 1) before upscaling to avoid this problem.
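A rough way to spot this situation before upscaling is to compare the two fields in FFprobe's JSON output. The sketch below is not how v2x does it. The function names and the 1% tolerance are assumptions for illustration, and the JSON is expected to come from something like `ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate,avg_frame_rate -of json input.mp4`.

```python
import json
from fractions import Fraction


def frame_rates(ffprobe_json):
    """Return (r_frame_rate, avg_frame_rate) of the first stream in FFprobe JSON text."""
    stream = json.loads(ffprobe_json)["streams"][0]
    # Note: a stream reporting "0/0" would raise ZeroDivisionError here.
    return Fraction(stream["r_frame_rate"]), Fraction(stream["avg_frame_rate"])


def looks_variable(ffprobe_json, tolerance=0.01):
    """Heuristic (assumed threshold): flag the file if the two rates differ by more than 1%."""
    r, avg = frame_rates(ffprobe_json)
    return abs(r - avg) / r > tolerance


def cfr_command(src, dst):
    """Build the FFmpeg call for the CFR re-encode suggested above (-vsync 1)."""
    return ["ffmpeg", "-i", src, "-vsync", "1", dst]


# The original file from this thread, reduced to the two relevant fields.
original = '{"streams": [{"r_frame_rate": "120000/1001", "avg_frame_rate": "91773750/3760757"}]}'
if looks_variable(original):
    print(" ".join(cfr_command("input.mp4", "input_cfr.mp4")))
```

The file names are placeholders. The command list can be passed to `subprocess.run` if FFmpeg is installed.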

dive2bass01 commented 3 years ago

Thank you for your answer.

  1. How about just stretching the upscaled video in Adobe Premiere Pro so that it matches the audio? Will this cause quality problems?

  2. In the Driver Settings of Caffe (GUI), what does the Output Quality value mean? What are the pros and cons of a larger value?

k4yt3x commented 3 years ago

  1. From my experience, I wouldn't consider Premiere's output to be lossless, or at least not as good as if the video were encoded correctly from the upscaled frames. That, though, is only my guesstimate based on my experience and my understanding of video encoding. If you want a more accurate/professional answer, some of the folks in the Telegram group might be able to help you.

  2. This setting's effect depends on the chosen output format (PNG/JPEG). -1 means to use the default value for that format. The official documentation only says this much. I'd assume it's the same as the quality settings in FFmpeg, but I'm not certain. You might have to test it to be sure. However, I do not think this would have a significant impact on the quality. The default value should be visually lossless.

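    Sets the image quality of the converted image. The default value is -1. The accepted values and their meaning depend on the format chosen under "Output extension". If the value is -1, the default value for each image format is used.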

dive2bass01 commented 3 years ago

"v2x reads the video's frame rate from r_frame_rate"

  1. Do you mean v2x "combines" the upscaled frames based on the original video's r_frame_rate, or that it "extracts" the original frames based on the original video's r_frame_rate?

  2. I found that the original video's r_frame_rate is exactly the same as its maximum frame rate. That seems odd to me. Is there any way to set r_frame_rate properly, to use avg_frame_rate for the task instead, or to set a value manually (such as 30/1 or 60/1)?

Thank you =)

k4yt3x commented 3 years ago

  1. It combines the frames based on the value read from r_frame_rate.

  2. I think I used to use avg_frame_rate, but something was wrong with that value as well. I'll need to consult some of the experts in the Telegram group regarding this problem. The output frame rate should match the input's, so manual settings shouldn't be necessary. Manual settings could also make bulk processing more complex, since you would have to set a frame rate for each clip.

mirh commented 2 years ago

I believe the proper way to handle this is using timestamps https://github.com/staxrip/staxrip/issues/373
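To make the mismatch discussed in this thread concrete: if every extracted frame is reassembled at r_frame_rate while the source actually averaged avg_frame_rate, the video gets shorter by the ratio of the two. The following is a back-of-the-envelope sketch under that assumption, with a hypothetical helper name, not a description of v2x internals.

```python
from fractions import Fraction


def speedup(assembled_rate, true_average_rate):
    """Playback speed-up when frames are reassembled at a rate above their true average."""
    return float(Fraction(assembled_rate) / Fraction(true_average_rate))


# Original file's values from this thread: r_frame_rate vs. avg_frame_rate.
factor = speedup("120000/1001", "91773750/3760757")
print(f"video plays about {factor:.2f}x faster than the audio")
```

For the values in this thread the factor comes out a little under 5, which fits the symptom of the video racing ahead while the audio keeps its original length.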