Audio desyncing with vpx encodes

cogman commented 3 years ago

Example video: https://mango.blender.org/download/ (I used the 700MB 1080p version.)

Command used:

docker run -v "$(pwd)":/videos --user $(id -u):$(id -g) -it --rm masterofzen/av1an:latest -i ToS-4k-1920.mov -enc vpx --target_quality 95 -v "--profile=2 --threads=4 --cpu-used=4 --end-usage=q --cq-level=30 --row-mt=1 --bit-depth=10"

I've encountered this in other videos, and every time the audio and video end up out of sync with one another, typically with the audio lagging the video. About 8 minutes into ToS it is particularly bad.

It may just be the videos I've used, but I've not found one with vpx that doesn't exhibit this behavior.

cogman commented 3 years ago

Here's what I've tried so far (to no success :( )

I've tried forcing ffms2.
I've tried using ffmpeg to convert the file from its original form to lossless h264 (with -vsync 1). The output there had matching audio/video, but the vpx encode of that still ended up drifting.
I've tried using other chunking methods (whatever the second recommended setting is).
I've tried using VapourSynth files directly.
I've tried using mkvmerge.

I didn't finish the encode (because it took too long) but one thing I'm about to try is using ffms2 with seekmode set to -1.

Any suggestions would be appreciated.

cogman commented 3 years ago

I fixed it!

It'd be nice if this were an option in av1an.

Step one, I wrote my own vpy script (because I also needed to do deinterlacing). It looks like so.

(input.vpy)

from vapoursynth import core
import havsfunc as haf

clip = core.ffms2.Source(source='S01E01.mkv', threads=1)
clip = haf.QTGMC(clip, Preset='Slow', TFF=True, EZDenoise=3.5, ChromaNoise=True, opencl=True)
clip = core.std.CropAbs(clip, 704, 480, 10, 0)
clip.set_output()

There shouldn't be anything too interesting here; it's just an example of a working deinterlace and denoise script for some noisy content.

Next, I extracted the timecodes from the video in question, like so

vspipe input.vpy -t tc.dat .

(Notice the period at the end: I'm after the timecodes, not the resulting video. This would be better if the timecode data were saved as part of the PySceneDetect or first-pass phase, IMO.)
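To sanity-check what ends up in tc.dat, a small script can read the timecodes and report whether the source is actually variable frame rate. This is a minimal sketch, assuming the file is in the "timecode format v2" layout (a comment header followed by one timestamp in milliseconds per line); the function names and the 1 ms tolerance are hypothetical choices, not part of vspipe or av1an.

```python
def parse_timecodes_v2(text):
    """Return per-frame timestamps in milliseconds from v2 timecode text.

    Assumes one timestamp per line; blank lines and '#' comment lines
    (such as the '# timecode format v2' header) are skipped.
    """
    timestamps = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        timestamps.append(float(line))
    return timestamps


def frame_durations(timestamps):
    """Return the gap in milliseconds between each pair of consecutive frames."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def is_variable_frame_rate(timestamps, tolerance_ms=1.0):
    """Return True if frame durations differ by more than tolerance_ms.

    The tolerance is an arbitrary illustrative value meant to absorb
    rounding in the timestamps.
    """
    durations = frame_durations(timestamps)
    if not durations:
        return False
    return max(durations) - min(durations) > tolerance_ms


if __name__ == "__main__":
    # Made-up sample data, not taken from a real file.
    sample = "# timecode format v2\n0\n41.708\n83.417\n150.0\n"
    stamps = parse_timecodes_v2(sample)
    print("frames:", len(stamps))
    print("variable frame rate:", is_variable_frame_rate(stamps))
```

To use it on a real file, pass the contents of tc.dat (for example `open("tc.dat").read()`) to `parse_timecodes_v2`.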

From there I ran av1an with all my options (probably not important, but message me if you want to see them all. Mostly just fiddling with vpx knobs. Pretty much the same as the opening comment on this issue.)

After that, because Av1an doesn't appear to do audio when a custom script is provided, I extracted, transcoded, and merged in my audio track like so.

(S01E01.mkv is the original file targeted by the vspipe script.)

ffmpeg -i output.mkv -i S01E01.mkv -c:v copy -map 0:v:0 -map 1:a:0 -c:a libopus -b:a 64K -vbr on output2.mkv

Finally, I used mkvmerge to apply the extracted timecodes to the merged video/audio file, like so.

mkvmerge --output output3.mkv --timestamps 0:tc.dat output2.mkv
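The vspipe, ffmpeg, and mkvmerge steps above could be wrapped in a small script so they don't have to be retyped per episode. This is only a sketch: the helper names are hypothetical, the arguments are copied from the commands in this comment, and the av1an run that sits between the timecode extraction and the audio merge is left out because its options aren't listed here.

```python
import subprocess


def vspipe_timecodes_cmd(script, timecodes):
    """Command that writes timecodes for a .vpy script and discards the video ('.')."""
    return ["vspipe", script, "-t", timecodes, "."]


def ffmpeg_merge_audio_cmd(encoded, original, output):
    """Command that copies video from the encode and Opus-encodes audio from the original."""
    return [
        "ffmpeg", "-i", encoded, "-i", original,
        "-c:v", "copy", "-map", "0:v:0", "-map", "1:a:0",
        "-c:a", "libopus", "-b:a", "64K", "-vbr", "on",
        output,
    ]


def mkvmerge_timestamps_cmd(merged, timecodes, output):
    """Command that applies the timecode file to track 0 of the merged file."""
    return ["mkvmerge", "--output", output, "--timestamps", f"0:{timecodes}", merged]


def run(cmd):
    """Run one command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


def finish_encode(original="S01E01.mkv", encoded="output.mkv", timecodes="tc.dat"):
    """Merge audio and apply timecodes, assuming av1an already produced `encoded`
    and vspipe already produced `timecodes`."""
    run(ffmpeg_merge_audio_cmd(encoded, original, "output2.mkv"))
    run(mkvmerge_timestamps_cmd("output2.mkv", timecodes, "output3.mkv"))
```

Passing the commands as argument lists rather than one shell string avoids quoting problems with file names that contain spaces.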

I did not convert the video from VFR to CFR or adjust the audio in any way to get this to work. It just worked after these steps.
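One possible explanation, which I haven't confirmed, is that the encode ends up with constant-rate timestamps while the source is VFR. Under that assumption, a rough sketch like the following shows how far each frame's real timestamp sits from where a constant frame rate would put it. The function names and the sample numbers are made up for illustration.

```python
def drift_against_constant_rate(timestamps_ms, nominal_fps):
    """Return, per frame, real timestamp minus the constant-rate timestamp (ms).

    A positive value means the frame would be shown earlier than intended
    if it were played back at a constant nominal_fps.
    """
    frame_ms = 1000.0 / nominal_fps
    return [ts - index * frame_ms for index, ts in enumerate(timestamps_ms)]


def max_abs_drift(timestamps_ms, nominal_fps):
    """Return the largest absolute drift in milliseconds, or 0.0 for no frames."""
    drifts = drift_against_constant_rate(timestamps_ms, nominal_fps)
    return max((abs(d) for d in drifts), default=0.0)


if __name__ == "__main__":
    # Made-up timestamps: one frame is held for 80 ms instead of 40 ms.
    stamps = [0.0, 40.0, 80.0, 160.0, 200.0]
    print(drift_against_constant_rate(stamps, 25))
    print(max_abs_drift(stamps, 25))
```

If the drift grows over the length of the file, that would be consistent with the desync getting worse later in the video, and with restoring the original timecodes fixing it.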

If there's a better way to do this, I'm all ears :) Otherwise I'll keep using this and hopefully I've helped someone else if they stumble over the same issue. I suspect this mostly only works because I'm dealing with mkvs.

master-of-zen / Av1an

Audio desyncing with vpx encodes #250