Understanding audio stream lag in video-recordings

pmarini-nc commented 12 months ago

I have the following setup:

- Nextcloud Server, Signaling Server and Recording Server run as LXD containers on the same physical server
- Communication between the services is proxied through an HAProxy container, which does TLS offloading

I have noticed that in video recordings the audio always lags 1-2 seconds behind the video.

So I ran some tests, feeding a short .mkv file into the benchmark utility, and I get the same result.

With the same audio and video codec choices, converting the mkv file to webm directly with ffmpeg (that is, bypassing the ffplay step) on the same system and with the same input file produces a correct encoding, with audio and video in sync.

Attaching here:

- the source mkv video: source.mkv.zip
- the transformed webm video: out-rt-b0.webm.zip

Comparing the two should show the issue.

I am also including the output of the first benchmark run, as described in the documentation:

Recorder args: ffmpeg -loglevel level+warning -n -f pulse -i 1 -f x11grab -draw_mouse 0 -video_size 1920x1080 -i :0 -c:a libopus -c:v libvpx -deadline:v realtime -b:v 0 /tmp/test-ffmpeg/out-rt-b0.webm
File size: 330795
Average CPU percents: 182.44
Average memory infos: {'rss': 204182732.8, 'vms': 1006609612.8}
Average memory percents: 9.507999420166016

The webm file is cut to 10 seconds because I set the length to this value in the benchmark call.
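For comparison with the recorder args above, the direct conversion mentioned earlier might look roughly like the following. This is only a sketch: the exact command was not posted in the thread, and the output name out-direct.webm is hypothetical; the codec options are copied from the benchmark's recorder args.

```python
import shlex

# Hypothetical reconstruction of the direct mkv-to-webm conversion: same codec
# options as the benchmark's recorder args, but reading source.mkv directly
# instead of capturing what ffplay renders to the X display and PulseAudio.
direct_args = [
    "ffmpeg", "-loglevel", "level+warning", "-n",
    "-i", "source.mkv",
    "-c:a", "libopus",
    "-c:v", "libvpx", "-deadline:v", "realtime", "-b:v", "0",
    "out-direct.webm",  # hypothetical output name
]

# Print a copy-pasteable shell command.
print(shlex.join(direct_args))
```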

pmarini-nc commented 11 months ago

The following ffmpeg settings in server.conf seem to bring the audio back in sync with the video:

[ffmpeg]
common = ffmpeg -loglevel level+warning -itsoffset -2 -n
outputaudio = -map 0:a -c:a libopus
extensionvideo = .mkv
outputvideo = -map 1:v -c:v libvpx -deadline:v realtime -crf 10 -b:v 1M
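To see which input `-itsoffset -2` affects, the sketch below parses the [ffmpeg] section with Python's configparser and assembles a command line. `-itsoffset` is an input option, so it shifts the timestamps of the next input on the command line, and the `-map 0:a` / `-map 1:v` options imply that input 0 is the audio. The order in which the recording server concatenates these values, and the input arguments used here, are assumptions for illustration only.

```python
import configparser
import shlex

# The [ffmpeg] section from the comment above, verbatim.
SERVER_CONF = """
[ffmpeg]
common = ffmpeg -loglevel level+warning -itsoffset -2 -n
outputaudio = -map 0:a -c:a libopus
extensionvideo = .mkv
outputvideo = -map 1:v -c:v libvpx -deadline:v realtime -crf 10 -b:v 1M
"""

config = configparser.ConfigParser()
config.read_string(SERVER_CONF)
ffmpeg = config["ffmpeg"]

# ASSUMPTION: "common" comes first, then the audio input (input 0), then the
# video input (input 1), then the output options. This order is hypothetical.
args = (
    shlex.split(ffmpeg["common"])
    + ["-f", "pulse", "-i", "1"]        # input 0: audio
    + ["-f", "x11grab", "-i", ":0"]     # input 1: video
    + shlex.split(ffmpeg["outputaudio"])
    + shlex.split(ffmpeg["outputvideo"])
    + ["out" + ffmpeg["extensionvideo"]]
)

# -itsoffset applies to the first -i that follows it, i.e. the PulseAudio input.
offset_pos = args.index("-itsoffset")
first_input_pos = args.index("-i")
assert offset_pos < first_input_pos
print("input shifted by", args[offset_pos + 1], "s:", args[first_input_pos + 1])
print(shlex.join(args))
```

Under this assumed order the audio timestamps are moved 2 seconds earlier, which compensates a roughly constant lag rather than removing its cause.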

But I am not sure about any side effects.

bentuna commented 9 months ago

We tried to get video and audio in sync on multiple Nextcloud instances, including AIO, but could not get it to work. The audio always lags behind.

We also tried the settings from your last comment, @pmarini-nc, but had no luck: they just give us an .mkv video that Nextcloud can't play and that is not in sync either.

We tried multiple ffmpeg configurations, but none of them solves the audio/video offset problem reliably.

pmarini-nc commented 7 months ago

Hello @bentuna, sorry for the late reply. In my case the ffmpeg settings work on several different sites, none of which uses AIO. By "work" I mean that they reduce the audio-to-video lag to an acceptable level.

It would be great to get a comment from @danxuliu!

P.S. mkv files play fine in Chrome / Chromium, but not in Firefox. Maybe you are also affected by https://bugzilla.mozilla.org/show_bug.cgi?id=1422891 ?

danxuliu commented 7 months ago

Sorry for the late response.

@pmarini-nc I am not sure about the side effects of using itsoffset either, although in a quick test I noticed that the video was "choppier" (but that might have been just a coincidence; I do not really know).

In any case, the issue you are experiencing is most likely a regression introduced in ffmpeg 4.4 and fixed in ffmpeg 5.1 (I wish I had found the ticket before doing the bisect myself to find the breaking and fixing commits... :facepalm:), so it affects all ffmpeg 4.4.X and 5.0.X versions when using the default parameters.

Therefore, it affects Ubuntu 22.04 (when using the default ffmpeg package from the repositories), as it provides ffmpeg 4.4.2, but it should not affect Ubuntu 20.04 (ffmpeg 4.2.7) or Debian 11 (ffmpeg 4.3.6).
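A quick way to tell whether a given ffmpeg binary falls in the affected range described above (4.4.x and 5.0.x) is to parse the first line of `ffmpeg -version`. This is a sketch; the helper name is_affected_by_pulse_regression is hypothetical, and the version strings it is tried on are the ones mentioned in this thread.

```python
import re
import shutil
import subprocess

# Series reported as affected above: regression introduced in 4.4, fixed in
# 5.1, so every 4.4.x and 5.0.x release is affected with default parameters.
AFFECTED_SERIES = {(4, 4), (5, 0)}


def is_affected_by_pulse_regression(version_line):
    """Return True/False for the first line of `ffmpeg -version`, or None if unparsable."""
    # Release builds print "ffmpeg version 4.4.2..." (distributions may append a
    # suffix); builds from a git tag print "ffmpeg version n4.4.2...".
    match = re.match(r"ffmpeg version n?(\d+)\.(\d+)", version_line)
    if match is None:
        return None  # e.g. git snapshot builds that only report a revision
    return (int(match.group(1)), int(match.group(2))) in AFFECTED_SERIES


# Versions mentioned in this thread.
for version in ["4.4.2", "4.2.7", "4.3.6", "6.0.1"]:
    print(version, is_affected_by_pulse_regression("ffmpeg version " + version))

# Check the locally installed ffmpeg, if there is one.
if shutil.which("ffmpeg"):
    lines = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True).stdout.splitlines()
    if lines:
        print(lines[0], "->", is_affected_by_pulse_regression(lines[0]))
```

As the later comments show, a version outside this range does not guarantee the absence of delay; the check only covers this particular regression.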

Fortunately the fix simply adjusts the value of a default parameter, so the bug can be worked around in the recording server by setting an explicit value for that parameter (strictly speaking it is a bit more complicated, as the value of that parameter in turn depends on other low-level parameters, but let's assume those parameters use their default values, which is the typical scenario :-) ).
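One way to see where the value 9600 used below could come from: assume the fixed ffmpeg versions size PulseAudio fragments to 50 ms of audio by default, and the pulse input keeps a default format of 48000 Hz, 2 channels and 16-bit samples. A fragment is then 48000 * 50 / 1000 = 2400 frames of 2 * 2 = 4 bytes each, i.e. 9600 bytes. Both the 50 ms figure and those defaults are assumptions here (they are not stated in the thread), and pulse_fragment_size is a hypothetical helper.

```python
def pulse_fragment_size(sample_rate=48000, channels=2, bytes_per_sample=2, fragment_ms=50):
    """Hypothetical helper: bytes of PulseAudio data covering fragment_ms of audio.

    Defaults ASSUME 48 kHz, stereo, 16-bit samples and a 50 ms fragment.
    """
    frame_size = channels * bytes_per_sample  # bytes per audio frame (4 for 16-bit stereo)
    frames_per_fragment = sample_rate * fragment_ms // 1000  # 2400 frames at 48 kHz
    return frame_size * frames_per_fragment


print(pulse_fragment_size())  # 9600
print(pulse_fragment_size(sample_rate=44100))  # 8820
```

This also illustrates why the right value depends on other low-level parameters: a different sample rate or channel count changes the byte count needed for the same latency.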

Could you try to replace https://github.com/nextcloud/nextcloud-talk-recording/blob/880a939ae4c5318a2f6422f770edce24feb05ef9/src/nextcloud/talk/recording/RecorderArgumentsBuilder.py#L56 with:

        ffmpegInputAudio = ['-f', 'pulse', '-fragment_size', '9600', '-i', audioSourceIndex]
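The patched line only adds two items to the argument list. As a sketch (build_audio_input_args is a hypothetical helper, not part of the recording server), note that -fragment_size is an option of the pulse input device, so it must come before the -i it applies to:

```python
import shlex


def build_audio_input_args(audio_source_index, fragment_size=9600):
    """Hypothetical helper mirroring the patched ffmpegInputAudio line.

    "-fragment_size" configures the pulse input device, so it is placed
    before the "-i" of that input.
    """
    return ["-f", "pulse", "-fragment_size", str(fragment_size), "-i", str(audio_source_index)]


print(shlex.join(["ffmpeg"] + build_audio_input_args(1)))
# ffmpeg -f pulse -fragment_size 9600 -i 1
```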

run the benchmark again (without the itsoffset) and then check if the lag is fixed? Thanks!

@bentuna Your issue looks like something different: even when it was first added, the AIO container for Nextcloud Talk Recording was already using Alpine 3.18 (specifically, python:3.11.3-alpine3.18), which ships ffmpeg 6.0.1 and therefore should not be affected by the ffmpeg regression. Nevertheless, could you also test the above change to be sure? Thanks!

pmarini-nc commented 7 months ago

Hi @danxuliu , thanks for the information.

I get:

NameError: name 'audioSinkIndex' is not defined. Did you mean: 'audioSourceIndex'?

I tried with audioSourceIndex, but didn't notice a great difference.

And in a real video call the delay is definitely still there.

As per your explanation, maybe it is worth trying with Ubuntu 20.04 or Debian 11?

danxuliu commented 7 months ago

> I get:
>
> NameError: name 'audioSinkIndex' is not defined. Did you mean: 'audioSourceIndex'?

Ah, yes, sorry, I copied that from an old branch by mistake; I have fixed the comment above.

> I tried with audioSourceIndex, but didn't notice a great difference.

That is unexpected. The delay should have been noticeably reduced. This can be seen with the script below:

# Setup
docker run --interactive --tty --detach --name talk-recording-test ubuntu:22.04 bash
docker exec talk-recording-test bash -c "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes git wget ffmpeg pulseaudio python3-pip xvfb"
docker exec talk-recording-test python3 -m pip install --upgrade pip
docker exec talk-recording-test useradd --create-home recording
docker exec --user recording talk-recording-test bash -c "cd /home/recording && git clone https://github.com/nextcloud/nextcloud-talk-recording"
docker exec --user recording talk-recording-test bash -c "DEB_PYTHON_INSTALL_LAYOUT=deb_system python3 -m pip install --editable file:///home/recording/nextcloud-talk-recording"
docker exec --user recording talk-recording-test wget --output-document=/tmp/big-buck-bunny.webm "https://upload.wikimedia.org/wikipedia/commons/transcoded/1/18/Big_Buck_Bunny_Trailer_1080p.ogv/Big_Buck_Bunny_Trailer_1080p.ogv.360p.vp9.webm?download"
docker cp talk-recording-test:/tmp/big-buck-bunny.webm /tmp

# Run benchmark
docker exec --user recording talk-recording-test python3 -m nextcloud.talk.recording.Benchmark --length 20 --width 640 --height 360 /tmp/big-buck-bunny.webm /tmp/out.webm
docker cp talk-recording-test:/tmp/out.webm /tmp

# Add "-fragment_size 9600" to the ffmpeg arguments
docker exec --user recording talk-recording-test sed --in-place "s/'-f', 'pulse', '-i', audioSourceIndex/'-f', 'pulse', '-fragment_size', '9600', '-i', audioSourceIndex/" /home/recording/nextcloud-talk-recording/src/nextcloud/talk/recording/RecorderArgumentsBuilder.py

# Run benchmark again
docker exec --user recording talk-recording-test python3 -m nextcloud.talk.recording.Benchmark --length 20 --width 640 --height 360 /tmp/big-buck-bunny.webm /tmp/out-fragment.webm
docker cp talk-recording-test:/tmp/out-fragment.webm /tmp
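If sed is not convenient, the same edit can be applied with a small Python script. This is a hypothetical sketch that performs the same textual replacement as the sed command above and leaves already patched text unchanged; the file path in the commented-out call is the one used in the container.

```python
from pathlib import Path

OLD = "'-f', 'pulse', '-i', audioSourceIndex"
NEW = "'-f', 'pulse', '-fragment_size', '9600', '-i', audioSourceIndex"


def add_fragment_size(source):
    """Return source with -fragment_size 9600 added to the pulse input arguments."""
    if NEW in source:
        return source  # already patched, nothing to do
    return source.replace(OLD, NEW)


def patch_file(path):
    """Apply add_fragment_size to a file in place."""
    file = Path(path)
    file.write_text(add_fragment_size(file.read_text()))


# Example on the original line of RecorderArgumentsBuilder.py:
line = "        ffmpegInputAudio = ['-f', 'pulse', '-i', audioSourceIndex]"
print(add_fragment_size(line))

# To patch the checkout used in the container:
# patch_file("/home/recording/nextcloud-talk-recording/src/nextcloud/talk/recording/RecorderArgumentsBuilder.py")
```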

If you check the video around 00:13 (when the acorn hits the bunny) you should notice that the audio is now much closer to the video after adding -fragment_size 9600. But... the audio is still not in perfect sync :thinking:

On the other hand, running the script with ubuntu:20.04 instead of ubuntu:22.04 shows no audio delay in the output. If ffmpeg 4.4.2 (the same version used in Ubuntu 22.04) is built and used in the Ubuntu 20.04 container, the audio and video show a delay without the fragment_size parameter, as expected, but they are in sync once it is added (script continued from above; remember to replace ubuntu:22.04 with ubuntu:20.04 in the first command):

# Build ffmpeg 4.4.2
docker exec --user recording talk-recording-test bash -c "cd /home/recording/nextcloud-talk-recording && git checkout ."
docker exec --user recording talk-recording-test bash -c "cd /home/recording && git clone https://git.ffmpeg.org/ffmpeg.git"
docker exec talk-recording-test apt-get install --assume-yes yasm pkg-config libopus-dev libpulse-dev libvpx-dev libsdl2-dev
docker exec --user recording talk-recording-test bash -c "cd /home/recording/ffmpeg && git checkout n4.4.2 && mkdir build"
docker exec --user recording talk-recording-test bash -c "cd /home/recording/ffmpeg/build && ../configure --enable-libvpx --enable-libpulse --enable-libopus --enable-libxcb --enable-sdl2 && make -j 4"

# Run benchmark with ffmpeg 4.4.2
docker exec --user recording talk-recording-test bash -c "PATH=/home/recording/ffmpeg/build:\$PATH python3 -m nextcloud.talk.recording.Benchmark --length 20 --width 640 --height 360 /tmp/big-buck-bunny.webm /tmp/out-4.4.2.webm"
docker cp talk-recording-test:/tmp/out-4.4.2.webm /tmp

# Add "-fragment_size 9600" to the ffmpeg arguments
docker exec --user recording talk-recording-test sed --in-place "s/'-f', 'pulse', '-i', audioSourceIndex/'-f', 'pulse', '-fragment_size', '9600', '-i', audioSourceIndex/" /home/recording/nextcloud-talk-recording/src/nextcloud/talk/recording/RecorderArgumentsBuilder.py

# Run benchmark again with ffmpeg 4.4.2
docker exec --user recording talk-recording-test bash -c "PATH=/home/recording/ffmpeg/build:\$PATH python3 -m nextcloud.talk.recording.Benchmark --length 20 --width 640 --height 360 /tmp/big-buck-bunny.webm /tmp/out-4.4.2-fragment.webm"
docker cp talk-recording-test:/tmp/out-4.4.2-fragment.webm /tmp

I have also checked Ubuntu 22.04 on a virtual machine rather than in a Docker container, and the (smaller) audio delay is still there even when using -fragment_size 9600. So there is something strange going on with Ubuntu 22.04...

> As per your explanation, maybe it is worth trying with Ubuntu 20.04 or Debian 11?

Yes, at least based on my tests both should work fine.

danxuliu commented 7 months ago

Interestingly, the delay can also be reproduced on Alpine Linux, so @bentuna, despite what I initially thought, you may have the same issue after all ;-)

# Setup
docker run --interactive --tty --detach --name talk-recording-test-alpine python:3.11.3-alpine3.18 sh
docker exec talk-recording-test-alpine apk add git wget ffmpeg pulseaudio xvfb gcc python3-dev musl-dev linux-headers
docker exec talk-recording-test-alpine adduser -D recording
docker exec --user recording talk-recording-test-alpine sh -c "cd /home/recording && git clone https://github.com/nextcloud/nextcloud-talk-recording"
docker exec --user recording talk-recording-test-alpine python3 -m pip install --editable file:///home/recording/nextcloud-talk-recording
docker exec --user recording talk-recording-test-alpine wget --output-document=/tmp/big-buck-bunny.webm "https://upload.wikimedia.org/wikipedia/commons/transcoded/1/18/Big_Buck_Bunny_Trailer_1080p.ogv/Big_Buck_Bunny_Trailer_1080p.ogv.360p.vp9.webm?download"

# Run benchmark
docker exec --user recording talk-recording-test-alpine python3 -m nextcloud.talk.recording.Benchmark --length 20 --width 640 --height 360 /tmp/big-buck-bunny.webm /tmp/out-alpine.webm
docker cp talk-recording-test-alpine:/tmp/out-alpine.webm /tmp

pmarini-nc commented 7 months ago

You are right, @danxuliu. After further testing in the same environment, the audio lag is much reduced with the modification that adds fragment_size.

I have also tried replacing the apt ffmpeg binaries with ones compiled from the latest available version, 6.1, and I didn't notice any difference in a longer video.

What I did notice is that in the first few seconds of the video the audio lags, but it gets back in sync quickly.

Given that the results are similar, the best option for me is to keep the apt version and apply the manual fix in the code.

nextcloud / nextcloud-talk-recording, issue #9