melt.exe is creating pops in audio in clips with PCM audio in Kdenlive and ShotCut.

ArtisticTuxedo commented 2 months ago

I have noticed that the audio occasionally pops (usually within a few seconds of the start of the clip) when playing or rendering clips that have PCM audio. The pops always occur at the same point in a clip when it is replayed. I've used Audacity to verify that the original clips don't have the popping. At first I thought it was Kdenlive, so I tried using the clip in ShotCut and noticed that it does the exact same thing. Because both Kdenlive and ShotCut have the issue, I assumed that the cause was something that they had in common. I tested ffmpeg/ffplay by playing the clip through the command line and there was no popping. Then I tried playing the clip using melt.exe and the popping issue appeared. I've tried converting the original clips to various audio and video codecs (I also tried changing the frame rate and resolution) and I've narrowed the issue down to the clip having a PCM codec, though it seems there are other factors and a bit of randomness in whether a clip has this problem when used in melt.exe. Here are the results of my testing (all clips below use the Matroska container):

Clip #⁠1

	Video Codec	Audio Codec	FPS	Resolution	Popping?
Original File	ffv1	PCM_f32le	24	854x480	Yes
Test #⁠1	ffv1	PCM_s16le	24	854x480	Yes
Test #⁠2	ffv1	PCM_u8	24	854x480	Less
Test #⁠3	vp9	PCM_f32le	24	854x480	Yes
Test #⁠4	ffv1	aac	24	854x480	No
Test #⁠5	ffv1	PCM_f32le	60	854x480	Less

Clip #⁠2

	Video Codec	Audio Codec	FPS	Resolution	Popping?
Original File	hevc_nvenc	PCM_f32le	60	1920x1080	No
Test #⁠1	hevc_nvenc	PCM_s16le	60	1920x1080	No
Test #⁠2	hevc_nvenc	PCM_u8	60	1920x1080	No
Test #⁠3	ffv1	PCM_f32le	24	1920x1080	Yes
Test #⁠4	ffv1	PCM_f32le	24	854x480	Yes
Test #⁠5	ffv1	PCM_f32le	60	854x480	No
Test #⁠6	vp9	PCM_f32le	24	1920x1080	Yes
Test #⁠7	libx264	PCM_f32le	24	1920x1080	Yes
Test #⁠8	vp9	aac	24	1920x1080	No
Test #⁠9	vp9	libopus	24	1920x1080	No
Test #⁠10	libx264	aac	24	1920x1080	No
Test #⁠11	libx264	libopus	24	1920x1080	No

The data above suggests that the issue is limited to PCM codecs and that lower frame rates exacerbate the problem, while a lower bit depth (going from 32 down to 8 bits) seems to reduce it. It also seems to depend on the footage, as the second clip was only affected when it was reduced to 24 fps, but the first clip was affected at both 24 and 60 fps.

Specs:

OS: Windows 10 Pro 64-bit | Version 10.0.19045 Build 19045
MLT Version: 7.25.0

ddennedy commented 2 months ago

I tried to reproduce Test 7 but have not managed to yet. I generated a file as follows, producing a sine wave in the specified format, as this is the easiest way for me to hear and see a problem in a waveform:

melt -profile atsc_1080p_24 tone: out=240 frequency=500 level=-10 -consumer avformat:mlt_bug-1014.mkv vcodec=libx264 acodec=pcm_f32le

Then, I play it with melt built against today's git master (version 7.25.0 means it could be anything after the 7.24.0 release since releases only have even numbers) and FFmpeg v7.0. But I do not hear a pop or glitch. Next, I try to play it like melt -chain mlt_bug-1014.mkv since Shotcut is using chains now on videos in order to support time/speed keyframes.

Does the problem appear in exports? Next, I run the following to use melt to convert to a WAV I can play and inspect in Audacity: melt mlt_bug-1014.mkv -consumer avformat:mlt_bug-1014.wav

In Audacity I zoomed in and pressed Page Down repeatedly to inspect it, and did not find any damage.

How are you generating your test clips?

ddennedy commented 2 months ago

I just repeated the above for clip 2, test 3 with the ffv1 codec, and that did not reproduce. It will help if you share a Shotcut MLT XML project or melt command line that produces the problematic clip.

ArtisticTuxedo commented 2 months ago

> How are you generating your test clips?

For the first clip, I just used a clip that I had on hand. The second clip was a screen recording in which I was moving windows around and had audio playing in the background. For each of the tests, I ran the clips through ffmpeg directly to convert them into the various codecs and formats. I confirmed with Audacity that the audio damage wasn't present in any of the test files before playing them with melt using the command .\melt.exe "file.mkv".

> Does the problem appear in exports?

Yes, the issue appears in both exports and playback. Here is a link to a Google Drive folder containing the first clip before and after export: Example. The audio error appears about 1-2 seconds into the rendered file. The render was done using Kdenlive, but I was able to reproduce the same error in ShotCut.

> It will help if you share a Shotcut MLT XML project or melt command line that produces the problematic clip.

Sorry, I'm not well versed in using melt directly on the command line, as I usually use Kdenlive/ShotCut, but I believe that the preset I used in both programs gets interpreted as:

melt -profile atsc_1080p_60 input.mkv -consumer avformat:output.mp4 f=mp4 vcodec=libx264 crf=23 vbufsize=0 g=15 bf=0 acodec=aac ab=160+'k' movflags=+faststart

I could be wrong about that, though.

ArtisticTuxedo commented 2 months ago

I don't know if this is worth noting, but I believe that this bug report on Kdenlive describes the same issue: https://bugs.kde.org/show_bug.cgi?id=445720 However, they mention converting the audio into a 16-bit PCM format as a workaround, and that doesn't work in my case.

ddennedy commented 2 months ago

I hear it in Original.mkv in melt around where it shows Current Position: 90, ~3.75s

ArtisticTuxedo commented 2 months ago

Play the audio of both files in Audacity. Original.mkv doesn't have the audio error, while the file exported from melt.exe (Render.mp4) does. If you play Original.mkv with melt, it adds the same pop sound as it did in the export.

ddennedy commented 2 months ago

Here it is in the Audacity waveform at 3.667s

ArtisticTuxedo commented 2 months ago

I am confused, that error isn't there for me in original.mkv:

ddennedy commented 2 months ago

That is from my own render output.

ArtisticTuxedo commented 2 months ago

Oops, sorry about that! My mistake.

ArtisticTuxedo commented 2 months ago

For me the error occurs around 1.617 seconds in the render:

The top track is the render output and the bottom is the original clip. It looks like it's skipping to a point a little further in the audio as the audio after the error seems to be shifted to the left.

ddennedy commented 2 months ago

This is a regression caused by the fix for #885. @bmatherly, can you help take a look at this one? It appears from that fix that pts_offset was added to deal with the situation if (*ignore > 0 && audio_used). However, in this case, the problem occurs when that branch is not taken, which affects the following condition, if (req_pts > pts) {, and causes samples to be dropped. If I do this, the problem here goes away:

--- src/modules/avformat/producer_avformat.c
+++ src/modules/avformat/producer_avformat.c
@@ -3275,6 +3275,8 @@ static int decode_audio(producer_avformat self,
         memmove(audio_buffer,
                 &audio_buffer[n * channels * sizeof_sample],
                 audio_used * channels * sizeof_sample);
+    } else {
+        audio_used_at_start = 0;
     }

     // If we're behind, ignore this packet

Or this change

@@ -3315,9 +3318,10 @@ static int decode_audio(producer_avformat self,
                 ahead_threshold = 4;
             }

-            if (req_pts > pts) {
+            if (req_pts > pts + ahead_threshold) {

bmatherly commented 2 months ago

It looks like we are suffering from an accumulation of rounding errors.

The audio frame size is 4096 samples. The timebase is 1/1000. That means each audio frame is 85.3 ticks in duration. So the PTS increment dithers between 85 and 86 to average out to 85.3.

This sync test is failing by exactly one tick: if (req_pts > pts) {
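To make the dithering concrete, here is a minimal sketch of how the PTS steps come out. It assumes a 48 kHz sample rate (which is what 4096 samples lasting 85.3 ticks at a 1/1000 timebase implies) and that each PTS is rounded to the nearest tick. The helper names frame_pts and pts_increment are hypothetical and do not come from MLT or FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: PTS (in ticks) of the audio frame that starts at
 * sample index first_sample, rounded to the nearest tick.
 * Assumption: the muxer rounds to nearest; a real muxer may differ. */
static int64_t frame_pts(int64_t first_sample, int sample_rate, int ticks_per_second)
{
    assert(first_sample >= 0 && sample_rate > 0 && ticks_per_second > 0);
    /* Integer round-to-nearest of first_sample * ticks_per_second / sample_rate. */
    return (2 * first_sample * ticks_per_second + sample_rate) / (2 * (int64_t) sample_rate);
}

/* Hypothetical helper: PTS step from audio frame n to audio frame n + 1. */
static int64_t pts_increment(int64_t n, int frame_size, int sample_rate, int ticks_per_second)
{
    return frame_pts((n + 1) * frame_size, sample_rate, ticks_per_second)
           - frame_pts(n * frame_size, sample_rate, ticks_per_second);
}
```

With frame_size = 4096, sample_rate = 48000 and ticks_per_second = 1000, the first three steps are 85, 86 and 85 ticks, so any single packet's PTS can sit up to about half a tick away from its true start time.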

This patch works for me in my testing:

@@ -3297,7 +3297,7 @@ static int decode_audio(producer_avformat self,
                                 context->streams[index]->time_base);
         int64_t int_position = llrint(timebase * pts * fps);
         int64_t req_position = llrint(timecode * fps);
-        int64_t req_pts = llrint(timecode / timebase);
+        int64_t req_pts = floor(timecode / timebase);

         mlt_log_debug(MLT_PRODUCER_SERVICE(self->parent),
                       "A pkt.pts %" PRId64 " pkt->dts %" PRId64 " req_pos %" PRId64

I think the above patch is explainable because we would not want our PTS comparison to be rounded up even by a fraction of a tick.
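As a rough illustration of that reasoning, the sketch below compares rounding the requested time to the nearest tick against rounding it down, and then applies the same req_pts > pts test. It is an integer-only analogue of the llrint versus floor change, not the MLT code itself. The function names and the example packet PTS of 416 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Requested time of video frame `position`, in stream ticks, rounded down
 * (the integer analogue of floor(timecode / timebase)). */
static int64_t req_pts_floor(int64_t position, int fps, int ticks_per_second)
{
    assert(position >= 0 && fps > 0 && ticks_per_second > 0);
    return position * ticks_per_second / fps;
}

/* Same value rounded to the nearest tick (similar to llrint, which differs
 * only in how it breaks exact .5 ties). */
static int64_t req_pts_nearest(int64_t position, int fps, int ticks_per_second)
{
    assert(position >= 0 && fps > 0 && ticks_per_second > 0);
    return (2 * position * ticks_per_second + fps) / (2 * (int64_t) fps);
}

/* The sync test quoted in the thread: nonzero means the packet is treated
 * as being behind the requested time. */
static int packet_is_behind(int64_t req_pts, int64_t pts)
{
    return req_pts > pts;
}
```

For video frame 10 at 24 fps with a 1/1000 timebase, the requested time is 416.67 ticks. Rounded to nearest it becomes 417, so a hypothetical packet with PTS 416 is judged to be behind. Rounded down it stays at 416 and the same packet passes the test.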

Alternatively, we could add a PTS error threshold and set it to something higher (like maybe 5ms).

Regarding this patch....

-            if (req_pts > pts) {
+            if (req_pts > pts + ahead_threshold) {

This works only by a lucky coincidence. We should not do this because PTS is in units of the audio time base (1/1000 for this stream) while ahead_threshold is in units of video frames.
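The unit mismatch can be sketched as follows. Any tolerance has to be converted into ticks of the stream's time base before it is added to a PTS. TimeBase, ms_to_ticks and frames_to_ticks are hypothetical names used only for illustration (TimeBase stands in for a num/den rational such as FFmpeg's AVRational), and the values are examples, not a proposed patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a stream time base of num/den seconds per tick. */
typedef struct {
    int num;
    int den;
} TimeBase;

/* Convert a tolerance given in milliseconds to ticks, rounding down. */
static int64_t ms_to_ticks(int64_t ms, TimeBase tb)
{
    assert(ms >= 0 && tb.num > 0 && tb.den > 0);
    return ms * tb.den / (1000 * (int64_t) tb.num);
}

/* Convert a tolerance given in video frames to ticks, rounding down. */
static int64_t frames_to_ticks(int64_t frames, int fps, TimeBase tb)
{
    assert(frames >= 0 && fps > 0 && tb.num > 0 && tb.den > 0);
    return frames * tb.den / ((int64_t) fps * tb.num);
}
```

With a 1/1000 time base, a 5 ms tolerance is 5 ticks, while 4 video frames at 24 fps is 166 ticks. Adding the raw frame count 4 to a PTS therefore acts as a 4-tick (4 ms) tolerance on this stream, and on a stream with a 1/48000 time base, where 5 ms is 240 ticks, it would shrink to a fraction of a millisecond.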
