meh / rust-ffmpeg

Safe FFmpeg wrapper.
Do What The F*ck You Want To Public License

Last ~0.5s always truncated when resampling audio #162

Open xd009642 opened 3 years ago

xd009642 commented 3 years ago

This may well be user error, so I've extracted the relevant code into a minimal example and pushed it here: https://github.com/xd009642/resampling_example. The behaviour is very consistent, though: about the same amount of data is missing from the end regardless of file length (I've tried this with 3 wave files going from 44100Hz to 8000Hz, some stereo and some mono).

This example loads the audio file at 44100Hz and resamples it to 8000Hz. Whenever I do this, the end of the audio file gets chopped off. Below is a screenshot of Audacity showing the source file as the first track and the output as the second track.

Screenshot from 2021-11-05 14-22-55

I wonder if this could be related to how I'm flushing the resampler at the end?

    audio_decoder.flush();

    while let Ok(Some(_)) = resampler.flush(&mut resampled_audio) {
        data.append(&mut get_samples(&resampled_audio));
    }

EDIT: ffmpeg version info is below; I've also observed this behaviour on ffmpeg 4.3.2:

ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...

Use -h to get full help or, even better, run 'man ffmpeg'

meh commented 3 years ago

Could you check if any of the flush calls are returning errors?

xd009642 commented 3 years ago

I just changed it to unwrap the one unchecked flush call, like so:

    audio_decoder.flush();

    while let Some(_) = resampler.flush(&mut resampled_audio).unwrap() {
        println!("Getting some bytes");
        data.append(&mut get_samples(&resampled_audio));
    }

No error. I also put a print in that loop, and flush returns None, so it must think there are no samples remaining in the resampling context.

xd009642 commented 3 years ago

Oh, if the resampler returns the last of the audio on the first flush call, would it then return None, so that I never add that audio to the output buffer? Let me try that.

EDIT: Nope, that didn't change anything. The resampler code is now:

    println!("Flush decoder and read last bits");
    audio_decoder.flush();

    while resampler.delay().is_some() {
        println!("Flushing");
        resampler.flush(&mut resampled_audio).unwrap();
        data.append(&mut get_samples(&resampled_audio));
    }

And "Flushing" is never printed.

xd009642 commented 3 years ago

Another observation: this time I used an 8kHz wav and an 8kHz mp3 as the source audio, so the resampler shouldn't change the audio in the slightest.

This is the input wav file and output wav file. They match exactly.

Screenshot from 2021-11-05 18-09-52

This is the input mp3 file and output wav file; again we can see that ~0.5s is removed from the end (11.23s vs 10.52s).

Screenshot from 2021-11-05 18-10-19

I'm not sure whether this suggests misuse of the audio decoder, since mp3 decoding is more involved than wav. I believe mp3 decodes to planar data rather than packed, so the resampler may be doing the work of converting between the two layouts but nothing else?
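For readers unfamiliar with the planar versus packed distinction mentioned above, here is a minimal sketch of the conversion. It is plain Rust with no FFmpeg calls; `planar_to_packed` is a hypothetical helper written for illustration, not something from rust-ffmpeg or from the example repository.

```rust
/// Interleave planar channel buffers (one Vec per channel) into a single
/// packed buffer: [L0, R0, L1, R1, ...].
/// Assumes every plane holds the same number of samples.
fn planar_to_packed(planes: &[Vec<i16>]) -> Vec<i16> {
    let frames = planes.first().map_or(0, |p| p.len());
    let mut packed = Vec::with_capacity(frames * planes.len());
    for i in 0..frames {
        for plane in planes {
            packed.push(plane[i]);
        }
    }
    packed
}

fn main() {
    let left = vec![1, 2, 3];
    let right = vec![-1, -2, -3];
    let packed = planar_to_packed(&[left, right]);
    println!("{:?}", packed);
}
```

A layout conversion like this only reorders samples, so on its own it should not shorten the audio.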

meh commented 3 years ago

Yeah I feel like something is not being fully flushed somewhere.

I will try to dig in as soon as I have time, but do keep digging, it might be inside the library itself, too.

xd009642 commented 3 years ago

Yeah, I've started going over the library code this afternoon, along with the ffmpeg C examples, to see if anything stands out to me. I'll also enable the trace logging, but I tried that before I had the minimal example and nothing stood out.

xd009642 commented 2 years ago

Here is the output for the 44100Hz wav with the ffmpeg trace logging enabled. Most of it is the probe stuff, with no real logging from the resampler :disappointed: Also, I edited the first post to add the ffmpeg version info to the issue.

Probing wav score:99 size:2048
[wav @ 0x55693aeb0c60] Format wav probed with size=2048 and score=99
[wav @ 0x55693aeb0c60] Before avformat_find_stream_info() pos: 44 bytes read:67584 seeks:3 nb_streams:1
[wav @ 0x55693aeb0c60] probing stream 0 pp:32
Probing mp3 score:1 size:4096
[wav @ 0x55693aeb0c60] Probe with size=4096, packets=2469 detected mp3 with score=1
[wav @ 0x55693aeb0c60] probing stream 0 pp:31
Probing mp3 score:1 size:8192
[wav @ 0x55693aeb0c60] Probe with size=8192, packets=2470 detected mp3 with score=1
[wav @ 0x55693aeb0c60] probing stream 0 pp:30
[wav @ 0x55693aeb0c60] probing stream 0 pp:29
Probing mp3 score:1 size:16384
[wav @ 0x55693aeb0c60] Probe with size=16384, packets=2472 detected mp3 with score=1
[wav @ 0x55693aeb0c60] probing stream 0 pp:28
[wav @ 0x55693aeb0c60] probing stream 0 pp:27
[wav @ 0x55693aeb0c60] probing stream 0 pp:26
[wav @ 0x55693aeb0c60] probing stream 0 pp:25
[wav @ 0x55693aeb0c60] probing stream 0 pp:24
[wav @ 0x55693aeb0c60] probing stream 0 pp:23
[wav @ 0x55693aeb0c60] probing stream 0 pp:22
[wav @ 0x55693aeb0c60] probing stream 0 pp:21
[wav @ 0x55693aeb0c60] probing stream 0 pp:20
[wav @ 0x55693aeb0c60] probing stream 0 pp:19
[wav @ 0x55693aeb0c60] probing stream 0 pp:18
[wav @ 0x55693aeb0c60] probing stream 0 pp:17
[wav @ 0x55693aeb0c60] probing stream 0 pp:16
[wav @ 0x55693aeb0c60] probing stream 0 pp:15
[wav @ 0x55693aeb0c60] probing stream 0 pp:14
[wav @ 0x55693aeb0c60] probing stream 0 pp:13
[wav @ 0x55693aeb0c60] probing stream 0 pp:12
[wav @ 0x55693aeb0c60] probing stream 0 pp:11
[wav @ 0x55693aeb0c60] probing stream 0 pp:10
[wav @ 0x55693aeb0c60] probing stream 0 pp:9
[wav @ 0x55693aeb0c60] probing stream 0 pp:8
[wav @ 0x55693aeb0c60] probing stream 0 pp:7
[wav @ 0x55693aeb0c60] probing stream 0 pp:6
[wav @ 0x55693aeb0c60] probing stream 0 pp:5
[wav @ 0x55693aeb0c60] probing stream 0 pp:4
[wav @ 0x55693aeb0c60] probing stream 0 pp:3
[wav @ 0x55693aeb0c60] probing stream 0 pp:2
[wav @ 0x55693aeb0c60] probing stream 0 pp:1
[wav @ 0x55693aeb0c60] probed stream 0
[wav @ 0x55693aeb0c60] parser not found for codec pcm_s16le, packets or times may be invalid.
[wav @ 0x55693aeb0c60] All info found
[wav @ 0x55693aeb0c60] stream 0: start_time: -209146758205323.719 duration: -209146758205323.719
[wav @ 0x55693aeb0c60] format: start_time: -9223372036854.775 duration: -9223372036854.775 bitrate=705 kb/s
[wav @ 0x55693aeb0c60] After avformat_find_stream_info() pos: 204844 bytes read:272384 seeks:3 frames:50
Input stats
* Sample rate: 44100Hz
* Channels: 1
* Format: "s16"
Creating resampler
[SWR @ 0x55693aef0ba0] Using s16p internally between filters
Start sample reading
Flush decoder and read last bits
Write output.wav
* Sample rate: 8000Hz
* Channels: 1
* Format: s16

kcking commented 2 years ago

I think I figured out the issue. resampler.delay().is_some() is being used to detect when all of the resampler's buffers have been processed. However, resampler.delay() returns None prematurely because the delay is requested in seconds and is then rounded down to zero.
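To make the rounding concrete, here is a small arithmetic sketch. It is a simplified model of my own, not libswresample's actual code: `delay_in_units` is a hypothetical function that converts a buffered sample count into units of 1/base seconds using integer division with round-to-nearest, and the figure of 3000 buffered samples is made up for illustration.

```rust
/// Hypothetical model (not the real swr_get_delay): express a delay of
/// `delay_samples` at `sample_rate` Hz in units of 1/`base` seconds,
/// using integer arithmetic that rounds to the nearest unit.
fn delay_in_units(delay_samples: i64, sample_rate: i64, base: i64) -> i64 {
    (delay_samples * base + sample_rate / 2) / sample_rate
}

fn main() {
    let rate = 8_000;
    let buffered = 3_000; // 0.375 s of audio still inside the resampler

    // base = 1: whole seconds. The sub-second delay collapses to 0,
    // which a caller would read as "nothing left to flush".
    println!("seconds:      {}", delay_in_units(buffered, rate, 1));
    // base = 1000: milliseconds. The delay is visible.
    println!("milliseconds: {}", delay_in_units(buffered, rate, 1_000));
    // base = sample rate: the delay comes back in samples.
    println!("samples:      {}", delay_in_units(buffered, rate, rate));
}
```

Under this model any delay shorter than half a second reports as zero when the base is whole seconds, which would be consistent with roughly the last half second going missing.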

I tried forking rust-ffmpeg to prevent rounding down, but this didn't completely fix the problem because swr_get_delay will report 16 samples even when all of the output has been flushed. I think this is because the default filter looks at +/- 16 samples.

I found 2 workarounds:

  1. Compute the expected number of samples for how much audio has been decoded so far using av_rescale_rnd(num_decoded_samples, output_rate, input_rate, AV_ROUND_DOWN) (rounding up seemed to work in this case as well, but I wanted to be conservative). Run the flush loop until you have all expected samples.
  2. Run the resampler.flush loop until data.len() doesn't change (this seems simpler).
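The arithmetic behind workaround 1 can be sketched without FFmpeg. For non-negative inputs, av_rescale_rnd(a, b, c, AV_ROUND_DOWN) is the floor of a * b / c; `rescale_down` below is a hypothetical pure-Rust stand-in for that call, and the sample counts are illustrative.

```rust
/// Floor of `a * b / c`, the value av_rescale_rnd(a, b, c, AV_ROUND_DOWN)
/// yields for non-negative inputs. The intermediate product is widened to
/// u128 so it cannot overflow. `c` must be non-zero.
fn rescale_down(a: u64, b: u64, c: u64) -> u64 {
    ((a as u128 * b as u128) / c as u128) as u64
}

fn main() {
    let input_rate = 44_100;
    let output_rate = 8_000;

    // 10 seconds of decoded input, counted in samples per channel.
    let num_decoded_samples = 441_000;
    let expected = rescale_down(num_decoded_samples, output_rate, input_rate);
    println!("expected output samples: {}", expected);

    // A count that does not divide evenly is rounded down.
    println!("{}", rescale_down(441_001, output_rate, input_rate));
}
```

The flush loop would then keep running while the number of collected samples per channel is below `expected`.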

Tangentially, it might make sense to fix resampler.delay() to return a value whenever the delay is non-zero (happy to make a PR for that if desired, @meh) instead of rounding to the nearest second. However, that change would make this example loop forever, and I'm not sure whether changing this behavior would be considered a breaking change for users of rust-ffmpeg.

meh commented 2 years ago

I like workaround number 2, and I think I like delay returning a more meaningful value.

xd009642 commented 2 years ago

until data.len() doesn't change.

So there's no chance of the resampler requiring >1 flush call and two adjacent ones returning the same number of samples?

kcking commented 2 years ago

By 'data' I'm referring to the final vec of samples. If flush returns no samples, its length won't change.

As long as the audio frame being passed to flush is non-empty, flush returning no samples should indicate there is no more buffered output.
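A minimal sketch of that termination condition follows. `MockResampler` is a stand-in I made up so the example runs on its own; it is not the rust-ffmpeg API, and its `flush` signature is an assumption chosen for illustration.

```rust
/// Stand-in for a resampler holding buffered output. NOT the rust-ffmpeg API.
struct MockResampler {
    buffered: Vec<i16>,
    chunk: usize, // maximum samples handed out per flush call
}

impl MockResampler {
    /// Replaces the contents of `out` with up to `chunk` buffered samples
    /// and returns how many samples were written.
    fn flush(&mut self, out: &mut Vec<i16>) -> usize {
        let n = self.chunk.min(self.buffered.len());
        out.clear();
        out.extend(self.buffered.drain(..n));
        n
    }
}

/// Keep flushing until a call adds nothing to `data`.
fn drain_all(resampler: &mut MockResampler, data: &mut Vec<i16>) {
    let mut frame = Vec::new();
    loop {
        let before = data.len();
        resampler.flush(&mut frame);
        data.extend_from_slice(&frame);
        if data.len() == before {
            break; // this flush produced no samples, so nothing is buffered
        }
    }
}

fn main() {
    let mut resampler = MockResampler { buffered: (0..10).collect(), chunk: 4 };
    let mut data = vec![100, 101]; // samples collected before flushing
    drain_all(&mut resampler, &mut data);
    println!("total samples: {}", data.len());
}
```

The loop compares the length of the output before and after each call, so two consecutive flushes that each return the same non-zero number of samples do not stop it; only a flush that adds nothing does.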