x42 / silan

audio file [silence] analyzer
GNU General Public License v2.0

Inaccurate silence calculation for MP3 files #9

Open · smlz opened this issue 5 years ago

smlz commented 5 years ago

When testing our audio chain with different file formats, I found that silan returns different results depending on the file format. I expected small disparities, but the result for the MP3 file was off by more than half a second.

The test audio starts with 200 ms of silence, followed by 600 ms of noise, followed by 1200 ms of silence. I used FLAC, Ogg Vorbis and MP3 as file formats. This was silan's output:

$ silan padded.mp3
0.367868 Sound On
1.677347 Sound Off
$ silan padded.ogg
0.197891 Sound On
0.825397 Sound Off
$ silan padded.flac
0.200023 Sound On
0.825374 Sound Off

As you can see, the Ogg Vorbis and FLAC results are okay, but the MP3 result is off.
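To make the expected numbers concrete: with 200 ms of silence, 600 ms of noise and 1200 ms of silence, sound should start at 0.2 s and stop around 0.8 s regardless of container or codec. The C sketch below builds that signal as mono float samples at 44.1 kHz and runs a deliberately simple block-RMS threshold detector over it. The names make_padded_signal() and detect_sound(), the 10 ms block size and the threshold are hypothetical choices for illustration; this is not silan's detection algorithm.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: mono float test signal shaped like the issue's file:
 * 200 ms silence, 600 ms uniform noise, 1200 ms silence.
 * Returns a malloc'd buffer (caller frees) and its length in *len_out. */
float *make_padded_signal(int rate, size_t *len_out)
{
    size_t pre   = (size_t)rate * 200 / 1000;
    size_t noise = (size_t)rate * 600 / 1000;
    size_t post  = (size_t)rate * 1200 / 1000;
    size_t len   = pre + noise + post;

    float *buf = calloc(len, sizeof *buf);   /* calloc zeroes: silence */
    if (buf == NULL)
        return NULL;

    srand(1);                                /* fixed seed: reproducible */
    for (size_t i = pre; i < pre + noise; ++i)
        buf[i] = (float)rand() / (float)RAND_MAX - 0.5f;  /* in [-0.5, 0.5] */

    *len_out = len;
    return buf;
}

/* Hypothetical detector (not silan's): split the signal into fixed-size
 * blocks (block must be > 0) and mark a block as "sound" when its mean
 * square exceeds threshold^2, i.e. its RMS is above threshold.
 * Sound On = start of the first loud block, Sound Off = end of the last
 * loud block, both in seconds. Returns 0 on success, -1 if no block is loud. */
int detect_sound(const float *buf, size_t len, int rate, size_t block,
                 float threshold, double *on_sec, double *off_sec)
{
    int found = 0;
    size_t first_start = 0, last_end = 0;
    double thr2 = (double)threshold * threshold;

    for (size_t start = 0; start < len; start += block) {
        size_t n = (len - start < block) ? len - start : block;
        double sum = 0.0;
        for (size_t i = 0; i < n; ++i)
            sum += (double)buf[start + i] * buf[start + i];
        if (sum / (double)n > thr2) {
            if (!found) {
                first_start = start;
                found = 1;
            }
            last_end = start + n;
        }
    }
    if (!found)
        return -1;
    *on_sec  = (double)first_start / rate;
    *off_sec = (double)last_end / rate;
    return 0;
}
```

Called with a 441-sample block and a threshold of 0.001, detect_sound() reports 0.2 s and 0.8 s for this signal, because the block grid happens to align with the 200 ms and 800 ms boundaries. silan's real detector is more elaborate, so its 0.825 s Sound Off for FLAC need not match this toy's block boundary; the point is only that a correct decoder should land near the same values for every format.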

I checked the individual files with a spectrum analyzer, and they seem to be okay.

MP3: spectrogram-mp3

Ogg Vorbis: spectrogram-ogg

FLAC: spectrogram-flac

Here is a zip with the three audio files: padded-audio.zip

I tried out different silan options, but no luck so far.

The following versions were used:

$ silan --version
silan version 0.4.0
$ pkg-config --modversion sndfile
1.0.28
$ ffmpeg -version
ffmpeg version 4.0.3 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 8 (GCC)
configuration: --prefix=/usr --bindir=/usr/bin --datadir=/usr/share/ffmpeg --docdir=/usr/share/doc/ffmpeg --incdir=/usr/include/ffmpeg --libdir=/usr/lib64 --mandir=/usr/share/man --arch=x86_64 --optflags='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' --extra-ldflags='-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld ' --extra-cflags=' ' --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-version3 --enable-bzlib --disable-crystalhd --enable-fontconfig --enable-frei0r --enable-gcrypt --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libcdio --enable-libdrm --enable-indev=jack --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libmp3lame --enable-nvenc --enable-openal --enable-opencl --enable-opengl --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librsvg --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libvorbis --enable-libv4l2 --enable-libvidstab --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-libzvbi --enable-avfilter --enable-avresample --enable-postproc --enable-pthreads --disable-static --enable-shared --enable-gpl --disable-debug --disable-stripping --shlibdir=/usr/lib64 --enable-libmfx --enable-runtime-cpudetect
libavutil      56. 14.100 / 56. 14.100
libavcodec     58. 18.100 / 58. 18.100
libavformat    58. 12.100 / 58. 12.100
libavdevice    58.  3.100 / 58.  3.100
libavfilter     7. 16.100 /  7. 16.100
libavresample   4.  0.  0 /  4.  0.  0
libswscale      5.  1.100 /  5.  1.100
libswresample   3.  1.100 /  3.  1.100
libpostproc    55.  1.100 / 55.  1.100

x42 commented 5 years ago

I can't reproduce this here on Debian stable with FFmpeg 3.2.12 (libavcodec 57.64.101):

$ ./src/silan /tmp/padded/padded.mp3 
0.191927 Sound On
0.830272 Sound Off
$ ./src/silan /tmp/padded/padded.ogg 
0.197891 Sound On
0.825397 Sound Off
$ ./src/silan /tmp/padded/padded.flac 
0.200023 Sound On
0.825374 Sound Off

This seems like an FFmpeg/libavcodec-related issue.

smlz commented 5 years ago

Hm. I'll see if I can dig a bit deeper. Thanks for trying it out.

x42 commented 5 years ago

It seems the avcodec_decode_audio4() API has already been deprecated again in recent FFmpeg. Perhaps the wrapper that FFmpeg 4.x provides for the old API does not handle stereo or joint stereo correctly!?

I guess silan's FFmpeg audio decoder needs to use the new avcodec_receive_frame() API with FFmpeg 4.x (libavcodec 58.x).
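x42's guess points at how multichannel frames are handled. One layout detail worth checking in any decoder wrapper (an assumption about where to look, not a diagnosis from this thread): FFmpeg's float MP3 decoder commonly delivers planar float samples (AV_SAMPLE_FMT_FLTP), with one buffer per channel in frame->extended_data[c], while a simple analysis loop usually expects interleaved samples. The plain-C sketch below has no FFmpeg dependency and only shows that conversion; interleave_planar() is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: copy planar float audio (planes[c][i] is sample i of
 * channel c, the layout FFmpeg calls AV_SAMPLE_FMT_FLTP) into one
 * interleaved buffer (L R L R ... for stereo).
 * out must hold nb_samples * channels floats. */
void interleave_planar(const float *const *planes, int channels,
                       size_t nb_samples, float *out)
{
    for (size_t i = 0; i < nb_samples; ++i)
        for (int c = 0; c < channels; ++c)
            out[i * (size_t)channels + (size_t)c] = planes[c][i];
}
```

With an AVFrame, planes would be (const float *const *)frame->extended_data and nb_samples would be frame->nb_samples. Whether silan's decoder needs this step depends on the sample format it requests and receives, which the thread does not show.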

smlz commented 5 years ago

I tried both full stereo and joint stereo, and I get strange values in both cases:

$ file padded-jointstereo.mp3 padded-stereo.mp3 
padded-jointstereo.mp3: Audio file with ID3 version 2.4.0, extended header, contains:MPEG ADTS, layer III, v1, 128 kbps, 44.1 kHz, JntStereo
padded-stereo.mp3:      Audio file with ID3 version 2.4.0, extended header, contains:MPEG ADTS, layer III, v1, 128 kbps, 44.1 kHz, Stereo

$ silan padded-jointstereo.mp3 
0.367868 Sound On
1.677347 Sound Off

$ silan padded-stereo.mp3 
0.093583 Sound On
1.677687 Sound Off

Here are the two MP3 files: padded-mp3s.zip

jpcima commented 5 years ago

@x42 When I tried using the audio_decoder library on my own, I found that decoding is broken on FFmpeg 4.1.2: the MP3 decoder produces junk output.

For example, here is the sample recording /usr/share/sounds/alsa/Front_Center.wav versus a LAME conversion of the same file:

wav mp3