slhck / ffmpeg-normalize

Audio Normalization for Python/ffmpeg
MIT License

Mp3 is quiet after normalizing #190

Closed canchanchara closed 2 years ago

canchanchara commented 2 years ago

Expected behavior: I'm normalizing an MP3 with -t -14 and -lrt 11.

Actual behavior: The converted MP3 is quiet.

File to reproduce: https://return0.de/interview.mp3

Command (the exact command you were trying to run):

ffmpeg-normalize -t -14 -lrt 11 interview.mp3 -c:a libmp3lame -b:a 128k -o output.mp3 

Any output you get when running the command with the --debug flag:

DEBUG: Running command: ['C:\\Users\\David\\Documents\\project-tools\\ffmpeg-master-latest-win64-gpl\\bin\\ffmpeg.EXE', '-filters']
DEBUG: Parsing streams of interview.mp3
DEBUG: Running command: ['C:\\Users\\David\\Documents\\project-tools\\ffmpeg-master-latest-win64-gpl\\bin\\ffmpeg.EXE', '-i', 'interview.mp3', '-c', 'copy', '-t', '0', '-map', '0', '-f', 'null', 'NUL']
DEBUG: Stream parsing command output:
DEBUG: ffmpeg version N-108116-g50a4dff69f-20220913 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12.1.0 (crosstool-NG 1.25.0.55_3defb7b)
  configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfre
etype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-f
fnvcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libmfx --enable-libopencore-amrnb -
-enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --ena
ble-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20220913
  libavutil      57. 36.101 / 57. 36.101
  libavcodec     59. 43.100 / 59. 43.100
  libavformat    59. 31.100 / 59. 31.100
  libavdevice    59.  8.101 / 59.  8.101
  libavfilter     8. 48.100 /  8. 48.100
  libswscale      6.  8.112 /  6.  8.112
  libswresample   4.  9.100 /  4.  9.100
  libpostproc    56.  7.100 / 56.  7.100
Input #0, mp3, from 'interview.mp3':
  Metadata:
    encoded_by      : Switch Testversion © NCH Software
    genre           : Speech
    title           : <anonymized>
    date            : 2022
  Duration: 00:50:52.98, start: 0.000000, bitrate: 128 kb/s
  Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Output #0, null, to 'NUL':
  Metadata:
    encoded_by      : Switch Testversion © NCH Software
    genre           : Speech
    title           : <anonymized>
    date            : 2022
    encoder         : Lavf59.31.100
  Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=N/A time=-577014:32:22.77 bitrate=N/A speed=N/A    s/s speed=N/A
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Output file is empty, nothing was encoded

DEBUG: Found duration: 3052.098 s
DEBUG: Found audio stream at index 0
INFO: Normalizing file interview.mp3 (1 of 1)
DEBUG: Running normalization for interview.mp3
DEBUG: Parsing normalization info for interview.mp3
INFO: Running first pass loudnorm filter for stream 0
DEBUG: Running command: ['C:\\Users\\David\\Documents\\project-tools\\ffmpeg-master-latest-win64-gpl\\bin\\ffmpeg.EXE', '-nostdin', '-y', '-i', 'interview.mp3', '-filter_complex', '[0:0]loudnorm=i=-14.0:lra=11.0:tp=-2.0:offset=0.0:print_format=json', '-vn', '-sn', '-f', 'null', 'NUL']
DEBUG: ffmpeg output: ffmpeg version N-108116-g50a4dff69f-20220913 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (crosstool-NG 1.25.0.55_3defb7b)
configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfreet
ype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-ffn
vcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libmfx --enable-libopencore-amrnb --e
nable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --enabl
e-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20220913
libavutil      57. 36.101 / 57. 36.101
libavcodec     59. 43.100 / 59. 43.100
libavformat    59. 31.100 / 59. 31.100
libavdevice    59.  8.101 / 59.  8.101
libavfilter     8. 48.100 /  8. 48.100
libswscale      6.  8.112 /  6.  8.112
libswresample   4.  9.100 /  4.  9.100
libpostproc    56.  7.100 / 56.  7.100
Input #0, mp3, from 'interview.mp3':
Metadata:
encoded_by      : Switch Testversion © NCH Software
genre           : Speech
title           : <anonymized>
date            : 2022
Duration: 00:50:52.98, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 (mp3float) -> loudnorm:default
loudnorm:default -> Stream #0:0 (pcm_s16le)
Output #0, null, to 'NUL':
Metadata:
encoded_by      : Switch Testversion © NCH Software
genre           : Speech
title           : <anonymized>
date            : 2022
encoder         : Lavf59.31.100
Stream #0:0: Audio: pcm_s16le, 192000 Hz, mono, s16, 3072 kb/s
Metadata:
encoder         : Lavc59.43.100 pcm_s16le
video:0kB audio:1144868kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 000001f4daa97e40]
{
"input_i" : "-0.08",
"input_tp" : "84.92",
"input_lra" : "4.20",
"input_thresh" : "-19.69",
"output_i" : "-23.10",
"output_tp" : "-2.00",
"output_lra" : "7.20",
"output_thresh" : "-33.77",
"normalization_type" : "dynamic",
"target_offset" : "9.10"
}

DEBUG: Loudnorm first pass command output: ffmpeg version N-108116-g50a4dff69f-20220913 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (crosstool-NG 1.25.0.55_3defb7b)
configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfreet
ype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-ffn
vcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libmfx --enable-libopencore-amrnb --e
nable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --enabl
e-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20220913
libavutil      57. 36.101 / 57. 36.101
libavcodec     59. 43.100 / 59. 43.100
libavformat    59. 31.100 / 59. 31.100
libavdevice    59.  8.101 / 59.  8.101
libavfilter     8. 48.100 /  8. 48.100
libswscale      6.  8.112 /  6.  8.112
libswresample   4.  9.100 /  4.  9.100
libpostproc    56.  7.100 / 56.  7.100
Input #0, mp3, from 'interview.mp3':
Metadata:
encoded_by      : Switch Testversion © NCH Software
genre           : Speech
title           : <anonymized>
date            : 2022
Duration: 00:50:52.98, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 (mp3float) -> loudnorm:default
loudnorm:default -> Stream #0:0 (pcm_s16le)
Output #0, null, to 'NUL':
Metadata:
encoded_by      : Switch Testversion © NCH Software
genre           : Speech
title           : <anonymized>
date            : 2022
encoder         : Lavf59.31.100
Stream #0:0: Audio: pcm_s16le, 192000 Hz, mono, s16, 3072 kb/s
Metadata:
encoder         : Lavc59.43.100 pcm_s16le
video:0kB audio:1144868kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 000001f4daa97e40]
{
"input_i" : "-0.08",
"input_tp" : "84.92",
"input_lra" : "4.20",
"input_thresh" : "-19.69",
"output_i" : "-23.10",
"output_tp" : "-2.00",
"output_lra" : "7.20",
"output_thresh" : "-33.77",
"normalization_type" : "dynamic",
"target_offset" : "9.10"
}

DEBUG: Loudnorm stats parsed: {"input_i": "-0.08", "input_tp": "84.92", "input_lra": "4.20", "input_thresh": "-19.69", "output_i": "-23.10", "output_tp": "-2.00", "output_lra": "7.20", "output_thresh": "-33.77", "normalization_type": "dynamic", "target_offset": "9.10"}
INFO: Running second pass for interview.mp3
DEBUG: Running command: ['C:\\Users\\David\\Documents\\project-tools\\ffmpeg-master-latest-win64-gpl\\bin\\ffmpeg.EXE', '-y', '-nostdin', '-i', 'interview.mp3', '-filter_complex', '[0:0]loudnorm=i=-14.0:lra=11.0:tp=-2.0:offset=9.1:measured_i=-0.08:measured_lra=4.2:measured_tp=84.92:measured_thresh=-19.69:li
near=true:print_format=json[norm0]', '-map_metadata', '0', '-map_metadata:s:a:0', '0:s:a:0', '-map_chapters', '0', '-map', '[norm0]', '-c:a', 'libmp3lame', '-b:a', '128k', '-c:s', 'copy', 'C:\\Users\\David\\AppData\\Local\\Temp\\39gk8km0.mp3']
DEBUG: ffmpeg output: ffmpeg version N-108116-g50a4dff69f-20220913 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (crosstool-NG 1.25.0.55_3defb7b)
configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfreet
ype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-ffn
vcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libmfx --enable-libopencore-amrnb --e
nable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --enabl
e-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20220913
libavutil      57. 36.101 / 57. 36.101
libavcodec     59. 43.100 / 59. 43.100
libavformat    59. 31.100 / 59. 31.100
libavdevice    59.  8.101 / 59.  8.101
libavfilter     8. 48.100 /  8. 48.100
libswscale      6.  8.112 /  6.  8.112
libswresample   4.  9.100 /  4.  9.100
libpostproc    56.  7.100 / 56.  7.100
Input #0, mp3, from 'interview.mp3':
Metadata:
encoded_by      : Switch Testversion © NCH Software
genre           : Speech
title           :  <anonymized>
date            : 2022
Duration: 00:50:52.98, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 (mp3float) -> loudnorm:default
loudnorm:default -> Stream #0:0 (libmp3lame)
Output #0, mp3, to 'C:\Users\David\AppData\Local\Temp\39gk8km0.mp3':
Metadata:
TENC            : Switch Testversion © NCH Software
TCON            : Speech
TIT2            : <anonymized>
TDRC            : 2022
TSSE            : Lavf59.31.100
Stream #0:0: Audio: mp3, 48000 Hz, mono, fltp, 128 kb/s
Metadata:
encoder         : Lavc59.43.100 libmp3lame
video:0kB audio:47703kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001220%
[Parsed_loudnorm_0 @ 000002700d18b740]
{
"input_i" : "-0.08",
"input_tp" : "84.92",
"input_lra" : "4.20",
"input_thresh" : "-19.69",
"output_i" : "-16.20",
"output_tp" : "-2.00",
"output_lra" : "5.30",
"output_thresh" : "-29.19",
"normalization_type" : "dynamic",
"target_offset" : "2.20"
}

DEBUG: Moving temporary file from C:\Users\David\AppData\Local\Temp\39gk8km0.mp3 to output.mp3
DEBUG: Normalization finished
INFO: Normalized file written to output.mp3

slhck commented 2 years ago

Hm. Just to make sure I didn't just break this. Could you please check if the previous release produces the same problem?

If yes, does it apply to all files or just this one?

canchanchara commented 2 years ago

The old release has the same problem. So far I have only seen it with this file.

slhck commented 2 years ago

In that case it might again be an issue with the original filter, which I cannot do much about. I know it has its quirks and should be improved (unfortunately it has not been maintained much recently).

I will check it out tomorrow!

canchanchara commented 2 years ago

Thank you. I do batch processing, so if you can fix it, that would be perfect. If you cannot fix this problem, it would be great if the program could raise an error. That would help identify problematic files during batch processing.

slhck commented 2 years ago

Something is wrong with the entire conversion. Look at the original:

[image: interview]

vs the output:

[image: output]

Essentially, the original you had was mixed very loud to begin with. The resulting file is missing a large chunk of the audio contents.

I think you should file a bug report on https://trac.ffmpeg.org/ for this particular sample.

Unfortunately, there is not much I can do about it!

canchanchara commented 2 years ago

Is it possible to "validate" the output MP3? Something like: if x% of the output MP3 is quiet, throw an error.

slhck commented 2 years ago

This would be possible with the silencedetect filter of ffmpeg and parsing the output in the shell, but really, it's obviously just a bug that needs to be fixed. I don't see that it's worth implementing such a check in this tool.

If you are looking for a script that implements "how many percent of a clip are silent?", and you have something started already, superuser.com may be a good resource for help.
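
As a starting point for such a check, here is a minimal Python sketch rather than a shell script. It assumes the usual silencedetect log lines of the form `silence_end: ... | silence_duration: ...`. The function names, the -50 dB noise floor, and the 2-second minimum are illustrative choices, not part of ffmpeg-normalize.

```python
import re
import subprocess

# Matches log lines such as:
#   [silencedetect @ 0x...] silence_end: 12.5 | silence_duration: 4.5
_SILENCE_DURATION = re.compile(r"silence_duration:\s*([0-9]+(?:\.[0-9]+)?)")


def silent_fraction(ffmpeg_log: str, total_duration: float) -> float:
    """Return the share of the clip (0.0 to 1.0) that silencedetect reported as silent."""
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    silent = sum(float(value) for value in _SILENCE_DURATION.findall(ffmpeg_log))
    return min(silent / total_duration, 1.0)


def run_silencedetect(path: str, noise_db: float = -50.0, min_silence: float = 2.0) -> str:
    """Run ffmpeg with the silencedetect filter and return its stderr log.

    noise_db and min_silence are hypothetical defaults; tune them for your material.
    """
    cmd = [
        "ffmpeg", "-hide_banner", "-nostats", "-nostdin", "-i", path,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_silence}",
        "-f", "null", "-",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stderr
```

A batch script could then raise an error when, for example, `silent_fraction(run_silencedetect("output.mp3"), 3052.098)` exceeds a threshold of your choosing. The duration here is the one reported in the debug log above.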

richardpl commented 2 years ago

This is more of a problem with the input MP3: there is a huge amplitude spike of 84.9 dBFS right at the start of the audio. It can be inspected with the astats or ebur128 filter. loudnorm gets entirely confused by this, so the dynamics processing/scanning gives wrong results while trying to compensate for such a huge spike. It can be fixed by applying some kind of limiter as the first step in the processing chain.
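
One way to try the limiter idea from a batch script is sketched below in Python. It assumes ffmpeg-normalize's `--pre-filter` option and ffmpeg's `alimiter` filter. The helper name and the 0.9 ceiling are hypothetical, and this has not been tested against the sample from this issue, so whether the limiter tames a spike this large is an open question.

```python
def build_normalize_command(
    input_path: str,
    output_path: str,
    limiter_ceiling: float = 0.9,  # hypothetical ceiling, linear scale (1.0 = full scale)
) -> list[str]:
    """Build an ffmpeg-normalize call that limits peaks before loudnorm runs."""
    # alimiter's "limit" option only accepts values in this range.
    if not 0.0625 <= limiter_ceiling <= 1.0:
        raise ValueError("alimiter's limit must be between 0.0625 and 1.0")
    return [
        "ffmpeg-normalize", input_path,
        "-t", "-14", "-lrt", "11",
        # Assumption: the pre-filter chain runs ahead of loudnorm.
        "--pre-filter", f"alimiter=limit={limiter_ceiling}",
        "-c:a", "libmp3lame", "-b:a", "128k",
        "-o", output_path,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)`. The other arguments mirror the command from the original report.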

slhck commented 2 years ago

@richardpl Thanks for your input! Is this just this MP3 or have you seen it with others?

richardpl commented 2 years ago

I have seen it for the first time with this MP3, but it can be reproduced with any file format that can store sample values beyond ±1.0. It just needs a single sample spike that is high enough.
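
For batch processing, a spike like this could in principle be caught from the first-pass statistics before the second pass runs. The Python sketch below uses the values from the debug log in this issue. The function names and the +3 dBTP threshold are assumptions for illustration, not anything ffmpeg-normalize provides. An input true peak of 84.92 dB corresponds to roughly 17,600 times full scale, which no sane recording should contain.

```python
import json


def db_to_linear(db: float) -> float:
    """Convert a level in dB relative to full scale into a linear amplitude ratio."""
    return 10 ** (db / 20)


def find_suspicious_stats(stats: dict, max_true_peak_db: float = 3.0) -> list[str]:
    """Return warnings for implausible loudnorm first-pass values.

    max_true_peak_db is a hypothetical threshold; pick one that suits your sources.
    """
    warnings = []
    input_tp = float(stats["input_tp"])  # loudnorm prints its numbers as strings
    if input_tp > max_true_peak_db:
        warnings.append(
            f"input true peak {input_tp:+.2f} dBTP is about "
            f"{db_to_linear(input_tp):,.0f}x full scale"
        )
    return warnings


# First-pass values copied from the debug log in this issue.
first_pass = json.loads('{"input_i": "-0.08", "input_tp": "84.92", "input_lra": "4.20"}')
for warning in find_suspicious_stats(first_pass):
    print("WARNING:", warning)
```

A wrapper script could abort or skip the file whenever the returned list is not empty.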

slhck commented 2 years ago

I guess I will close this as a one-off bug then; nothing can be done about it from ffmpeg-normalize.