slhck / ffmpeg-normalize

Audio Normalization for Python/ffmpeg
MIT License

Wrong normalization #266

Closed: slajerek closed this issue 3 months ago

slajerek commented 3 months ago

Expected behavior: I have a 32-bit float, 48 kHz WAV file that I would like to normalize and convert to MP3. The WAV is probably already normalized, but that does not matter: I want to use this in a batch conversion, so some files may have low volumes and others high volumes.

Actual behavior: The file is not normalized; the output volume is actually very low.

Command: the exact command you were trying to run:

ffmpeg-normalize -f --debug vppr0F.wav -o slr-vppr0Fb.mp3 -c:a mp3 -b:a 256k

Any output you get when running the command with the --debug flag:

(1) mars ~/Desktop # ffmpeg-normalize -f --debug vppr0F.wav -o slr-vppr0Fb.mp3 -c:a mp3 -b:a 256k
DEBUG: Running command: /opt/homebrew/opt/ffmpeg@4/bin/ffmpeg -filters
DEBUG: Parsing streams of vppr0F.wav
DEBUG: Running command: /opt/homebrew/opt/ffmpeg@4/bin/ffmpeg -i vppr0F.wav -c copy -t 0 -map 0 -f null /dev/null
DEBUG: Stream parsing command output:
DEBUG: ffmpeg version 4.4.5 Copyright (c) 2000-2024 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.3.9.4)
  configuration: --prefix='/opt/homebrew/Cellar/ffmpeg@4/4.4.5' --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-avresample --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'vppr0F.wav':
  Duration: 00:01:36.00, bitrate: 3072 kb/s
  Chapters:
    Chapter #0:0: start 0.000000, end 3.000000
    Chapter #0:1: start 3.000000, end 6.000000
    Chapter #0:2: start 6.000000, end 9.000000
    Chapter #0:3: start 9.000000, end 12.000000
    Chapter #0:4: start 12.000000, end 15.000000
    Chapter #0:5: start 15.000000, end 18.000000
    Chapter #0:6: start 18.000000, end 21.000000
    Chapter #0:7: start 21.000000, end 24.000000
    Chapter #0:8: start 24.000000, end 27.000000
    Chapter #0:9: start 27.000000, end 30.000000
    Chapter #0:10: start 30.000000, end 33.000000
    Chapter #0:11: start 33.000000, end 36.000000
    Chapter #0:12: start 36.000000, end 39.000000
    Chapter #0:13: start 39.000000, end 42.000000
    Chapter #0:14: start 42.000000, end 45.000000
    Chapter #0:15: start 45.000000, end 48.000000
    Chapter #0:16: start 48.000000, end 51.000000
    Chapter #0:17: start 51.000000, end 54.000000
    Chapter #0:18: start 54.000000, end 57.000000
    Chapter #0:19: start 57.000000, end 60.000000
    Chapter #0:20: start 60.000000, end 63.000000
    Chapter #0:21: start 63.000000, end 66.000000
    Chapter #0:22: start 66.000000, end 69.000000
    Chapter #0:23: start 69.000000, end 72.000000
    Chapter #0:24: start 72.000000, end 75.000000
    Chapter #0:25: start 75.000000, end 78.000000
    Chapter #0:26: start 78.000000, end 81.000000
    Chapter #0:27: start 81.000000, end 84.000000
    Chapter #0:28: start 84.000000, end 87.000000
    Chapter #0:29: start 87.000000, end 90.000000
    Chapter #0:30: start 90.000000, end 93.000000
    Chapter #0:31: start 93.000000, end 96.000000
  Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz, stereo, flt, 3072 kb/s
Output #0, null, to '/dev/null':
  Metadata:
    encoder         : Lavf58.76.100
  Chapters:
    Chapter #0:0: start 0.000000, end 0.000000
  Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz, stereo, flt, 3072 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)

DEBUG: Found duration: 96.0 s
DEBUG: Found audio stream at index 0
INFO: Normalizing file vppr0F.wav (1 of 1)
DEBUG: Running normalization for vppr0F.wav
DEBUG: Parsing normalization info for vppr0F.wav
INFO: Running first pass loudnorm filter for stream 0
DEBUG: Running command: /opt/homebrew/opt/ffmpeg@4/bin/ffmpeg -hide_banner -y -i vppr0F.wav -map 0:0 -filter_complex '[0:0]loudnorm=i=-23.0:lra=7.0:tp=-2.0:offset=0.0:print_format=json' -vn -sn -f null /dev/null
DEBUG: ffmpeg output: Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'vppr0F.wav':
Duration: 00:01:36.00, bitrate: 3072 kb/s
Chapters:
Chapter #0:0: start 0.000000, end 3.000000
Chapter #0:1: start 3.000000, end 6.000000
Chapter #0:2: start 6.000000, end 9.000000
Chapter #0:3: start 9.000000, end 12.000000
Chapter #0:4: start 12.000000, end 15.000000
Chapter #0:5: start 15.000000, end 18.000000
Chapter #0:6: start 18.000000, end 21.000000
Chapter #0:7: start 21.000000, end 24.000000
Chapter #0:8: start 24.000000, end 27.000000
Chapter #0:9: start 27.000000, end 30.000000
Chapter #0:10: start 30.000000, end 33.000000
Chapter #0:11: start 33.000000, end 36.000000
Chapter #0:12: start 36.000000, end 39.000000
Chapter #0:13: start 39.000000, end 42.000000
Chapter #0:14: start 42.000000, end 45.000000
Chapter #0:15: start 45.000000, end 48.000000
Chapter #0:16: start 48.000000, end 51.000000
Chapter #0:17: start 51.000000, end 54.000000
Chapter #0:18: start 54.000000, end 57.000000
Chapter #0:19: start 57.000000, end 60.000000
Chapter #0:20: start 60.000000, end 63.000000
Chapter #0:21: start 63.000000, end 66.000000
Chapter #0:22: start 66.000000, end 69.000000
Chapter #0:23: start 69.000000, end 72.000000
Chapter #0:24: start 72.000000, end 75.000000
Chapter #0:25: start 75.000000, end 78.000000
Chapter #0:26: start 78.000000, end 81.000000
Chapter #0:27: start 81.000000, end 84.000000
Chapter #0:28: start 84.000000, end 87.000000
Chapter #0:29: start 87.000000, end 90.000000
Chapter #0:30: start 90.000000, end 93.000000
Chapter #0:31: start 93.000000, end 96.000000
Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz, stereo, flt, 3072 kb/s
Stream mapping:
Stream #0:0 (pcm_f32le) -> loudnorm
loudnorm -> Stream #0:0 (pcm_s16le)
Press [q] to stop, [?] for help
Output #0, null, to '/dev/null':
Metadata:
encoder         : Lavf58.76.100
Chapters:
Chapter #0:0: start 0.000000, end 3.000000
Chapter #0:1: start 3.000000, end 6.000000
Chapter #0:2: start 6.000000, end 9.000000
Chapter #0:3: start 9.000000, end 12.000000
Chapter #0:4: start 12.000000, end 15.000000
Chapter #0:5: start 15.000000, end 18.000000
Chapter #0:6: start 18.000000, end 21.000000
Chapter #0:7: start 21.000000, end 24.000000
Chapter #0:8: start 24.000000, end 27.000000
Chapter #0:9: start 27.000000, end 30.000000
Chapter #0:10: start 30.000000, end 33.000000
Chapter #0:11: start 33.000000, end 36.000000
Chapter #0:12: start 36.000000, end 39.000000
Chapter #0:13: start 39.000000, end 42.000000
Chapter #0:14: start 42.000000, end 45.000000
Chapter #0:15: start 45.000000, end 48.000000
Chapter #0:16: start 48.000000, end 51.000000
Chapter #0:17: start 51.000000, end 54.000000
Chapter #0:18: start 54.000000, end 57.000000
Chapter #0:19: start 57.000000, end 60.000000
Chapter #0:20: start 60.000000, end 63.000000
Chapter #0:21: start 63.000000, end 66.000000
Chapter #0:22: start 66.000000, end 69.000000
Chapter #0:23: start 69.000000, end 72.000000
Chapter #0:24: start 72.000000, end 75.000000
Chapter #0:25: start 75.000000, end 78.000000
Chapter #0:26: start 78.000000, end 81.000000
Chapter #0:27: start 81.000000, end 84.000000
Chapter #0:28: start 84.000000, end 87.000000
Chapter #0:29: start 87.000000, end 90.000000
Chapter #0:30: start 90.000000, end 93.000000
Chapter #0:31: start 93.000000, end 96.000000
Stream #0:0: Audio: pcm_s16le, 192000 Hz, stereo, s16, 6144 kb/s
Metadata:
encoder         : Lavc58.134.100 pcm_s16le
video:0kB audio:72000kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 0x600002698b00]
{
"input_i" : "-12.37",
"input_tp" : "-0.02",
"input_lra" : "2.90",
"input_thresh" : "-22.37",
"output_i" : "-22.70",
"output_tp" : "-9.73",
"output_lra" : "2.90",
"output_thresh" : "-32.70",
"normalization_type" : "dynamic",
"target_offset" : "-0.30"
}
DEBUG: Loudnorm first pass command output: (identical to the ffmpeg output above, omitted)
DEBUG: Parsing loudnorm stats for stream 0
DEBUG: Loudnorm stats for stream 0 parsed: {"input_i": "-12.37", "input_tp": "-0.02", "input_lra": "2.90", "input_thresh": "-22.37", "output_i": "-22.70", "output_tp": "-9.73", "output_lra": "2.90", "output_thresh": "-32.70", "normalization_type": "dynamic", "target_offset": "-0.30"}
INFO: Running second pass for vppr0F.wav
DEBUG: Running command: /opt/homebrew/opt/ffmpeg@4/bin/ffmpeg -hide_banner -y -i vppr0F.wav -filter_complex '[0:0]loudnorm=i=-23.0:lra=7.0:tp=-2.0:offset=-0.3:measured_i=-12.37:measured_lra=2.9:measured_tp=-0.02:measured_thresh=-22.37:linear=true:print_format=json[norm0]' -map_metadata 0 -map_metadata:s:a:0 0:s:a:0 -map_chapters 0 -map '[norm0]' -c:a mp3 -b:a 256k -c:s copy /var/folders/vn/6l70ygvj61n196pmkxv4jzyw0000gn/T/tmpdtvsubog/out.mp3
DEBUG: ffmpeg output: Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'vppr0F.wav':
Duration: 00:01:36.00, bitrate: 3072 kb/s
Chapters:
Chapter #0:0: start 0.000000, end 3.000000
Chapter #0:1: start 3.000000, end 6.000000
Chapter #0:2: start 6.000000, end 9.000000
Chapter #0:3: start 9.000000, end 12.000000
Chapter #0:4: start 12.000000, end 15.000000
Chapter #0:5: start 15.000000, end 18.000000
Chapter #0:6: start 18.000000, end 21.000000
Chapter #0:7: start 21.000000, end 24.000000
Chapter #0:8: start 24.000000, end 27.000000
Chapter #0:9: start 27.000000, end 30.000000
Chapter #0:10: start 30.000000, end 33.000000
Chapter #0:11: start 33.000000, end 36.000000
Chapter #0:12: start 36.000000, end 39.000000
Chapter #0:13: start 39.000000, end 42.000000
Chapter #0:14: start 42.000000, end 45.000000
Chapter #0:15: start 45.000000, end 48.000000
Chapter #0:16: start 48.000000, end 51.000000
Chapter #0:17: start 51.000000, end 54.000000
Chapter #0:18: start 54.000000, end 57.000000
Chapter #0:19: start 57.000000, end 60.000000
Chapter #0:20: start 60.000000, end 63.000000
Chapter #0:21: start 63.000000, end 66.000000
Chapter #0:22: start 66.000000, end 69.000000
Chapter #0:23: start 69.000000, end 72.000000
Chapter #0:24: start 72.000000, end 75.000000
Chapter #0:25: start 75.000000, end 78.000000
Chapter #0:26: start 78.000000, end 81.000000
Chapter #0:27: start 81.000000, end 84.000000
Chapter #0:28: start 84.000000, end 87.000000
Chapter #0:29: start 87.000000, end 90.000000
Chapter #0:30: start 90.000000, end 93.000000
Chapter #0:31: start 93.000000, end 96.000000
Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz, stereo, flt, 3072 kb/s
Stream mapping:
Stream #0:0 (pcm_f32le) -> loudnorm
loudnorm -> Stream #0:0 (libmp3lame)
Press [q] to stop, [?] for help
Output #0, mp3, to '/var/folders/vn/6l70ygvj61n196pmkxv4jzyw0000gn/T/tmpdtvsubog/out.mp3':
Metadata:
TSSE            : Lavf58.76.100
Chapters:
Chapter #0:0: start 0.000000, end 3.000000
Chapter #0:1: start 3.000000, end 6.000000
Chapter #0:2: start 6.000000, end 9.000000
Chapter #0:3: start 9.000000, end 12.000000
Chapter #0:4: start 12.000000, end 15.000000
Chapter #0:5: start 15.000000, end 18.000000
Chapter #0:6: start 18.000000, end 21.000000
Chapter #0:7: start 21.000000, end 24.000000
Chapter #0:8: start 24.000000, end 27.000000
Chapter #0:9: start 27.000000, end 30.000000
Chapter #0:10: start 30.000000, end 33.000000
Chapter #0:11: start 33.000000, end 36.000000
Chapter #0:12: start 36.000000, end 39.000000
Chapter #0:13: start 39.000000, end 42.000000
Chapter #0:14: start 42.000000, end 45.000000
Chapter #0:15: start 45.000000, end 48.000000
Chapter #0:16: start 48.000000, end 51.000000
Chapter #0:17: start 51.000000, end 54.000000
Chapter #0:18: start 54.000000, end 57.000000
Chapter #0:19: start 57.000000, end 60.000000
Chapter #0:20: start 60.000000, end 63.000000
Chapter #0:21: start 63.000000, end 66.000000
Chapter #0:22: start 66.000000, end 69.000000
Chapter #0:23: start 69.000000, end 72.000000
Chapter #0:24: start 72.000000, end 75.000000
Chapter #0:25: start 75.000000, end 78.000000
Chapter #0:26: start 78.000000, end 81.000000
Chapter #0:27: start 81.000000, end 84.000000
Chapter #0:28: start 84.000000, end 87.000000
Chapter #0:29: start 87.000000, end 90.000000
Chapter #0:30: start 90.000000, end 93.000000
Chapter #0:31: start 93.000000, end 96.000000
Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp, 256 kb/s
Metadata:
encoder         : Lavc58.134.100 libmp3lame
video:0kB audio:3001kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.063819%
[Parsed_loudnorm_0 @ 0x60000242c000]
{
"input_i" : "-12.31",
"input_tp" : "-0.02",
"input_lra" : "2.90",
"input_thresh" : "-22.31",
"output_i" : "-22.94",
"output_tp" : "-10.65",
"output_lra" : "2.90",
"output_thresh" : "-32.94",
"normalization_type" : "linear",
"target_offset" : "-0.06"
}
DEBUG: Moving temporary file from /var/folders/vn/6l70ygvj61n196pmkxv4jzyw0000gn/T/tmpdtvsubog/out.mp3 to slr-vppr0Fb.mp3
DEBUG: Parsing loudnorm stats for stream 0
DEBUG: Loudnorm stats for stream 0 parsed: {"input_i": "-12.31", "input_tp": "-0.02", "input_lra": "2.90", "input_thresh": "-22.31", "output_i": "-22.94", "output_tp": "-10.65", "output_lra": "2.90", "output_thresh": "-32.94", "normalization_type": "linear", "target_offset": "-0.06"}
DEBUG: Normalization finished
INFO: Normalized file written to slr-vppr0Fb.mp3

Original wave:

file-original

Badly normalized wave:

file-badly-normalized
slhck commented 3 months ago

The default normalization target is -23 LUFS. Since your input is louder than that (the first pass measured an integrated loudness of -12.37 LUFS), it will be lowered in volume.
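In linear mode, loudnorm applies roughly a constant gain equal to the target minus the measured integrated loudness. A minimal sketch of that arithmetic, assuming the first-pass values from the log above and the default -23 LUFS target. `linear_gain_db` is a hypothetical helper, not part of ffmpeg-normalize, and it ignores the filter's `offset` option and its fallback to dynamic mode:

```python
import json

# First-pass measurements copied from the loudnorm JSON in the debug log.
FIRST_PASS_JSON = """
{
  "input_i" : "-12.37",
  "input_tp" : "-0.02",
  "input_lra" : "2.90",
  "input_thresh" : "-22.37"
}
"""


def linear_gain_db(stats: dict, target_i: float = -23.0) -> float:
    """Hypothetical helper: constant gain (dB) that moves the measured
    integrated loudness onto the target. Ignores loudnorm's offset option
    and its fallback to dynamic mode."""
    # loudnorm prints its numbers as JSON strings, so convert them first.
    return target_i - float(stats["input_i"])


stats = json.loads(FIRST_PASS_JSON)
gain_db = linear_gain_db(stats)
amplitude = 10 ** (gain_db / 20)  # dB -> linear amplitude factor
# A constant gain shifts the true peak by the same number of dB.
predicted_tp = float(stats["input_tp"]) + gain_db

print(f"gain: {gain_db:+.2f} dB (amplitude x{amplitude:.3f})")
print(f"predicted output true peak: {predicted_tp:.2f} dBTP")
```

With the default target this comes out to about -10.6 dB, i.e. roughly 0.29 times the original amplitude. That lines up with the second pass reporting `output_i` of -22.94 LUFS and `output_tp` of -10.65, so the quieter output is what the -23 LUFS target asks for. ffmpeg-normalize's `-t`/`--target-level` option changes that target.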

Perhaps you need peak normalization instead?

slhck commented 3 months ago

PS: Your ffmpeg version is quite old (the log shows 4.4.5 from Homebrew's ffmpeg@4); please update to a newer one.