slhck / ffmpeg-normalize

Audio Normalization for Python/ffmpeg
MIT License

Reverting to dynamic normalization regardless of the command I try #245

Closed thennicke closed 11 months ago

thennicke commented 1 year ago

Expected behavior: Linear normalization selected. This is an acoustic track, and the compressor used in dynamic normalization sounds like trash in the loud sections.

Actual behavior: Dynamic normalization selected, irrespective of whether I run the command with --keep-lra-above-loudness-range-target or with --keep-loudness-range-target, or whether I manually select an output LRA that is larger than the reported LRA.

Command (the exact command you were trying to run):

ffmpeg-normalize input.MOV -p --keep-lra-above-loudness-range-target -tp 0 -t -14 -n

Any output you get when running the command with the --debug flag:

DEBUG: Running command: ['/usr/bin/ffmpeg', '-filters']
DEBUG: Parsing streams of input.MOV
DEBUG: Running command: ['/usr/bin/ffmpeg', '-i', 'input.MOV', '-c', 'copy', '-t', '0', '-map', '0', '-f', 'null', '/dev/null']
DEBUG: Stream parsing command output:
DEBUG: ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 13 (GCC)
  configuration: --prefix=/usr --bindir=/usr/bin --datadir=/usr/share/ffmpeg --docdir=/usr/share/doc/ffmpeg --incdir=/usr/include/ffmpeg --libdir=/usr/lib64 --mandir=/usr/share/man --arch=x86_64 --optflags='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer' --extra-ldflags='-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 ' --extra-cflags=' -I/usr/include/rav1e' --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-version3 --enable-bzlib --enable-chromaprint --disable-crystalhd --enable-fontconfig --enable-frei0r --enable-gcrypt --enable-gnutls --enable-ladspa --enable-libaom --enable-libdav1d --enable-libass --enable-libbluray --enable-libbs2b --enable-libcdio --enable-libdrm --enable-libjack --enable-libjxl --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libmysofa --enable-nvenc --enable-openal --enable-opencl --enable-opengl --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-libplacebo --enable-librsvg --enable-librav1e --enable-librubberband --enable-libsmbclient --enable-version3 --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-version3 --enable-vapoursynth --enable-libvpx --enable-vulkan --enable-libshaderc --enable-libwebp --enable-libx264 
--enable-libx265 --enable-libxvid --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-avfilter --enable-libmodplug --enable-postproc --enable-pthreads --disable-static --enable-shared --enable-gpl --disable-debug --disable-stripping --shlibdir=/usr/lib64 --enable-lto --enable-libvpl --enable-runtime-cpudetect
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55fa16228600] st: 0 edit list: 1 Missing key frame while searching for timestamp: 1001
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55fa16228600] st: 0 edit list 1 Cannot find an index entry before timestamp: 1001.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.MOV':
  Metadata:
    major_brand     : qt  
    minor_version   : 537331968
    compatible_brands: qt  CAEP
    com.apple.quicktime.make: Canon
    com.apple.quicktime.model: Canon EOS 6D
    com.apple.quicktime.rating.user: 0.000000
    creation_time   : 2023-08-16T12:35:36.000000Z
  Duration: 00:04:24.53, start: 0.000000, bitrate: 31473 kb/s
  Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 29927 kb/s, 29.97 fps, 29.97 tbr, 30k tbn (default)
    Metadata:
      creation_time   : 2023-08-16T12:35:36.000000Z
      vendor_id       : [0][0][0][0]
      timecode        : 07:51:47;17
  Stream #0:1[0x2](eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz, stereo, s16, 1536 kb/s (default)
    Metadata:
      creation_time   : 2023-08-16T12:35:36.000000Z
      vendor_id       : [0][0][0][0]
      timecode        : 07:51:47;17
  Stream #0:2[0x3](eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
    Metadata:
      creation_time   : 2023-08-16T12:35:36.000000Z
      timecode        : 07:51:47;17
Output #0, null, to '/dev/null':
  Metadata:
    major_brand     : qt  
    minor_version   : 537331968
    compatible_brands: qt  CAEP
    com.apple.quicktime.make: Canon
    com.apple.quicktime.model: Canon EOS 6D
    com.apple.quicktime.rating.user: 0.000000
    encoder         : Lavf60.3.100
  Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 29927 kb/s, 29.97 fps, 29.97 tbr, 30k tbn (default)
    Metadata:
      creation_time   : 2023-08-16T12:35:36.000000Z
      vendor_id       : [0][0][0][0]
      timecode        : 07:51:47;17
  Stream #0:1(eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz, stereo, s16, 1536 kb/s (default)
    Metadata:
      creation_time   : 2023-08-16T12:35:36.000000Z
      vendor_id       : [0][0][0][0]
      timecode        : 07:51:47;17
  Stream #0:2(eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
    Metadata:
      creation_time   : 2023-08-16T12:35:36.000000Z
      timecode        : 07:51:47;17
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #0:2 -> #0:2 (copy)
Press [q] to stop, [?] for help
frame=    1 fps=0.0 q=-1.0 Lsize=N/A time=-00:00:00.03 bitrate=N/A speed=N/A    bits/s speed=N/A    
video:227kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

DEBUG: Found duration: 264.053 s
DEBUG: Found video stream at index 0
DEBUG: Found audio stream at index 1
INFO: Normalizing file input.MOV (1 of 1)
DEBUG: Running normalization for input.MOV
DEBUG: Parsing normalization info for input.MOV
INFO: Running first pass loudnorm filter for stream 1
DEBUG: Running command: ['/usr/bin/ffmpeg', '-hide_banner', '-y', '-i', 'input.MOV', '-filter_complex', '[0:1]loudnorm=i=-14.0:lra=7.0:tp=0.0:offset=0.0:print_format=json', '-vn', '-sn', '-f', 'null', '/dev/null']
DEBUG: ffmpeg output: [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55a986c3de80] st: 0 edit list: 1 Missing key frame while searching for timestamp: 1001
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55a986c3de80] st: 0 edit list 1 Cannot find an index entry before timestamp: 1001.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.MOV':
Metadata:
major_brand     : qt
minor_version   : 537331968
compatible_brands: qt  CAEP
com.apple.quicktime.make: Canon
com.apple.quicktime.model: Canon EOS 6D
com.apple.quicktime.rating.user: 0.000000
creation_time   : 2023-08-16T12:35:36.000000Z
Duration: 00:04:24.53, start: 0.000000, bitrate: 31473 kb/s
Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 29927 kb/s, 29.97 fps, 29.97 tbr, 30k tbn (default)
Metadata:
creation_time   : 2023-08-16T12:35:36.000000Z
vendor_id       : [0][0][0][0]
timecode        : 07:51:47;17
Stream #0:1[0x2](eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz, stereo, s16, 1536 kb/s (default)
Metadata:
creation_time   : 2023-08-16T12:35:36.000000Z
vendor_id       : [0][0][0][0]
timecode        : 07:51:47;17
Stream #0:2[0x3](eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
Metadata:
creation_time   : 2023-08-16T12:35:36.000000Z
timecode        : 07:51:47;17
Stream mapping:
Stream #0:1 (pcm_s16le) -> loudnorm:default
loudnorm:default -> Stream #0:0 (pcm_s16le)
Press [q] to stop, [?] for help
Output #0, null, to '/dev/null':
Metadata:
major_brand     : qt
minor_version   : 537331968
compatible_brands: qt  CAEP
com.apple.quicktime.make: Canon
com.apple.quicktime.model: Canon EOS 6D
com.apple.quicktime.rating.user: 0.000000
encoder         : Lavf60.3.100
Stream #0:0: Audio: pcm_s16le, 192000 Hz, stereo, s16, 6144 kb/s
Metadata:
encoder         : Lavc60.3.100 pcm_s16le
video:0kB audio:198398kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 0x55a986cace00]
{
"input_i" : "-15.19",
"input_tp" : "0.05",
"input_lra" : "22.60",
"input_thresh" : "-27.42",
"output_i" : "-12.27",
"output_tp" : "+0.00",
"output_lra" : "16.80",
"output_thresh" : "-23.17",
"normalization_type" : "dynamic",
"target_offset" : "-1.73"
}

DEBUG: Loudnorm first pass command output: [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55a986c3de80] st: 0 edit list: 1 Missing key frame while searching for timestamp: 1001
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55a986c3de80] st: 0 edit list 1 Cannot find an index entry before timestamp: 1001.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.MOV':
Metadata:
major_brand     : qt
minor_version   : 537331968
compatible_brands: qt  CAEP
com.apple.quicktime.make: Canon
com.apple.quicktime.model: Canon EOS 6D
com.apple.quicktime.rating.user: 0.000000
creation_time   : 2023-08-16T12:35:36.000000Z
Duration: 00:04:24.53, start: 0.000000, bitrate: 31473 kb/s
Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 29927 kb/s, 29.97 fps, 29.97 tbr, 30k tbn (default)
Metadata:
creation_time   : 2023-08-16T12:35:36.000000Z
vendor_id       : [0][0][0][0]
timecode        : 07:51:47;17
Stream #0:1[0x2](eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz, stereo, s16, 1536 kb/s (default)
Metadata:
creation_time   : 2023-08-16T12:35:36.000000Z
vendor_id       : [0][0][0][0]
timecode        : 07:51:47;17
Stream #0:2[0x3](eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
Metadata:
creation_time   : 2023-08-16T12:35:36.000000Z
timecode        : 07:51:47;17
Stream mapping:
Stream #0:1 (pcm_s16le) -> loudnorm:default
loudnorm:default -> Stream #0:0 (pcm_s16le)
Press [q] to stop, [?] for help
Output #0, null, to '/dev/null':
Metadata:
major_brand     : qt
minor_version   : 537331968
compatible_brands: qt  CAEP
com.apple.quicktime.make: Canon
com.apple.quicktime.model: Canon EOS 6D
com.apple.quicktime.rating.user: 0.000000
encoder         : Lavf60.3.100
Stream #0:0: Audio: pcm_s16le, 192000 Hz, stereo, s16, 6144 kb/s
Metadata:
encoder         : Lavc60.3.100 pcm_s16le
video:0kB audio:198398kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 0x55a986cace00]
{
"input_i" : "-15.19",
"input_tp" : "0.05",
"input_lra" : "22.60",
"input_thresh" : "-27.42",
"output_i" : "-12.27",
"output_tp" : "+0.00",
"output_lra" : "16.80",
"output_thresh" : "-23.17",
"normalization_type" : "dynamic",
"target_offset" : "-1.73"
}

DEBUG: Loudnorm stats parsed: {"input_i": "-15.19", "input_tp": "0.05", "input_lra": "22.60", "input_thresh": "-27.42", "output_i": "-12.27", "output_tp": "+0.00", "output_lra": "16.80", "output_thresh": "-23.17", "normalization_type": "dynamic", "target_offset": "-1.73"}
INFO: Running second pass for input.MOV
DEBUG: Keeping target loudness range in second pass loudnorm filter
DEBUG: Running command: ['/usr/bin/ffmpeg', '-hide_banner', '-y', '-i', 'input.MOV', '-filter_complex', '[0:1]loudnorm=i=-14.0:lra=22.6:tp=0.0:offset=-1.73:measured_i=-15.19:measured_lra=22.6:measured_tp=0.05:measured_thresh=-27.42:linear=true:print_format=json[norm1]', '-map_metadata', '0', '-map_metadata:s:a:0', '0:s:a:0', '-map_metadata:s:v:0', '0:s:v:0', '-map_chapters', '0', '-map', '0:0', '-c:v', 'copy', '-map', '[norm1]', '-c:a:0', 'pcm_s16le', '-c:s', 'copy', 'bug', 'normalized/input.mkv']
DEBUG: Dry mode specified, not actually running command
INFO: Normalized file written to normalized/input.mkv

Operating system: Fedora 38 Workstation
Python version: 3.11.4
ffmpeg version: 6.0

slhck commented 1 year ago

Thanks for providing the logs. As far as I can see the target loudness range is set correctly. The ffmpeg docs however say:

the change in integrated loudness shouldn’t result in a true peak which exceeds the target TP

You set tp=0, but measured_tp=0.05, meaning that the target is exceeded. Maybe this is the case here? (Interesting that the input file has a positive measured true peak…)
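To check this against the log, the JSON block that loudnorm prints can be pulled out of the first-pass output and compared with the requested target. The following is only a minimal sketch: `parse_loudnorm_stats` is a hypothetical helper written for this example (not the function ffmpeg-normalize itself uses), and the sample text is the stats block copied from the debug output above.

```python
import json
import re


def parse_loudnorm_stats(ffmpeg_output: str) -> dict:
    """Extract the stats block printed by loudnorm with print_format=json.

    Hypothetical helper for illustration only.
    """
    # The stats block contains no nested braces; take the last {...} in the text.
    blocks = re.findall(r"\{[^{}]*\}", ffmpeg_output)
    if not blocks:
        raise ValueError("no loudnorm JSON block found")
    raw = json.loads(blocks[-1])
    stats = {}
    for key, value in raw.items():
        try:
            stats[key] = float(value)  # numeric fields arrive as strings
        except ValueError:
            stats[key] = value  # e.g. normalization_type
    return stats


# Stats block copied from the first-pass debug output in this issue.
sample_output = """[Parsed_loudnorm_0 @ 0x55a986cace00]
{
"input_i" : "-15.19",
"input_tp" : "0.05",
"input_lra" : "22.60",
"input_thresh" : "-27.42",
"output_i" : "-12.27",
"output_tp" : "+0.00",
"output_lra" : "16.80",
"output_thresh" : "-23.17",
"normalization_type" : "dynamic",
"target_offset" : "-1.73"
}
"""

stats = parse_loudnorm_stats(sample_output)
target_tp = 0.0  # from "-tp 0" in the reported command

# True when the measured true peak already sits above the requested target.
peak_exceeds_target = stats["input_tp"] > target_tp
print(stats["input_tp"], target_tp, peak_exceeds_target)
```

For this file the measured true peak (0.05) is above the requested target (0.0) even before any gain is applied.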

I am actually not sure if I have a good set of test clips and settings here with which I can check the linear/dynamic behavior.

thennicke commented 1 year ago

I'm happy to upload the raw audio from this video file (just me practising a piano + vocals cover) if it can help you diagnose the bug. It's not possible for me to set tp=0.05 in ffmpeg-normalize, so at the moment this program cannot do a linear normalisation on this file.

Edit: Here's the raw file. It's pcm so I had to keep it in the MOV container. 50MB in size roughly.

slhck commented 1 year ago

You can't set a TP larger than 0 for the filter, which is why that won't work. I am actually on vacation right now, so I won't get to this before September.

Realistically I would propose you run the raw commands through ffmpeg itself (using the debug-logged commands), and attempt to get it to normalize linearly. The ffmpeg-normalize wrapper just tries to help set the right values, and somehow I'm afraid it can't be done for that particular file.
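As a rough sketch of what running it by hand could look like, the snippet below assembles a second-pass loudnorm filter string from the first-pass measurements and prints an ffmpeg command to try. The helper name `second_pass_filter` is made up for this example, and the surrounding command is a simplified analysis-only variant (output to the null muxer), not the exact command ffmpeg-normalize emits. The option names are the ones visible in the debug-logged filter above.

```python
import shlex


def second_pass_filter(stats: dict, target_i: float, target_tp: float,
                       target_lra: float) -> str:
    """Build a loudnorm filter string for the second pass.

    Hypothetical helper; option names mirror the debug-logged command.
    """
    options = {
        "i": target_i,
        "lra": target_lra,
        "tp": target_tp,
        "offset": stats["target_offset"],
        "measured_i": stats["input_i"],
        "measured_lra": stats["input_lra"],
        "measured_tp": stats["input_tp"],
        "measured_thresh": stats["input_thresh"],
        "linear": "true",
        "print_format": "json",
    }
    return "loudnorm=" + ":".join(f"{key}={value}" for key, value in options.items())


# First-pass measurements taken from the debug log in this issue.
first_pass = {
    "input_i": -15.19,
    "input_tp": 0.05,
    "input_lra": 22.6,
    "input_thresh": -27.42,
    "target_offset": -1.73,
}

filter_str = second_pass_filter(first_pass, target_i=-14.0, target_tp=0.0,
                                target_lra=22.6)

# Analysis-only run: discard the output and just read the printed stats.
command = ["ffmpeg", "-hide_banner", "-i", "input.MOV",
           "-af", filter_str, "-vn", "-sn", "-f", "null", "-"]
print(shlex.join(command))
```

Changing `target_i`, `target_tp`, or `target_lra` and re-running the printed command shows which `normalization_type` the filter reports for each combination.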

(Maybe if you first lower the overall volume with an RMS normalization?)

thennicke commented 1 year ago

So I tried reducing the overall volume with an RMS normalisation, as you suggested, and that worked to get input_tp<=0. And yet output_lra is still not working properly: if I use --keep-loudness-range-target then it goes to 16.8 rather than the measured input of 22.60 (this must be a bug) and if I try to manually set it with -lrt 25 then it goes to 10 (another bug). So still unable to do a linear normalization using this program. Very strange behaviour. I might try using ffmpeg directly later on.

I'll leave the audio in my google drive for you to download and play with when you get back, it's no rush.

richardpl commented 1 year ago

I get linear processing once the target LRA is set high enough and the target integrated loudness low enough:

ffmpeg -i ~/Downloads/testaudio.MOV -af loudnorm=measured_tp=0.0:measured_lra=22.8:lra=22.8:measured_i=-15.2:measured_thresh=-37.4:print_format=summary:tp=0.0 -f null -

Not a bug. It happens because the target integrated loudness is set too high (too close to zero). You cannot have linear processing in that case.

There is nothing unusual about the true peak being positive (> 0 dBTP); that is normal when resampling to a higher rate and with audio that has already clipped.

The rule is quite simple: if the target true peak is lower than (measured true peak + target integrated loudness - measured integrated loudness), the filter will use dynamic processing.
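Applied to the numbers from the debug log, the rule can be sketched as follows. The function names are invented for this example. The true-peak test is the rule stated above; the extra check that the measured LRA must not exceed the target LRA is an assumption based on the "high enough target LRA" remark, so treat it as illustrative rather than as the filter's exact logic.

```python
def projected_true_peak(measured_tp: float, measured_i: float,
                        target_i: float) -> float:
    """True peak after applying the constant gain (target_i - measured_i)."""
    return measured_tp + (target_i - measured_i)


def predicted_mode(measured_tp: float, measured_i: float, measured_lra: float,
                   target_tp: float, target_i: float, target_lra: float) -> str:
    """Predict 'linear' or 'dynamic' (LRA comparison is an assumption)."""
    peak_ok = projected_true_peak(measured_tp, measured_i, target_i) <= target_tp
    lra_ok = measured_lra <= target_lra
    return "linear" if peak_ok and lra_ok else "dynamic"


def max_linear_target_i(measured_tp: float, measured_i: float,
                        target_tp: float) -> float:
    """Highest target integrated loudness that keeps the peak within target_tp."""
    return measured_i + (target_tp - measured_tp)


# Values from the issue: measured_tp=0.05, measured_i=-15.19, measured_lra=22.6,
# with targets tp=0.0, i=-14.0 and the LRA target kept at 22.6.
peak = projected_true_peak(0.05, -15.19, -14.0)
mode = predicted_mode(0.05, -15.19, 22.6, target_tp=0.0, target_i=-14.0,
                      target_lra=22.6)
limit = max_linear_target_i(0.05, -15.19, target_tp=0.0)
print(round(peak, 2), mode, round(limit, 2))

# Attenuating the whole file by 3 dB first shifts measured_tp and measured_i
# by the same amount, so the projected peak does not change.
peak_after_attenuation = projected_true_peak(0.05 - 3.0, -15.19 - 3.0, -14.0)
print(round(peak_after_attenuation, 2))
```

With a target of -14 the required gain is about +1.19 dB, which pushes the peak to roughly +1.24 dBTP, so the filter falls back to dynamic mode. Under this rule, linear processing with tp=0 needs a target integrated loudness of about -15.24 or lower for this file. Lowering the overall volume beforehand does not help, because the measured peak and the measured integrated loudness drop together.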

thennicke commented 11 months ago

You're quite right. I completely misunderstood what the algorithm was doing and thought I had a lot more headroom than I did. Closing this because it's not actually a bug.