slhck / ffmpeg-normalize

Audio Normalization for Python/ffmpeg
MIT License

How to avoid a (sometimes huge) increase in file size? #272

Closed anderslundstedt closed 4 days ago

anderslundstedt commented 4 days ago

I want to normalize a file input.mkv. I use the command:

ffmpeg-normalize --verbose --print-stats --output rms_normalized_to_-10.mkv --normalization-type rms --target-level -10 input.mkv > std_out.txt 2> std_err.txt

File sizes of input.mkv and rms_normalized_to_-10.mkv, respectively:

% ls -lah *mkv
-rw-r--r-- 1 anders staff 1.5G Nov 19 17:44 input.mkv
-rw-r--r-- 1 anders staff 5.2G Nov 19 18:27 rms_normalized_to_-10.mkv

Is there a way to avoid this huge increase in file size?

Additional context

std_err.txt from the command above:

INFO: Normalizing file input.mkv (1 of 1)
INFO: Running first pass astats filter for stream 1
INFO: Running second pass for input.mkv
INFO: Adjusting stream 1 by 15.850762 dB to reach -10.0
WARNING: Adjusting will lead to clipping of 12.056229 dB
INFO: Normalized file written to rms_normalized_to_-10.mkv

std_out.txt from the command above:

[
    {
        "input_file": "input.mkv",
        "output_file": "rms_normalized_to_-10.mkv",
        "stream_id": 1,
        "ebu_pass1": null,
        "ebu_pass2": null,
        "mean": -25.850762,
        "max": -3.794533
    }
]

% ffprobe input.mkv:

ffprobe version 7.1 Copyright (c) 2007-2024 the FFmpeg developers
  built with clang version 16.0.6
  configuration: --disable-static --prefix=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1 --target_os=darwin --arch=aarch64 --pkg-config=pkg-config --enable-gpl --enable-version3 --disable-nonfree --disable-static --enable-shared --enable-pic --disable-thumb --disable-small --enable-runtime-cpudetect --enable-gray --enable-swscale-alpha --enable-hardcoded-tables --enable-safe-bitstream-reader --enable-pthreads --disable-w32threads --disable-os2threads --enable-network --enable-pixelutils --datadir=/nix/store/whm357qimhda1c0h0sfw6611k8i70rb9-ffmpeg-full-7.1-data/share/ffmpeg --enable-ffmpeg --enable-ffplay --enable-ffprobe --bindir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1-bin/bin --enable-avcodec --enable-avdevice --enable-avfilter --enable-avformat --enable-avutil --enable-postproc --enable-swresample --enable-swscale --libdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1-lib/lib --incdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1-dev/include --enable-doc --enable-htmlpages --enable-manpages --mandir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1-man/share/man --enable-podpages --enable-txtpages --docdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1-doc/share/doc/ffmpeg --disable-alsa --disable-amf --enable-libaom --enable-appkit --enable-libaribb24 --enable-libaribcaption --enable-libass --enable-audiotoolbox --enable-avfoundation --enable-avisynth --enable-libbluray --enable-libbs2b --enable-bzlib --enable-libcaca --enable-libcdio --enable-libcelt --enable-chromaprint --enable-libcodec2 --enable-coreimage --disable-cuda --enable-cuda-llvm --disable-cuda-nvcc --disable-cuvid --enable-libdav1d --disable-libdc1394 --disable-libdrm --enable-libdvdnav --enable-libdvdread --disable-libfdk-aac --disable-ffnvcodec --enable-libflite --enable-fontconfig --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libfribidi --enable-libgme --enable-gnutls --enable-libgsm 
--enable-libharfbuzz --enable-iconv --enable-libilbc --disable-libjack --enable-libjxl --enable-ladspa --enable-lcms2 --enable-lzma --disable-metal --disable-libmfx --disable-libmodplug --enable-libmp3lame --enable-libmysofa --disable-libnpp --disable-nvdec --disable-nvenc --enable-openal --enable-opencl --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-opengl --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-libopus --disable-libplacebo --disable-libpulse --enable-libqrencode --enable-libquirc --enable-librav1e --enable-librtmp --enable-librubberband --disable-libsmbclient --enable-sdl2 --disable-libshaderc --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-librsvg --disable-libsvtav1 --disable-libtensorflow --enable-libtheora --enable-libtwolame --disable-libv4l2 --disable-v4l2-m2m --disable-vaapi --enable-vdpau --disable-libvpl --enable-videotoolbox --enable-libvidstab --disable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --disable-vulkan --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxcb --enable-libxcb-shape --enable-libxcb-shm --enable-libxcb-xfixes --enable-libxevd --enable-libxeve --enable-xlib --enable-libxml2 --enable-libxvid --enable-libzimg --enable-zlib --enable-libzmq --enable-libzvbi --disable-debug --enable-optimizations --disable-extra-warnings --disable-stripping --cc=clang --cxx=clang++
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.100 / 61. 19.100
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
Input #0, matroska,webm, from 'input.mkv':
  Metadata:
    creation_time   : 2021-12-31T14:24:29.000000Z
    ENCODER         : Lavf58.76.100
  Duration: 01:59:56.67, start: 0.008000, bitrate: 1732 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(tv, bt470bg, progressive), 720x576 [SAR 64:45 DAR 16:9], SAR 1:1 DAR 5:4, 25 fps, 25 tbr, 1k tbn (default)
      Metadata:
        ENCODER         : Lavc58.134.100 libx264
        BPS             : 1492045
        DURATION        : 01:59:54.640000000
        NUMBER_OF_FRAMES: 179866
        NUMBER_OF_BYTES : 1341841549
        _STATISTICS_WRITING_APP: mkvmerge v56.0.0 ('Strasbourg / St. Denis') 64-bit
        _STATISTICS_WRITING_DATE_UTC: 2021-12-31 14:24:29
        _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
  Stream #0:1: Audio: vorbis, 48000 Hz, 5.1, fltp (default)
      Metadata:
        _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
        BPS             : 238470
        DURATION        : 01:59:56.661000000
        NUMBER_OF_FRAMES: 522081
        NUMBER_OF_BYTES : 214523891
        _STATISTICS_WRITING_APP: mkvmerge v56.0.0 ('Strasbourg / St. Denis') 64-bit
        _STATISTICS_WRITING_DATE_UTC: 2021-12-31 14:24:29
        ENCODER         : Lavc58.134.100

% ffprobe rms_normalized_to_-10.mkv:

ffprobe version 7.1 Copyright (c) 2007-2024 the FFmpeg developers
  built with clang version 16.0.6
  configuration: --disable-static --prefix=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1 --target_os=darwin --arch=aarch64 --pkg-config=pkg-config --enable-gpl --enable-version3 --disable-nonfree --disable-static --enable-shared --enable-pic --disable-thumb --disable-small --enable-runtime-cpudetect --enable-gray --enable-swscale-alpha --enable-hardcoded-tables --enable-safe-bitstream-reader --enable-pthreads --disable-w32threads --disable-os2threads --enable-network --enable-pixelutils --datadir=/nix/store/whm357qimhda1c0h0sfw6611k8i70rb9-ffmpeg-full-7.1-data/share/ffmpeg --enable-ffmpeg --enable-ffplay --enable-ffprobe --bindir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1-bin/bin --enable-avcodec --enable-avdevice --enable-avfilter --enable-avformat --enable-avutil --enable-postproc --enable-swresample --enable-swscale --libdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1-lib/lib --incdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1-dev/include --enable-doc --enable-htmlpages --enable-manpages --mandir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1-man/share/man --enable-podpages --enable-txtpages --docdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-full-7.1-doc/share/doc/ffmpeg --disable-alsa --disable-amf --enable-libaom --enable-appkit --enable-libaribb24 --enable-libaribcaption --enable-libass --enable-audiotoolbox --enable-avfoundation --enable-avisynth --enable-libbluray --enable-libbs2b --enable-bzlib --enable-libcaca --enable-libcdio --enable-libcelt --enable-chromaprint --enable-libcodec2 --enable-coreimage --disable-cuda --enable-cuda-llvm --disable-cuda-nvcc --disable-cuvid --enable-libdav1d --disable-libdc1394 --disable-libdrm --enable-libdvdnav --enable-libdvdread --disable-libfdk-aac --disable-ffnvcodec --enable-libflite --enable-fontconfig --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libfribidi --enable-libgme --enable-gnutls --enable-libgsm 
--enable-libharfbuzz --enable-iconv --enable-libilbc --disable-libjack --enable-libjxl --enable-ladspa --enable-lcms2 --enable-lzma --disable-metal --disable-libmfx --disable-libmodplug --enable-libmp3lame --enable-libmysofa --disable-libnpp --disable-nvdec --disable-nvenc --enable-openal --enable-opencl --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-opengl --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-libopus --disable-libplacebo --disable-libpulse --enable-libqrencode --enable-libquirc --enable-librav1e --enable-librtmp --enable-librubberband --disable-libsmbclient --enable-sdl2 --disable-libshaderc --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-librsvg --disable-libsvtav1 --disable-libtensorflow --enable-libtheora --enable-libtwolame --disable-libv4l2 --disable-v4l2-m2m --disable-vaapi --enable-vdpau --disable-libvpl --enable-videotoolbox --enable-libvidstab --disable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --disable-vulkan --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxcb --enable-libxcb-shape --enable-libxcb-shm --enable-libxcb-xfixes --enable-libxevd --enable-libxeve --enable-xlib --enable-libxml2 --enable-libxvid --enable-libzimg --enable-zlib --enable-libzmq --enable-libzvbi --disable-debug --enable-optimizations --disable-extra-warnings --disable-stripping --cc=clang --cxx=clang++
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.100 / 61. 19.100
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
Input #0, matroska,webm, from 'rms_normalized_to_-10.mkv':
  Metadata:
    creation_time   : 2021-12-31T14:24:29.000000Z
    ENCODER         : Lavf61.7.100
  Duration: 01:59:56.67, start: 0.021000, bitrate: 6105 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(tv, bt470bg, progressive), 720x576 [SAR 64:45 DAR 16:9], SAR 1:1 DAR 5:4, 25 fps, 25 tbr, 1k tbn (default)
      Metadata:
        ENCODER         : Lavc58.134.100 libx264
        BPS             : 1492045
        NUMBER_OF_FRAMES: 179866
        NUMBER_OF_BYTES : 1341841549
        _STATISTICS_WRITING_APP: mkvmerge v56.0.0 ('Strasbourg / St. Denis') 64-bit
        _STATISTICS_WRITING_DATE_UTC: 2021-12-31 14:24:29
        _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
        DURATION        : 01:59:56.672000000
  Stream #0:1: Audio: pcm_s16le, 48000 Hz, 6 channels, s16, 4608 kb/s
      Metadata:
        _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
        BPS             : 238470
        NUMBER_OF_FRAMES: 522081
        NUMBER_OF_BYTES : 214523891
        _STATISTICS_WRITING_APP: mkvmerge v56.0.0 ('Strasbourg / St. Denis') 64-bit
        _STATISTICS_WRITING_DATE_UTC: 2021-12-31 14:24:29
        ENCODER         : Lavc58.134.100
        DURATION        : 01:59:56.661000000

slhck commented 4 days ago

From the README:

The default audio encoding method is uncompressed PCM (pcm_s16le) to avoid introducing compression artifacts. This will result in a much higher bitrate than you might want, for example if your input files are MP3s.

Some containers (like MP4) also cannot handle PCM audio. If you want to use such containers and/or keep the file size down, use -c:a and specify an audio codec (e.g., -c:a aac for ffmpeg's built-in AAC encoder).

Does this help?
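
As a hedged illustration of the README advice, the command from the original post could be extended with an explicit audio codec. This is only a sketch: `libvorbis` and the `256k` bitrate are assumed values chosen to resemble the input's Vorbis audio (about 238 kb/s), not a recommendation from the project, and the dry-run `RUN` switch is a hypothetical convenience.

```shell
#!/bin/sh
# Sketch: the original invocation plus an explicit audio codec, so the
# output is not written as uncompressed PCM (pcm_s16le).
audio_codec="libvorbis"   # assumed choice; any encoder in your ffmpeg build works
audio_bitrate="256k"      # assumed value, close to the input's ~238 kb/s

set -- ffmpeg-normalize \
  --verbose --print-stats \
  --normalization-type rms --target-level -10 \
  -c:a "$audio_codec" -b:a "$audio_bitrate" \
  --output rms_normalized_to_-10.mkv \
  input.mkv

# Dry run by default: print the command. Set RUN=1 to actually execute it.
cmd_line="$*"
echo "$cmd_line"
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

Re-encoding to a lossy codec keeps the file small at the cost of one more generation of compression, which is the trade-off the README describes.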

slhck commented 4 days ago

See also the new FAQ entry: 6a3287682eeac7cc48e86cce34ec35db8a993af5

anderslundstedt commented 3 days ago

> From the README:
>
> The default audio encoding method is uncompressed PCM (pcm_s16le) to avoid introducing compression artifacts. This will result in a much higher bitrate than you might want, for example if your input files are MP3s.
>
> Some containers (like MP4) also cannot handle PCM audio. If you want to use such containers and/or keep the file size down, use -c:a and specify an audio codec (e.g., -c:a aac for ffmpeg's built-in AAC encoder).
>
> Does this help?

Indeed it does! What puzzled me was that the metadata from ffprobe was incorrect, which extracting the audio proved. Perhaps this incorrect metadata should be considered a bug?

By the way, this probably just shows my ignorance of how audio codecs work, but the bitrate that ffprobe reports for the stream (as opposed to the metadata tags) implies an even larger increase in audio stream size:

4608kb/s × ~2h ≈ 32GiB

slhck commented 3 days ago

The metadata gets carried over from the input. Actually, I've gone to great lengths to preserve all metadata, as that was a common request by users. What specific metadata are you referring to though?

As for file size, you forgot that the rate is in bits, so: 4608 kBit/s ≈ 4.6 MBit/s, so that's 4.6 MBit/s ⨉ 3600 s/h ⨉ 2 h = 33120 MBit = 33.12 GBit = 4.14 GB for the whole two hours. (The audio stream only, obviously. The rest is video.)
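
The same estimate can be reproduced from the stream parameters ffprobe reports for the output (48000 Hz, 6 channels, 16-bit samples). A minimal sketch using shell integer arithmetic; the variable names are illustrative, and the duration is rounded to whole seconds as an assumption to keep the math in integers:

```shell
#!/bin/sh
# Estimate the size of an uncompressed PCM audio stream.
sample_rate=48000     # Hz, from ffprobe
channels=6            # 5.1 layout
bits_per_sample=16    # pcm_s16le
duration_s=7197       # 01:59:56.67, rounded up (assumption)

# PCM bitrate is simply rate x channels x sample width.
bitrate_bps=$((sample_rate * channels * bits_per_sample))

# Divide by 8 to convert bits to bytes before multiplying by the duration.
size_bytes=$((bitrate_bps / 8 * duration_s))

echo "bitrate: $((bitrate_bps / 1000)) kb/s"   # prints 4608 kb/s
echo "size:    $((size_bytes / 1000000)) MB"   # prints 4145 MB
```

The 4608 kb/s figure matches what ffprobe shows for the pcm_s16le stream, and the result of roughly 4.1 GB is for audio alone.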

anderslundstedt commented 3 days ago

> The metadata gets carried over from the input. Actually, I've gone to great lengths to preserve all metadata, as that was a common request by users. What specific metadata are you referring to though?

At the very least, preserving NUMBER_OF_BYTES in the audio stream metadata for the normalized output seems undesirable?

  Stream #0:1: Audio: pcm_s16le, 48000 Hz, 6 channels, s16, 4608 kb/s
      Metadata:
        _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
        BPS             : 238470
        NUMBER_OF_FRAMES: 522081
        NUMBER_OF_BYTES : 214523891
        _STATISTICS_WRITING_APP: mkvmerge v56.0.0 ('Strasbourg / St. Denis') 64-bit
        _STATISTICS_WRITING_DATE_UTC: 2021-12-31 14:24:29
        ENCODER         : Lavc58.134.100
        DURATION        : 01:59:56.661000000

> As for file size, you forgot that the rate is in bits, so: 4608 kBit/s ≈ 4.6 MBit/s, so that's 4.6 MBit/s ⨉ 3600 s/h ⨉ 2 h = 33120 MBit = 33.12 GBit = 4.14 GB for the whole two hours. (The audio stream only, obviously. The rest is video.)

Yes, of course, thank you.

slhck commented 2 days ago

> > The metadata gets carried over from the input. Actually, I've gone to great lengths to preserve all metadata, as that was a common request by users. What specific metadata are you referring to though?
>
> At the very least, preserving NUMBER_OF_BYTES in the audio stream metadata for the normalized output seems undesirable?

True. These are non-standard and added by mkvmerge. I am not sure how to deal with this easily, as there could be many such tags … maybe I'll find some time to address it.