ffmpeg-normalize

A utility for batch-normalizing audio using ffmpeg.

This program normalizes media files to a certain loudness level using the EBU R128 loudness normalization procedure. It can also perform RMS-based normalization (where the mean is lifted or attenuated), or peak normalization to a certain target level.

Batch processing of several input files is possible, including video files.

A very quick how-to:

  1. Install a recent version of ffmpeg
  2. Run pip3 install ffmpeg-normalize
  3. Run ffmpeg-normalize /path/to/your/file.mp4
  4. Done! 🎧 (the file will be in a folder called normalized)

Read on for more info.

Requirements

You need Python 3.9 or higher, and ffmpeg.

ffmpeg

For instance, to install a static build under Linux:

wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
mkdir -p ffmpeg
tar -xf ffmpeg-release-amd64-static.tar.xz -C ffmpeg --strip-components=1
sudo cp ffmpeg/ffmpeg /usr/local/bin
sudo cp ffmpeg/ffprobe /usr/local/bin
sudo chmod +x /usr/local/bin/ffmpeg /usr/local/bin/ffprobe

For Windows, follow this guide.

For macOS and Linux, you can also use Homebrew:

brew install ffmpeg

Note that using distribution packages (e.g., apt install ffmpeg) is not recommended, as these are often outdated.

Installation

For Python 3 and pip:

pip3 install ffmpeg-normalize

Or download this repository, then run pip3 install . from its root directory.

To later upgrade to the latest version, run pip3 install --upgrade ffmpeg-normalize.

Usage with Docker

You can use the pre-built image from Docker Hub:

docker run -v "$(pwd):/tmp" -it slhck/ffmpeg-normalize

Alternatively, download this repository and run

docker build -t ffmpeg-normalize .

Then run the container with:

docker run -v "$(pwd):/tmp" -it ffmpeg-normalize

This will mount your current directory to the /tmp directory inside the container. Everything else works the same way as if you had installed the program locally. For example, to normalize a file:

docker run -v "$(pwd):/tmp" -it ffmpeg-normalize /tmp/yourfile.mp4 -o /tmp/yourfile-normalized.wav

You will then find the normalized file in your current directory.

High Level Introduction

Please read this section for a high-level introduction.

What does the program do?

The program takes one or more input files and, by default, writes them to a folder called normalized, using an .mkv container. All audio streams will be normalized so that they have the same (perceived) volume according to the EBU R128 standard. This is done by analyzing the audio streams and applying a filter to bring them to a target level. Under the hood, the program uses ffmpeg's loudnorm filter to do this.

How do I specify the input?

Just give the program one or more input files as arguments. It works with most media files, including video files.

How do I specify the output?

You don't have to specify an output file name (the default is normalized/<input>.mkv), but if you want to override it, you can specify one output file name for each input file with the -o option. In this case, the container format (e.g. .wav) will be inferred from the file name extension that you've given.

Example:

ffmpeg-normalize 1.wav 2.wav -o 1-normalized.wav 2-normalized.wav

Note that if you don't specify the output file name for an input file, the container format will be MKV, and the output will be written to normalized/<input>.mkv. The reason for choosing the MKV container is that it can handle almost any codec combination.

Using the -ext option, you can supply a different output extension common to all output files, e.g. -ext m4a. However, you need to make sure that the container supports the codecs used for the output (see below).
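The default naming rule described above can be sketched in a few lines of Python. This is only an illustration, not the program's actual code: the helper name default_output_path is hypothetical, and it assumes that <input> stands for the input file name without its original extension.

```python
from pathlib import Path


def default_output_path(input_file: str, folder: str = "normalized", ext: str = "mkv") -> Path:
    """Return <folder>/<input name without extension>.<ext> (assumed naming rule)."""
    return Path(folder) / f"{Path(input_file).stem}.{ext}"


# Default container (MKV), then a common extension as one might pass via -ext
print(default_output_path("/path/to/your/file.mp4").as_posix())  # normalized/file.mkv
print(default_output_path("1.wav", ext="m4a").as_posix())        # normalized/1.m4a
```

Output names given explicitly with -o bypass this rule entirely.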

What will get normalized?

By default, all streams from the input file will be written to the output file. For example, if your input is a video with two language tracks and a subtitle track, both audio tracks will be normalized independently. The video and subtitle tracks will be copied over to the output file.
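As a rough illustration of that default, the following Python sketch labels each stream of a hypothetical input with what happens to it. The function name plan_streams and the hard-coded list of stream types are made up for the example; the program itself determines the stream types from the input file.

```python
def plan_streams(stream_types: list[str]) -> list[tuple[int, str, str]]:
    """Pair each stream index and type with 'normalize' (audio) or 'copy' (anything else)."""
    return [
        (index, kind, "normalize" if kind == "audio" else "copy")
        for index, kind in enumerate(stream_types)
    ]


# A video with two language tracks and one subtitle track
for index, kind, action in plan_streams(["video", "audio", "audio", "subtitle"]):
    print(f"stream {index}: {kind:<8} -> {action}")
```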

How will the normalization be done?

The normalization will be performed according to the EBU R128 algorithm with the loudnorm filter from FFmpeg, which was originally written by Kyle Swanson. It will bring the audio to a specified target level. This ensures that multiple files normalized with this filter will have the same perceived loudness.

What codec is chosen?

The default audio encoding method is uncompressed PCM (pcm_s16le) to avoid introducing compression artifacts. This will result in a much higher bitrate than you might want, for example if your input files are MP3s.

Some containers (like MP4) also cannot handle PCM audio. If you want to use such containers and/or keep the file size down, use -c:a and specify an audio codec (e.g., -c:a aac for ffmpeg's built-in AAC encoder).
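To get a feeling for the difference in size, here is a small back-of-the-envelope calculation in Python. It assumes a 16-bit stereo stream at 48 kHz and a 192 kbit/s AAC target; both are example values, and container overhead is ignored.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM in kbit/s: sample rate x bit depth x channels."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def size_megabytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate audio payload size in MB (10^6 bytes), ignoring container overhead."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


pcm = pcm_bitrate_kbps(48_000, 16, 2)  # 16-bit stereo at 48 kHz (assumed input format)
print(f"PCM: {pcm:.0f} kbit/s, {size_megabytes(pcm, 60):.2f} MB per minute")
print(f"AAC: 192 kbit/s, {size_megabytes(192, 60):.2f} MB per minute")
```

Under these assumptions, one minute of PCM audio takes about 11.5 MB, compared to about 1.4 MB for AAC at 192 kbit/s.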

Basic Usage

Supply one or more input files, and optionally, output file names:

ffmpeg-normalize input [input ...] [-h] [-o OUTPUT [OUTPUT ...]] [options]

Example:

ffmpeg-normalize 1.wav 2.wav -o 1-normalized.m4a 2-normalized.m4a -c:a aac -b:a 192k

For more information on the options ([options]) available, run ffmpeg-normalize -h, or read on.
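If you want to call the command line tool from a Python script, one possible approach is to assemble the argument list yourself and hand it to subprocess. The sketch below only builds and prints the command. The helper build_command is hypothetical and not part of this program; it uses only the -o, -c:a and -b:a options shown above.

```python
import shlex
from typing import Optional, Sequence


def build_command(
    inputs: Sequence[str],
    outputs: Optional[Sequence[str]] = None,
    audio_codec: Optional[str] = None,
    audio_bitrate: Optional[str] = None,
) -> list[str]:
    """Assemble an ffmpeg-normalize argument list (illustrative helper)."""
    cmd = ["ffmpeg-normalize", *inputs]
    if outputs:
        # One output file name per input file, as described above
        if len(outputs) != len(inputs):
            raise ValueError("need exactly one output per input")
        cmd += ["-o", *outputs]
    if audio_codec:
        cmd += ["-c:a", audio_codec]
    if audio_bitrate:
        cmd += ["-b:a", audio_bitrate]
    return cmd


cmd = build_command(
    ["1.wav", "2.wav"],
    outputs=["1-normalized.m4a", "2-normalized.m4a"],
    audio_codec="aac",
    audio_bitrate="192k",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.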

Examples

Read the examples on the wiki.

Detailed Options

File Input/Output

General

Normalization

EBU R128 Normalization

Audio Encoding

Other Encoding Options

Input/Output Format

Environment Variables

The program additionally respects environment variables:

API

This program has a simple API that can be used to integrate it into other Python programs.

For more information see the API documentation.

FAQ

My output file is too large?

This is because the default output codec is PCM, which is uncompressed. If you want to reduce the file size, you can specify an audio codec with -c:a (e.g., -c:a aac for ffmpeg's built-in AAC encoder), and optionally a bitrate with -b:a.

For example:

ffmpeg-normalize input.wav -o output.m4a -c:a aac -b:a 192k

What options should I choose for the EBU R128 filter? What is linear and dynamic mode?

EBU R128 is a method for normalizing audio loudness across different tracks or programs. It works by analyzing the audio content and adjusting it to meet specific loudness targets. The main components are the integrated loudness (I), the loudness range (LRA), and the true peak level (TP).

The normalization process involves measuring these values (input) and then applying gain adjustments to meet target levels (output), typically -23 LUFS for integrated loudness. You can also specify a target loudness range (LRA) and true peak level (TP).
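The arithmetic behind a simple gain adjustment is small enough to show in Python. This is a simplified sketch, not the loudnorm implementation: it only considers the integrated loudness, and the measured value of -30 LUFS is made up.

```python
def gain_db(measured_lufs: float, target_lufs: float = -23.0) -> float:
    """Constant gain in dB that moves the measured integrated loudness to the target."""
    return target_lufs - measured_lufs


def amplitude_factor(db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)


g = gain_db(-30.0)  # a file measured at -30 LUFS (made-up value)
print(f"{g:+.1f} dB, samples scaled by {amplitude_factor(g):.3f}")  # +7.0 dB, samples scaled by 2.239
```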

Linear mode applies a constant gain adjustment across the entire audio file. This is generally preferred because:

Dynamic mode, on the other hand, can change the volume dynamically throughout the file. While this can achieve more consistent loudness, it may alter the original artistic intent and potentially introduce audible artifacts (possibly due to some bugs in the ffmpeg filter).

For most cases, linear mode is recommended. Dynamic mode should only be used when linear mode is not suitable or when a specific effect is desired. In some cases, loudnorm will still fall back to dynamic mode, and a warning will be printed to the console. Here's when this can happen:

At this time, the loudnorm filter in ffmpeg does not provide a way to force linear mode when the input loudness range exceeds the target or when the true peak would be exceeded. The --keep-loudness-range-target option can be used to keep the input loudness range target above the specified target, but it will not force linear mode in all cases. We are working on a solution to handle this automatically!
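The two conditions named above can be written down as a small check. The Python sketch below is a simplified reading of them, not the filter's actual code; the class and function names are hypothetical, and the LRA and true peak targets are example values.

```python
from dataclasses import dataclass


@dataclass
class Loudness:
    integrated: float  # LUFS
    lra: float         # loudness range in LU
    true_peak: float   # dBTP


def linear_mode_possible(measured: Loudness, target: Loudness) -> bool:
    """Check both conditions for a constant-gain (linear) adjustment."""
    gain = target.integrated - measured.integrated
    lra_ok = measured.lra <= target.lra
    peak_ok = measured.true_peak + gain <= target.true_peak
    return lra_ok and peak_ok


# -23 LUFS as mentioned above; the LRA and TP targets are example values
target = Loudness(integrated=-23.0, lra=7.0, true_peak=-2.0)

print(linear_mode_possible(Loudness(-30.0, 5.0, -12.0), target))   # True
print(linear_mode_possible(Loudness(-30.0, 12.0, -12.0), target))  # False: range exceeds the target
print(linear_mode_possible(Loudness(-30.0, 5.0, -6.0), target))    # False: peak would reach +1 dBTP
```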

The program doesn't work because the "loudnorm" filter can't be found

Make sure you run a recent ffmpeg version and that loudnorm is part of the output when you run ffmpeg -filters. Many distributions package outdated ffmpeg versions or, even worse, Libav's ffmpeg masquerading as the real ffmpeg from the FFmpeg project.

Some ffmpeg builds also do not have the loudnorm filter enabled.
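If you want to automate this check, a sketch along the following lines may help. It assumes that each filter row in the ffmpeg -filters output has the form "flags name io description"; the sample text is an abbreviated, illustrative excerpt, and both helper names are hypothetical.

```python
import shutil
import subprocess


def has_filter(filters_output: str, name: str = "loudnorm") -> bool:
    """Return True if `name` is listed as a filter name in `ffmpeg -filters` output."""
    for line in filters_output.splitlines():
        fields = line.split()
        # Assumed row layout: <flags> <name> <inputs->outputs> <description...>
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def ffmpeg_has_loudnorm(ffmpeg: str = "ffmpeg") -> bool:
    """Run `ffmpeg -filters` and look for loudnorm; False if ffmpeg is not on the PATH."""
    if shutil.which(ffmpeg) is None:
        return False
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-filters"],
        capture_output=True, text=True, check=False,
    )
    return has_filter(result.stdout)


# Abbreviated, illustrative excerpt of the expected output format
SAMPLE = """\
Filters:
  T.. = Timeline support
 ... lowpass           A->A       Apply a low-pass filter.
 ... loudnorm          A->A       EBU R128 loudness normalization
"""
print(has_filter(SAMPLE))  # True
```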

You can always download a static build from their website and use that.

If you have to use an outdated ffmpeg version, you can only use rms or peak as normalization types, but I can't promise that the program will work correctly.

Should I use this to normalize my music collection?

Generally, no.

When you run ffmpeg-normalize and re-encode files with MP3 or AAC, you will inevitably introduce generation loss. Therefore, I do not recommend running this on your precious music collection, unless you have a backup of the originals or accept potential quality reduction. If you just want to normalize the subjective volume of the files without changing the actual content, consider using MP3Gain and aacgain.

Why are my output files MKV?

I chose MKV as a default output container since it handles almost every possible combination of audio, video, and subtitle codecs. If you know which audio/video codec you want, and which container is supported, use the output options to specify the encoder and output file name manually.

I get a "Could not write header for output file" error

See the next section.

The conversion does not work and I get a cryptic ffmpeg error!

Maybe ffmpeg says something like:

Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument

Or the program says:

… Please choose a suitable audio codec with the -c:a option.

One possible reason is that the input file contains some streams that cannot be mapped to the output file, or that you are using a codec that does not work for the output file. Examples:

The default output container is .mkv as it will support most input stream types. If you want a different output container, make sure that it supports your input file's video, audio, and subtitle streams (if any).

Also, if the input contains other broken metadata, you can try disabling metadata copying with -mn.

Finally, make sure you use a recent version of ffmpeg. The static builds are usually the best option.

What are the different normalization algorithms?

Couldn't I just run loudnorm with ffmpeg?

You absolutely can. However, you can get better accuracy and linear normalization with two passes of the filter. Since ffmpeg does not allow you to automatically run these two passes, you have to do it yourself and parse the output values from the first run.
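To illustrate what "parsing the output values" involves, here is a minimal Python sketch. It assumes the first pass was run with the loudnorm option print_format=json (for example, ffmpeg -i input.mp3 -af loudnorm=I=-23:TP=-2:LRA=7:print_format=json -f null -), which makes the filter print a JSON block to stderr. The helper names, the target values, and the sample numbers are made up for the example; this is not how ffmpeg-normalize itself is implemented.

```python
import json
import re


def parse_first_pass(stderr_text: str) -> dict:
    """Extract the last JSON object that loudnorm (print_format=json) wrote to stderr."""
    blocks = re.findall(r"\{[^{}]*\}", stderr_text)
    if not blocks:
        raise ValueError("no loudnorm JSON block found")
    return json.loads(blocks[-1])


def second_pass_filter(stats: dict, i: float = -23.0, tp: float = -2.0, lra: float = 7.0) -> str:
    """Build the loudnorm filter string for the second pass from first-pass measurements."""
    return (
        f"loudnorm=I={i}:TP={tp}:LRA={lra}"
        f":measured_I={stats['input_i']}:measured_TP={stats['input_tp']}"
        f":measured_LRA={stats['input_lra']}:measured_thresh={stats['input_thresh']}"
        f":offset={stats['target_offset']}:linear=true"
    )


# Made-up first-pass output, shaped like what the filter prints
SAMPLE_STDERR = """\
[Parsed_loudnorm_0 @ 0x0000]
{
    "input_i" : "-27.61",
    "input_tp" : "-4.47",
    "input_lra" : "5.20",
    "input_thresh" : "-38.10",
    "output_i" : "-22.95",
    "output_tp" : "-2.00",
    "output_lra" : "4.90",
    "output_thresh" : "-33.40",
    "normalization_type" : "dynamic",
    "target_offset" : "-0.05"
}
"""

stats = parse_first_pass(SAMPLE_STDERR)
print(second_pass_filter(stats))
```

The resulting string would then be passed to -af in a second ffmpeg run that writes the actual output file.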

If ffmpeg-normalize is too over-engineered for you, you could also use an approach like the one in this Ruby script, which performs the two loudnorm passes.

If you want dynamic normalization (the loudnorm default), simply use ffmpeg with one pass, e.g.:

ffmpeg -i input.mp3 -af loudnorm -c:a aac -b:a 192k output.m4a

What about speech?

You should check out the speechnorm filter that is part of ffmpeg. It is designed to be used in one pass, so you don't need this script at all.

See the documentation for more information.

After updating, this program does not work as expected anymore!

You are probably using a 0.x version of this program. The command line arguments and inner workings have changed significantly since then, so please adapt your scripts to the new version. Those changes were necessary to address a few issues that kept piling up; leaving the program as-is would have made it hard to extend. You can continue using the old version (find it under Releases on GitHub or request the specific version from PyPI), but it will not be supported anymore.

Can I buy you a beer / coffee / random drink?

If you found this program useful and feel like giving back, feel free to send a donation via PayPal.

Related Tools and Articles

(Have a link? Please propose an edit to this section via a pull request!)

Contributors

Benjamin Balder Bach 💻
Eleni Lixourioti 💻
thenewguy 💻
Anthony Violo 💻
Eric Jacobs 💻
kostalski 💻
Justin Pearson 💻
ad90xa0-aa 💻
Mathijs 💻
Marc Püls 💻
Michael V. Battista 💻
WyattBlue 💻
Jan-Frederik Schmidt 💻
mjhalwa 💻
07416 📖
sian1468 ⚠️
Panayiotis Savva 💻
HighMans 💻
kanjieater 🤔
Ahmet Sait 💻
Add your contributions

License

The MIT License (MIT)

Copyright (c) 2015-2022 Werner Robitza

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.