slhck / ffmpeg-normalize

Audio Normalization for Python/ffmpeg
MIT License

Guidance for tuning for audiobooks #262

Closed: kanjieater closed this issue 4 months ago.

kanjieater commented 5 months ago


Describe what you are attempting to do

I'm looking for optimal settings for normalizing a few of my audiobooks that go from whispers to literal screaming.

Unfortunately, with the many parameters the tool offers, I'm finding it difficult to work out good "set and forget" settings for my audiobooks. I have hundreds, so manually tinkering with each one isn't really an option.

Additional context

Questions:

  1. Currently I'm using this on a few audio files with a 64 kbps bitrate, but I'd like to dynamically match whatever each file's bitrate is, so that the output size doesn't balloon. I was using -e="vbr 3", but unfortunately that tripled the output size. I tried using the codec's default, but that resulted in an 8x size increase and a much higher bitrate. How can I just supply the file and let the program take care of getting the right bitrate?
#!/bin/sh
exec ffmpeg-normalize "$@" -f -c:a libfdk_aac --extension m4b --progress -b:a 64k --keep-lra-above-loudness-range-target
--keep-lra-above-loudness-range-target: I'm completely guessing whether this option is useful or needed.

  2. Similarly, as a new user reading through these options, I really don't understand how best to tune them, even after reading the README twice and looking at the examples.

-lrt LOUDNESS_RANGE_TARGET, --loudness-range-target LOUDNESS_RANGE_TARGET: EBU Loudness Range Target in LUFS (default: 7.0). Range is 1.0 - 50.0.

--keep-loudness-range-target: Keep the input loudness range target to allow for linear normalization.

--keep-lra-above-loudness-range-target: Keep input loudness range above loudness range target. Use LOUDNESS_RANGE_TARGET for input loudness range <= LOUDNESS_RANGE_TARGET, or keep input loudness range target above LOUDNESS_RANGE_TARGET, as alternative to --keep-loudness-range-target to allow for linear normalization.

-tp TRUE_PEAK, --true-peak TRUE_PEAK: EBU Maximum True Peak in dBTP (default: -2.0).

I seem to have gotten OK results with the --keep-lra-above-loudness-range-target option:

ffmpeg-normalize "$@" -f -c:a libfdk_aac --extension m4b --progress -b:a 64k --keep-lra-above-loudness-range-target

But running ffmpeg-normalize "$@" -f -c:a libfdk_aac --extension m4b --progress -b:a 64k (without that option) gives this warning: WARNING: Input file had loudness range of 11.7. This is larger than the loudness range target (7.0).

As a new user, I'm not really sure what I should be doing, even with the explanations in the warning messages. In addition, there are large spikes in the audio: some parts have been made louder than others that were about the same volume before.

"Choose a higher target loudness range if you want linear normalization." I'm not sure whether I want that or not. I just want my file to be at the same volume level throughout.

"Alternatively, use the --keep-loudness-range-target or --keep-lra-above-loudness-range-target option to keep the target loudness range from the input." Similarly, I'm not sure why, or what I should do with this information.

  3. It seems like this program can do the things I want it to, but if you could give some guidance on how to tune it for the audiobooks I'd like to batch process, I'd certainly appreciate it.
slhck commented 5 months ago

> Currently I'm using this on a few audio files with a 64 kbps bitrate, but I'd like to dynamically match whatever each file's bitrate is, so that the output size doesn't balloon. I was using -e="vbr 3", but unfortunately that tripled the output size. I tried using the codec's default, but that resulted in an 8x size increase and a much higher bitrate. How can I just supply the file and let the program take care of getting the right bitrate?

Per this guide, if you want VBR mode, don't set the bitrate; set only the -vbr option:

ffmpeg-normalize -c:a libfdk_aac -e="-vbr 3" …
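If you would rather keep each file near its original bitrate than switch to VBR, one option is to read the input's audio bitrate with ffprobe and pass it on via -b:a. A minimal sketch, assuming ffprobe is installed and the container reports a per-stream bit_rate (some files report N/A instead); the helper names and the 64k fallback are hypothetical choices, not part of ffmpeg-normalize:

```shell
#!/bin/sh
# Hypothetical wrapper: re-encode each input at (roughly) its own bitrate.
# Assumes ffprobe and ffmpeg-normalize are on PATH.

# Convert a bitrate in bit/s (as printed by ffprobe) to an ffmpeg "NNk" value,
# rounded to the nearest kbit/s. Falls back to 64k for empty or non-numeric
# values such as "N/A".
to_kbit() {
  case "$1" in
    ''|*[!0-9]*) echo "64k" ;;
    *) echo "$(( ($1 + 500) / 1000 ))k" ;;
  esac
}

# Print the bit_rate of the first audio stream (empty if ffprobe fails).
probe_bitrate() {
  ffprobe -v error -select_streams a:0 -show_entries stream=bit_rate \
    -of default=noprint_wrappers=1:nokey=1 "$1" 2>/dev/null
}

for f in "$@"; do
  br=$(to_kbit "$(probe_bitrate "$f")")
  ffmpeg-normalize "$f" -f -c:a libfdk_aac --extension m4b --progress -b:a "$br"
done
```

Keep in mind that re-encoding AAC to AAC at the same bitrate is still a lossy generation, so this only keeps the file size comparable; it does not preserve quality exactly.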

> --keep-lra-above-loudness-range-target: I'm completely guessing whether this option is useful or needed.

Think of proper loudness normalization as applying a constant gain to the audio file such that your target is met, without interfering with the intrinsic dynamics of the audio signal. If you want a target loudness range (i.e., the “height” of the signal in the overall waveform) that is lower than the original, you have to modify the original audio content in a nonlinear way.

The idea behind this (and other options) is that, in general, the normalization algorithm has two modes: dynamic and linear. Dynamic normalization is a kind of compression + limiting that is required to fit the signal within your chosen target parameters. Due to that processing, it may yield unexpected jumps in audio volume (and, AFAIK, some bugs in the ffmpeg filter make this even more prominent).

Usually dynamic mode is not wanted, so you want to use linear mode whenever possible. This is actually the sole reason this program exists: if you just wanted a single-pass dynamic normalization, you could simply run ffmpeg itself. However, linear normalization is only achievable when certain conditions are met. To check them, you have to run two passes, the first of which measures the input file's characteristics.

In fact, I recently added a warning that tells you when dynamic mode is used but wasn't originally specified: https://github.com/slhck/ffmpeg-normalize/commit/fe96734aa0f9410d8d21fccd57484bf07a6e4ff2#diff-4b195a438cf80a08704fcb21f666c2e751263a944d1a898e31a3d6393ad7d378R432 — please check if you get these warnings.
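To make those conditions concrete: the ffmpeg documentation for loudnorm's linear option says the target LRA shouldn't be lower than the source LRA, and the change in integrated loudness shouldn't push the true peak above the target TP; otherwise the filter reverts to dynamic mode. A minimal sketch of that check on first-pass measurements (the function name is hypothetical, and this is not ffmpeg-normalize's actual code):

```shell
#!/bin/sh
# Hypothetical helper mirroring the linear-mode conditions from the loudnorm docs.
# Usage: normalization_mode MEASURED_I MEASURED_LRA MEASURED_TP TARGET_I TARGET_LRA TARGET_TP
# (integrated loudness in LUFS, loudness range in LU, true peak in dBTP)
normalization_mode() {
  awk -v mi="$1" -v mlra="$2" -v mtp="$3" -v ti="$4" -v tlra="$5" -v ttp="$6" 'BEGIN {
    gain = ti - mi      # constant gain that moves integrated loudness onto the target
    peak = mtp + gain   # true peak after applying that gain
    # Linear only if the input range fits the target range and the peak stays within TP.
    if (mlra + 0 <= tlra + 0 && peak <= ttp + 0)
      mode = "linear"
    else
      mode = "dynamic"
    printf "gain=%.1f dB mode=%s\n", gain, mode
  }'
}
```

For example, with hypothetical first-pass values of -20 LUFS and -5 dBTP plus the 11.7 LU range from the warning above, normalization_mode -20 11.7 -5 -23 7 -2 prints gain=-3.0 dB mode=dynamic: the range check alone rules out linear mode at the default 7.0 target, while a target of 12 would allow it.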

For some background, see:

The problem is that there is no "one size fits all" setting that accomplishes people's goals for all kinds of input (e.g., music, audio books, short spoken samples, …) and output contexts (e.g., different target levels). Over time, people have had issues with different use cases, and some of the warnings were added based on that feedback.

I do realize there is no simple guide here, and that something like this would be needed from a community perspective. Some example files would also be needed. I simply haven't had the time to do this myself.

> Similarly, as a new user reading through these options, I really don't understand how best to tune them, even after reading the README twice and looking at the examples.

> -lrt LOUDNESS_RANGE_TARGET, --loudness-range-target LOUDNESS_RANGE_TARGET: EBU Loudness Range Target in LUFS (default: 7.0). Range is 1.0 - 50.0.
>
> --keep-loudness-range-target: Keep the input loudness range target to allow for linear normalization.
>
> --keep-lra-above-loudness-range-target: Keep input loudness range above loudness range target. Use LOUDNESS_RANGE_TARGET for input loudness range <= LOUDNESS_RANGE_TARGET, or keep input loudness range target above LOUDNESS_RANGE_TARGET, as alternative to --keep-loudness-range-target to allow for linear normalization.
>
> -tp TRUE_PEAK, --true-peak TRUE_PEAK: EBU Maximum True Peak in dBTP (default: -2.0).

> I seem to have gotten OK results with the --keep-lra-above-loudness-range-target option:

> ffmpeg-normalize "$@" -f -c:a libfdk_aac --extension m4b --progress -b:a 64k --keep-lra-above-loudness-range-target

> But running ffmpeg-normalize "$@" -f -c:a libfdk_aac --extension m4b --progress -b:a 64k (without that option) gives this warning: WARNING: Input file had loudness range of 11.7. This is larger than the loudness range target (7.0).

> As a new user, I'm not really sure what I should be doing, even with the explanations in the warning messages. In addition, there are large spikes in the audio: some parts have been made louder than others that were about the same volume before.

As I said above, the spikes might be due to the dynamic normalization mode and, maybe, a bug in ffmpeg that I can't do much about.

> "Choose a higher target loudness range if you want linear normalization." I'm not sure whether I want that or not. I just want my file to be at the same volume level throughout.

> "Alternatively, use the --keep-loudness-range-target or --keep-lra-above-loudness-range-target option to keep the target loudness range from the input." Similarly, I'm not sure why, or what I should do with this information.

I appreciate you mentioning that this message is unclear. There's certainly a compromise to be found between keeping the messages concise and making them “actionable” for end users. I will have to do some thinking.

Thinking about it, there might be a way to set the options automatically, at least via some heuristics, so that linear processing is ensured. That would be a longer-term effort, though.

slhck commented 5 months ago

Here's a first attempt at a more high level explanation: https://github.com/slhck/ffmpeg-normalize?tab=readme-ov-file#what-options-should-i-choose-for-the-ebu-r128-filter-what-is-linear-and-dynamic-mode

kanjieater commented 5 months ago

Thanks, this helps a lot. Given that my use case is fixing a narrator across many books who literally goes from ear-bleeding screams to whispers, it seems I would actually benefit from --dynamic mode instead of the recommended linear mode, since the audio was never properly mastered in the first place.

So I was trying out exec ffmpeg-normalize "$@" -f -c:a libfdk_aac --extension m4b --progress -b:a 64k --dynamic --loudness-range-target 5. But now I'm thinking I would just want the LRT to be as low as possible, like 1?

kanjieater commented 5 months ago

And since I'm trying to encode the output at the same size or smaller, and the original was 64k, should I also include -b:a 64k? I just got a warning about dynamic mode automatically using a 192 kHz sample rate. So do I need to supply both -ar and -b:a?

kanjieater commented 5 months ago

So I've tried exec ffmpeg-normalize "$@" -f -c:a libfdk_aac --extension m4b --progress -b:a 64k -ar 64000 --dynamic --loudness-range-target -1, which sounds very level but adds pops and clicks as artifacts. Is there any way to keep the original audio quality while lowering the volume in the loud spots?

slhck commented 5 months ago

I think the loudnorm filter is inadequate for handling input issues caused by missing compression or limiting, i.e., improper mastering. There may very well be artifacts there.

If you want better dynamic processing, I would recommend performing compression and limiting separately using graphical tools, or, if you're batch processing, with ffmpeg itself — using the acompressor and alimiter filters. Here, there's no one-size-fits-all solution. I usually adjust these settings based on the input material and personal preference (there's a reason why you usually pay for professional mixing/mastering).

The acompressor filter reduces the volume of loud sounds or amplifies quiet sounds.

ffmpeg -i input.m4b -filter:a "acompressor=threshold=-20dB:ratio=4:attack=20:release=250:makeup=2" output.m4b
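As a rough worked example of how threshold, ratio and makeup interact: above the threshold, a compressor with ratio R keeps only 1/R of the overshoot, and acompressor's makeup is a linear factor (1 to 64 in the ffmpeg docs), so makeup=2 adds about 6 dB. The sketch below assumes a hard knee and steady-state levels, ignoring attack and release; acompressor actually applies a soft knee by default, so levels near the threshold will differ somewhat. The function name is hypothetical.

```shell
#!/bin/sh
# Static (steady-state) compressor curve with a hard knee; levels in dB.
# Usage: compressed_level INPUT_DB THRESHOLD_DB RATIO MAKEUP_LINEAR
compressed_level() {
  awk -v x="$1" -v t="$2" -v r="$3" -v m="$4" 'BEGIN {
    # Only the part of the signal above the threshold is divided by the ratio.
    if (x + 0 > t + 0)
      y = t + (x - t) / r
    else
      y = x + 0
    y += 20 * log(m) / log(10)   # makeup gain, converted from linear factor to dB
    printf "%.2f\n", y
  }'
}
```

With the values from the command above, compressed_level -8 -20 4 2 prints -10.98 and compressed_level -30 -20 4 2 prints -23.98, so a 22 dB gap between a shout and a whisper shrinks to 13 dB.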

The alimiter filter prevents the audio from exceeding a certain volume.

ffmpeg -i input.m4b -filter:a "alimiter=level_in=1:level_out=1:limit=0.5:attack=5:release=50" output.m4b
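Note that alimiter's limit, like level_in and level_out, is a linear amplitude rather than a dB value, so limit=0.5 caps peaks at roughly -6 dBFS. Two small conversion helpers (hypothetical names, using awk) make it easier to pick a value:

```shell
#!/bin/sh
# Convert a linear amplitude (e.g. alimiter's limit) to dBFS, and back.
lin_to_db() { awk -v a="$1" 'BEGIN { printf "%.2f\n", 20 * log(a) / log(10) }'; }
db_to_lin() { awk -v d="$1" 'BEGIN { printf "%.4f\n", exp(d / 20 * log(10)) }'; }
```

For instance, lin_to_db 0.5 prints -6.02, and db_to_lin -6 prints 0.5012.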

You can combine these filters, of course, with a comma between them.
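Chained together and looped over many files, that could look like the following sketch. The filter values are simply the ones from the two example commands above, not tuned recommendations; the compressed/ output folder and the DRY_RUN switch are hypothetical:

```shell
#!/bin/sh
# Batch sketch: compressor followed by limiter, values copied from the examples.
# Set DRY_RUN=1 to print the commands instead of running them.
COMPRESSOR="acompressor=threshold=-20dB:ratio=4:attack=20:release=250:makeup=2"
LIMITER="alimiter=level_in=1:level_out=1:limit=0.5:attack=5:release=50"
FILTER="$COMPRESSOR,$LIMITER"   # filters in one chain are separated by commas

process() {
  out="compressed/$(basename "$1")"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'ffmpeg -n -i %s -filter:a %s %s\n' "$1" "$FILTER" "$out"
  else
    mkdir -p compressed
    ffmpeg -n -i "$1" -filter:a "$FILTER" "$out"   # -n: never overwrite outputs
  fi
}

for f in "$@"; do process "$f"; done
```

Running it on a handful of short excerpts first, as suggested below, makes it cheap to try several threshold and ratio values before committing to a whole library.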

Basically, it comes down to choosing the right threshold, attack speed, and compression ratio for the compressor, and then setting appropriate limiting parameters to catch any remaining peaks. There is also a speechnorm filter which I haven't tried yet, but it might be worth exploring for your audiobook use case. It's designed specifically for speech and might handle the extreme dynamics of your narrator better.

Keep in mind that finding the right settings will likely require some experimentation. You might want to process a small sample of your audio with different settings to find what works best before applying it to your entire library. If you're not comfortable with command-line tools or want more visual feedback, you could consider using a graphical audio editor like Audacity to get a feel for the right compression and limiting settings, and then translate those to ffmpeg commands for batch processing.

kanjieater commented 5 months ago

speechnorm wouldn't finish processing even very simple files for me. So, out of desperation, I looked into vinyl emulation: if I can't get it to sound good, what if I make it sound intentionally "bad"? It actually worked out really well.

The key seemed to be maxing out the limiter and compressor, then applying a free VST plugin called Vinyl. So lo-fi seems to be my best automated option. I'd like to figure out how to get the equivalent via ffmpeg, but it seems a little complex.

Before and after samples: Recording.zip

kanjieater commented 5 months ago

Alright, back to the topic: with the help of people smarter than me, I managed to put together a basic audiobook normalization command. I run this from a bash script: exec ffmpeg-normalize "$@" -f -c:a libfdk_aac --extension m4b --progress -b:a 64k -ar 48000 --pre-filter "adeclip,adeclick" --dynamic --dual-mono --loudness-range-target 3

I'd also like to set -profile:a aac_he. I tried adding -e "-profile:a aac_he" but got ffmpeg-normalize: error: argument -e/--extra-output-options: expected one argument. That's a little unintuitive, since I only passed one. I'm not sure how to pass the -e options alongside my existing parameters.

kanjieater commented 5 months ago

I'm curious, though: why are there two passes if I'm using the --dynamic flag?

slhck commented 4 months ago

Thanks for the updated command, that sounds useful for your use case. I would not choose a lower range target unless needed.

As for why two passes are used: the program was simply built to always do two passes. If you only want dynamic mode, you can skip ffmpeg-normalize and use ffmpeg directly.
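For comparison, a rough single-pass equivalent of the audiobook command from earlier in the thread, calling loudnorm directly. This is a sketch under assumptions: it uses ffmpeg-normalize's defaults of -23 LUFS and -2 dBTP, maps --loudness-range-target 3 and --dual-mono onto loudnorm's LRA and dual_mono options, and may differ in details from what ffmpeg-normalize actually passes to ffmpeg. The output naming and the DRY_RUN switch are hypothetical:

```shell
#!/bin/sh
# One-pass dynamic loudness normalization with plain ffmpeg (no measuring pass).
TARGET_I=-23   # integrated loudness in LUFS (ffmpeg-normalize's default target)
TARGET_LRA=3   # matches --loudness-range-target 3
TARGET_TP=-2   # true peak in dBTP (ffmpeg-normalize's default)

# Pre-filters first, then loudnorm; dual_mono=true corresponds to --dual-mono.
FILTER="adeclip,adeclick,loudnorm=I=$TARGET_I:LRA=$TARGET_LRA:TP=$TARGET_TP:dual_mono=true"

normalize_once() {
  out="${1%.*}.normalized.m4b"
  # -ar 48000: in dynamic mode, loudnorm's output would otherwise be 192 kHz.
  set -- -y -i "$1" -filter:a "$FILTER" -ar 48000 -c:a libfdk_aac -b:a 64k "$out"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' ffmpeg "$@"   # print one argument per line instead of running
  else
    ffmpeg "$@"
  fi
}

for f in "$@"; do normalize_once "$f"; done
```

The trade-off is the one described above: you lose ffmpeg-normalize's progress bar and its automatic choice between linear and dynamic mode, but skip the measuring pass.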

The extra options require you to set -e="…" with an equals sign. This is a limitation of the Python argument parser. See the description here: https://github.com/slhck/ffmpeg-normalize?tab=readme-ov-file#inputoutput-format
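Applied to the wrapper script from above, the profile option would go in as a single -e= argument. A sketch with the remaining options copied unchanged from kanjieater's command (the DRY_RUN switch is a hypothetical addition for checking the arguments without encoding):

```shell
#!/bin/sh
# Hypothetical audiobook wrapper: the thread's options plus the HE-AAC profile.
# Pass -e as one "-e=..." argument: a separate value beginning with "-" can be
# mistaken by Python's argparse for another option.
# Set DRY_RUN=1 to print the arguments, one per line, instead of running.
normalize_audiobook() {
  set -- "$@" -f -c:a libfdk_aac --extension m4b --progress -b:a 64k -ar 48000 \
    --pre-filter "adeclip,adeclick" --dynamic --dual-mono \
    --loudness-range-target 3 -e="-profile:a aac_he"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' ffmpeg-normalize "$@"
  else
    ffmpeg-normalize "$@"
  fi
}

if [ "$#" -gt 0 ]; then normalize_audiobook "$@"; fi
```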

kanjieater commented 4 months ago

OK, good to know. I still like being able to use dynamic mode as a fallback, and the built-in progress bar, but it seems like running twice should be avoided whenever dynamic mode is used, whether as a fallback or explicitly.

slhck commented 4 months ago

Yes, I get that, and it would make sense that if --dynamic is specified, only one pass is run. See: https://github.com/slhck/ffmpeg-normalize/issues/263

I'm closing this for now, but feel free to come back if you have any questions.