slhck / ffmpeg-normalize

Audio Normalization for Python/ffmpeg
MIT License

appropriate settings and difference from ffmpeg #38

Closed greaber closed 7 years ago

greaber commented 7 years ago

Hi, I am not reporting a bug; I just have a usage question about this script. First of all, I am confused about what the value-add is over ffmpeg. But maybe there is one because I have tried to use ffmpeg for volume normalization and the results were not very satisfactory.

My application is that I want to use librivox audiobooks as training data for a machine learning model. These audiobooks are recorded under varying conditions with varying quality.

I tried using ffmpeg with the following option: -af 'dynaudnorm=c=1:r=1:p=1'. I found that this seemed to do a bad job in the following sense. After running this, the difference between the max sample and min sample in my audio files is still all over the map (ranging from about 0.5 to almost 2). Do you know why this happens? Anyway, I found that my model performs far better if, after running the ffmpeg command, I rescale the audio so that the difference between the max and min samples is the same in each 30-second segment. But this seems like an unprincipled hack, and it introduces DC bias. Would your script, with some particular options, be a better solution?

Also, if you happen to know of similar scripts to remove background noise or other artifacts of amateur recordings, I would be glad to know about them.

slhck commented 7 years ago

I am confused about what the value-add is over ffmpeg.

If you want to do simple peak or RMS normalization, you first have to analyze the file, parse the ffmpeg log output, calculate the required offset to your target, then normalize the file with the volume filter. Easy enough to do once, but if you want to do that on many files, you have to write a somewhat complex script. This is basically what ffmpeg-normalize does – a wrapper around ffmpeg to call it twice and do some maths, on many files.
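
The first-pass math can be sketched roughly like this (the log line is a made-up example of `volumedetect` output, and the target value is illustrative; this is not the tool's actual code):

```python
import re

def gain_to_peak_target(ffmpeg_log: str, target_db: float = -1.0) -> float:
    """Parse the max_volume reported by ffmpeg's volumedetect filter
    and return the gain (in dB) needed to move the peak to target_db."""
    match = re.search(r"max_volume:\s*(-?\d+(?:\.\d+)?) dB", ffmpeg_log)
    if match is None:
        raise ValueError("no max_volume found in ffmpeg output")
    max_volume = float(match.group(1))
    return target_db - max_volume

# Example line in the style volumedetect prints (values made up):
log = "[Parsed_volumedetect_0 @ 0x7f] max_volume: -5.3 dB"
print(gain_to_peak_target(log))  # 4.3
```

The second pass would then apply that gain with something like `-af volume=4.3dB`; doing this across thousands of files is exactly the boilerplate the wrapper script saves you.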

The addition of mappings to use the loudnorm filter (for EBU R128 normalization) was just added for convenience. Same as the option to specify extra arguments to ffmpeg, which is what you are doing with dynaudnorm. Under the hood it's using ffmpeg, so there's no benefit from using ffmpeg-normalize, really – at least in terms of audio quality or extra options that you'd get.

if you happen to know of similar scripts to remove background noise or other artifacts of amateur recordings, I would be glad to know about them

In these cases I usually play around with what's built into Premiere Pro or Cubase, as I happen to use these programs for video and audio editing. I don't know of a denoise filter for ffmpeg; a low-pass filter may help for certain types of wind noise, and an EQ with a very high Q setting can help reduce hiss or some annoying frequency, but usually you want something more adaptive.

slhck commented 7 years ago

Audacity seems to have a few more noise removal tools:

greaber commented 7 years ago

Thanks. I have about 6000 files, so I appreciate the convenience of a script :-). Maybe I will try both peak and RMS normalization and see which performs better.

slhck commented 7 years ago

I wouldn't recommend it though: these are not going to change the dynamics of the files. With peak/RMS normalization you will just make the waveform louder (or set the average dB level to another point), but perceptually nothing is going to change.
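
A quick way to see this: a uniform gain changes the level but not the peak-to-RMS ratio (the crest factor), which is one rough proxy for "dynamics". A toy sketch with made-up sample values:

```python
import math

def rms(x):
    """Root mean square of a list of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def crest_factor_db(x):
    """Peak-to-RMS ratio in dB; unchanged by any uniform gain."""
    peak = max(abs(s) for s in x)
    return 20 * math.log10(peak / rms(x))

signal = [0.1, -0.4, 0.05, 0.3, -0.02, 0.2]
louder = [2.5 * s for s in signal]  # uniform gain, like peak/RMS normalization

# Both values come out (essentially) identical:
print(round(crest_factor_db(signal), 3), round(crest_factor_db(louder), 3))
```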

For enhancing amateur recordings, you will very likely need a combination of:

slhck commented 7 years ago

That said, you could use Audacity or any other GUI software to get a feeling for what the right EQ and compression settings should be, then apply these with ffmpeg or ffmpeg-normalize.

michaelcrossland commented 7 years ago

Hi, glad to see new blood. As far as the gains over just ffmpeg: this script takes the guesswork out of trying to find the info it needs from the file, so that the leveling will make your files all sound the same. Now, you have two ways to use this script. 1: you can use the leveling, or -L switch, and set a dB target like -27. It does okay, but if a file has really loud parts, the leveling will clip the audio to get the lower parts up to the set dB, and that can sound really bad. The one I have been using is the EBU leveling. It doesn't clip the audio; it will compress the high spots so that everything works better, like background music in a video not overdriving the spoken parts. Oh, and if you need more help, I'll be glad to help as much as I can.

As a side note: if you use the EBU leveling, you have to set the audio rate to something like 44100 or 48000. Otherwise ffmpeg will resample your audio to 96000, and I've found a lot of things don't like that high a sample rate. Your friend in coding, Michael Crossland

greaber commented 7 years ago

Thanks, Michael. My target sample rate is 16000. Is that a problem for ebu leveling? My use case is a bit funny because I don't actually care that much about improving the audio quality per se; I just want to make the files more similar to each other so that the model will have an easier time of learning.

slhck commented 7 years ago

My target sample rate is 16000. Is that a problem for ebu leveling?

No, as it's just the output sample rate. You can set that using -ar and everything should be fine.

greaber commented 7 years ago

Why does it default to not changing audio when the adjustment is below 0.5 dB? Can I just set the threshold to 0 instead? My thought is that if you have the exact same audio but just at two different volume levels before normalization, after normalization they should be the same.

slhck commented 7 years ago

Because it's not such a huge difference. If your peak sits at -26.5 already and you'd go to -27 dB, it wouldn't be worth re-coding the file. But in your case, setting the threshold to 0 might work better.

greaber commented 7 years ago

Actually, for the rms method, it looks like I have to set a negative threshold or it still will sometimes give the "cannot run adjustment, file should be skipped" error. This is because it refuses to run if the adjustment is less than or equal to the threshold, but the adjustment can be zero. Perhaps it would be more intuitive if it used a less than comparison (or if below threshold files were not considered errors -- I guess if you are just processing one file you would want to see a message in this case, but if you are processing many it is not natural to consider being below threshold an error).
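
The skip logic described here can be sketched like this (function names and the exact comparison are illustrative, inferred from the behavior reported above, not taken from the tool's source):

```python
def should_normalize(adjustment_db: float, threshold_db: float) -> bool:
    # Behavior as reported: the file is skipped when the needed adjustment
    # is less than OR EQUAL to the threshold, so an adjustment of exactly
    # 0 dB is skipped even with the threshold set to 0.
    return adjustment_db > threshold_db

def should_normalize_fixed(adjustment_db: float, threshold_db: float) -> bool:
    # With >= instead of >, a threshold of 0 normalizes every file,
    # including those whose computed adjustment is exactly 0 dB.
    return adjustment_db >= threshold_db

print(should_normalize(0.0, 0.0))        # False: file gets "skipped"
print(should_normalize_fixed(0.0, 0.0))  # True
```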

slhck commented 7 years ago

Ah, I see how this will fail, yeah. It would make sense for the tool to always normalize the input file if the threshold is 0. I'll implement that tomorrow and push a new version.

michaelcrossland commented 7 years ago

No, that should be fine. You just have to put -e "-ar 16000" right before you tell it what file to work on. ffmpeg has a bug where, if you don't give it a sample rate, it will resample the audio to 96000. So far the ffmpeg devs have yet to look into why this is happening, since if you use -L it will keep the bitrate and sample rate of the input file. Just wanted to let you know about that before it bites you in the backside.

michaelcrossland commented 7 years ago

As for what you want, you can go into the scripts folder for this script and just edit the stuff at the top that looks like it's part of the help printout you get if you call for it, because the dev uses a thing in Python where it gets its default settings from there. I used to set the level to -32 dB so I didn't have to use a whole lot of switches in my batch files. If you need help with editing the file, let me know. I had the same thing where it would keep telling me it would not do anything to my file because of that 0.5 dB, so just to get it to stop I changed it in the main script file so that it's now set to 0 dB. That's why I don't update the script all that often: when I update, I have to reread the script code to see if I'll break it by setting something where the new code is not expecting my changes.

slhck commented 7 years ago

Michael, in case you require a specific feature, feel free to let me know and I'll see if I can implement it. Then you don't need to keep a modified local copy and can use the latest version.

I'll add the option to remove the threshold.

michaelcrossland commented 7 years ago

I'm working on trying to make a GUI front end for your script in AutoIt, so at first it will only work on Windows. Once I get some more free time I'll look at making a Java GUI that "should" work with most OSes that can run your script. The main reason for making the GUI is that, on the fly, the GUI will make a temporary modded version of the script, and once it's done it will remove the modded version and put the main untouched script back in place.

greaber commented 7 years ago

Well, ffmpeg-normalize seems to be working fine for me, but I am a little surprised by how wide the dynamic range in my files remains. With the rms and ebu methods, it is similar to what I had before using dynaudnorm, and, as I reported at the beginning, after using dynaudnorm I had to manually scale the dynamic range to a fixed value to get good results from my model, even though that seemed hacky and introduced DC bias. I would have naively expected that with peak norm the dynamic range would be constant, and with rms and ebu close to constant, but instead I see the following. Can you comment on why the range is so big with rms and ebu, and even non-trivial with peak?

$ cat check_audio.py
import librosa
import sys

for arg in sys.argv[1:]:
    a, _ = librosa.load(arg, sr=16000)
    print(a.max() - a.min())

$ find peak-normed-librispeech -name \*.wav | xargs python check_audio.py
0.10022
0.0958557
0.0927124
0.115479
0.0877991
0.0984497
0.102112
0.10556
0.0994568
0.101532
0.0952148
0.0980835
^C
$ find rms-normed-librispeech -name \*.wav | xargs python check_audio.py
0.701508
1.08823
0.744934
1.99997
1.91446
1.69077
1.44275
1.57993
1.37277
1.02707
1.15796
1.35397
1.22177
1.28516
1.65161
1.54269
0.569031
^C
$ find ebu-normed-librispeech -name \*.wav | xargs python check_audio.py
0.726471
0.987671
0.801392
1.55356
1.02281
1.39194
1.26715
1.37625
1.31027
1.17642
1.17883
1.3053
1.26468
1.0932
1.59436
1.52618
0.591888
^C
slhck commented 7 years ago

I think there's a misunderstanding about what normalization does. You look at the general peak of the file (or calculate the RMS value) and then introduce a general lift or attenuation of the waveform so that the peak (or RMS) sits at the target value. The same goes for EBU R128 – just that here, the target has a different meaning altogether (as it's measured in LUFS and not dBFS).

If you are looking for a file with a lower dynamic range, where normalization is done on shorter windows, you need the dynaudnorm filter – perhaps with the frame length set to something like f=10?
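
A toy example of why equal RMS does not imply an equal max-minus-min spread: two "files" with the same RMS after normalization can still have very different peaks (sample values made up for illustration):

```python
import math

def rms(x):
    """Root mean square of a list of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def rms_normalize(x, target_rms=0.1):
    """Apply one uniform gain so the whole file hits target_rms."""
    g = target_rms / rms(x)
    return [g * s for s in x]

# Constant level vs. one loud spike in mostly silence:
steady = [0.1, -0.1, 0.1, -0.1]
spiky  = [1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

a = rms_normalize(steady)
b = rms_normalize(spiky)

print(round(rms(a), 6), round(rms(b), 6))                    # 0.1 0.1
print(round(max(a) - min(a), 3), round(max(b) - min(b), 3))  # 0.2 0.4
```

Speech recordings differ wildly in this peak-to-average ratio, which is consistent with the spread in the measurements above.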

greaber commented 7 years ago

Hmm, that is how I understood normalization, but in that case I would expect peak normalization to make the dynamic range a fixed value, the same for every file. And while I wouldn't expect that to be true for RMS normalization (or, I guess, EBU, although I don't really understand what EBU is), I still wouldn't expect such huge variation in the dynamic range, given that all the files are just about 10 minutes of someone speaking.

michaelcrossland commented 7 years ago

That has to do with the files being of a person speaking in a hall, like a college class setting. There you have a lot of high peaks from the speaker's voice. Then, when they stop speaking, the dB floor drops past -32 dB, and when that happens, any leveling tool will not help much, because most leveling tools use an overall estimate of the high and low dB in the whole file to try to get it, overall, as close as possible to the volume you want in dB. But if you have spots where, due to echoes or a poorly set-up sound system, the speaker stops speaking for a few seconds and your low noise floor is below -32 dB, most levelers will crop the lower noise floor to -32 dB, because most people have trouble hearing audio past -32 dB without amplification. So you will need to run a low-pass filter before you level the file; that will help. Try to keep the low pass set at -30 to -26 dB. That will raise the lower noise floor by removing the lower blank dB parts, so those parts now sit at a higher dB. Then most leveling software should be able to do its thing.

By the way, you may need to raise the sampling rate your files are done in, and reduce the bitrate of the audio file from 320 kbit/s to, say, 96 kbit/s. That will save file room while keeping more of the audio as it was recorded. If you reduce the sampling rate from 44.1K, or 44100, to say 36K, or 36000, you are not compressing the audio; what you are doing is removing a lot of the audio info. So that is part of why you have those high swings: once that information is lost, no software can make it better. I hope this helps. If not, feel free to email me back and I'll be glad to help you out more. Your friend in coding, Michael Crossland

slhck commented 7 years ago

If you have 10 minutes of someone speaking, there will be pauses in the audio that are very low in volume. The dynamic range is the difference between the maximum volume and the minimum volume, hence, there will be quite some dynamic range in your file.

Peak normalization (e.g., normalizing to a peak value of 0 dBFS in the following example) only lifts the gain, but the dynamic range of the file will be the same:

[image: image] https://user-images.githubusercontent.com/582444/30521800-a3f7f3a8-9bc5-11e7-8b50-569d51b71352.png

If you want to minimize the dynamic range you need a heavy compressor or some other form of dynamic normalization where only parts (windows) of that waveform are normalized. Note that if you do that too extremely you'll just amplify the noise floor of the recording, meaning that you'll get a lot of random noise in speaker pauses.
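
A crude sketch of the windowed idea (an illustration only, not what dynaudnorm actually does; real filters smooth the gain between frames to avoid audible pumping):

```python
def windowed_peak_normalize(x, window, target=1.0, eps=1e-9):
    """Normalize each fixed-size window's peak to `target` independently.
    Quiet windows get a large gain, which is also why this amplifies
    the noise floor in speaker pauses."""
    out = []
    for i in range(0, len(x), window):
        chunk = x[i:i + window]
        peak = max(abs(s) for s in chunk)
        g = target / peak if peak > eps else 1.0  # leave digital silence alone
        out.extend(g * s for s in chunk)
    return out

quiet_then_loud = [0.05, -0.05, 0.04, -0.04, 0.8, -0.8, 0.7, -0.7]
print(windowed_peak_normalize(quiet_then_loud, window=4))
```

Both windows end up peaking at the same level, which is roughly what rescaling each 30-second segment by hand achieves.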

(Sorry for the simple drawing.)

michaelcrossland commented 7 years ago

That's why I said to first try to get a higher-sampled recording, and second, run a light low pass of -30 to -26. -30 to -28 is best, as it doesn't raise the noise floor too much, but raises it just enough to get a better leveling done.

michaelcrossland commented 7 years ago

The only reason I suggest trying that is because, back in the day, I used to help do audio for a podcast group most of you may have heard of: TWiT. I worked on one of the podcasts that never made it to video. The guy doing the main hosting moved on to a new job, so they let it drop. Kind of sucked; I loved working with Leo and Steve.

michaelcrossland commented 7 years ago

Just as a heads up, there was a freeware tool from the guy who does the TED Talks podcast, called the Levelator. I'll see if I can find a link to download it. If you run your file through it, then run ffmpeg-normalize on the new file it outputs, you should get what you're looking to do.

michaelcrossland commented 7 years ago

http://www.conversationsnetwork.org/levelator They have stopped work on this software, but it still works just fine. They have Windows, OS X, and Linux/Unix versions of the app. On OS X and Windows you run it, then drag and drop your file on the app window, and a 2-hour file takes about 10 to 15 minutes or so. It may be faster now with better CPUs, hard drives, and hardware; I don't know, as I have not needed to use it lately. I just do audio in videos now, not straight audio files. But the last time I did use it, it worked great. Hope this helps.

greaber commented 7 years ago

I may have been misusing the term "dynamic range". All I meant by that is the difference between the max and min of the signal. I would have thought that peak normalization would make either the difference between the max and min, or the maximum of the absolute value of the signal, constant across files, but it apparently doesn't do either of these. Also, I would have thought rms norm would fix the mean of the square of the signal, but it doesn't do that. (After norming, most of the files have this value around .0025, but some outliers are only half that.)
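
For reference, a mean-square value of float samples (as librosa returns them, normalized to ±1) can be converted to dBFS like this; a power quantity uses 10*log10, whereas an amplitude would use 20*log10:

```python
import math

def mean_square_to_dbfs(ms: float) -> float:
    """Convert a mean-square sample value (full scale = 1.0) to dBFS."""
    return 10 * math.log10(ms)

for ms in (0.0025, 0.00125):
    print(round(mean_square_to_dbfs(ms), 2))  # -26.02, then -29.03
```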

slhck commented 7 years ago

All I meant by that is the difference between the max and min of the signal

That is the common definition of dynamic range. Shifting the dB level by a fixed amount is not going to change the dynamic range.

Also, I would have though rms norm would fix the mean of the square of the signal, but it doesn't do that. (After norming, most of the files have this value around .0025, but some outliers are only half that.)

0.0025 seems like quite a good approximation though. Perhaps the outliers are due to rounding errors?

greaber commented 7 years ago

Michael, maybe I will check out Levelator, but I need something I can use from the command line to process a big batch of files. In general, it could indeed be interesting to experiment with applying compression to the files although I think compression is typically not idempotent (i.e., if you pass a compressed signal back through the compressor it will get compressed more, right?), and normalization should be idempotent (although I haven't verified that ffmpeg-normalize is), which is a useful property if you are trying to make a set of files more similar to one another.

slhck commented 7 years ago

if you pass a compressed signal back through the compressor it will get compressed more, right?

Depends on whether you hit the threshold of the second compression stage, but generally, yes.

Simple peak / RMS normalization is idempotent, yes.
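
A minimal sketch of why simple peak normalization is idempotent: after the first pass the peak already sits at the target, so the second pass applies a gain of 1 (target value and data made up for illustration):

```python
def peak_normalize(x, target=0.5):
    """Scale the signal uniformly so its absolute peak equals `target`."""
    g = target / max(abs(s) for s in x)
    return [g * s for s in x]

x = [0.1, -0.3, 0.2]
once = peak_normalize(x)
twice = peak_normalize(once)

# Second application changes (essentially) nothing:
print(all(abs(a - b) < 1e-12 for a, b in zip(once, twice)))  # True
```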

slhck commented 7 years ago

By the way, @greaber, I've implemented the change to the threshold parameter. Setting it to 0 should normalize all files regardless of the calculated output.

michaelcrossland commented 7 years ago

As for the app I pointed out: if you're not on Windows, you can run your files through the app from the command line on OS X and the *nix-based OSes. On Windows, you could make an AutoIt or AutoHotkey script that drags and drops the file you tell it to, and have it check a second folder where the app outputs the new file; once the new file is there and not locked, have the script get the next file, and so on. I don't think there is any command-line interface for the app, though I may be wrong on that; when I used it I was on OS X at the time. God, I'm getting too dang old now. Lol, I remember trying to do this kind of stuff under Windows 3.1, LMAO.

As for the second part: if you run ffmpeg-normalize on a file, it has to use a filter to do any of the normalizing, so yes, you are re-encoding your files, i.e. compressing them, to a new MP3 or AAC or M4A or FLAC or Ogg or whatever audio codec you're using, except if you're going from a WAV file to a WAV file, in which case there is no compressing of your audio data going on.

greaber commented 7 years ago

A few of the files are around .00125 instead of .0025, and some others are like .0018.

slhck commented 7 years ago

Given that this is in a dB range, I'd consider this to be sufficiently precise and I don't think that there's any way to get it to be exactly 0 – at least not with these tools.
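
To put a deviation of that size in perspective (simple math, nothing tool-specific): converting it to a linear amplitude ratio shows how tiny it is.

```python
# A 0.002 dB deviation corresponds to a linear amplitude ratio of:
ratio = 10 ** (0.002 / 20)
print(ratio)  # ~1.00023, i.e. about a 0.02 % change in amplitude
```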

michaelcrossland commented 7 years ago

Not with ffmpeg. If you want that level of precision, you will need to get a $1500+ sound gate / mixer board, run your audio output through the sound gate, and then re-record the output of the mixer. That's the only way to get that fine a level of gain control.

greaber commented 7 years ago

Peak normalization (e.g., normalizing to a peak value of 0 dBFS in the following example) only lifts the gain, but the dynamic range of the file will be the same

I am confused by this. I thought the transformation would multiplicatively scale the signal so that the max of the absolute value hits some fixed value. But this is evidently not what it does, and doing that would certainly change the dynamic range, defined as the difference between the max and min of the signal. (It would make it constant in the ideal case where the max and the absolute value of the min are equal.) Although when we say that a piece of music has a lot of dynamic range, we usually mean something different — that there are quiet parts and loud parts — and just changing the volume doesn't change the dynamic range in that sense.

michaelcrossland commented 7 years ago

The only reason I know that is because Leo got one of those for his shows, right before he went to full video podcasts. He tried to talk me into trying my hand at running the board, despite the fact that I was moving back to the Midwest to be with my family. I looked at it and asked him whether he thought I was a dang octopus, because you'd need that many hands — or ninja-like reflexes — to move all of the sliders when they needed to be moved. That thing had over 150 sliders, and over 45 of them were in use at any given time on any of his shows.

greaber commented 7 years ago

Given that this is in a dB range, I'd consider this to be sufficiently precise and I don't think that there's any way to get it to be exactly 0 – at least not with these tools.

The numbers I was giving were the mean of the square of the RMS normed signal. So they would never be zero unless the signal was zero. But I thought they should be fixed, not varying by a factor of two. That is what would happen if you just multiplicatively scaled the signal to fix the mean of the square. So I am not understanding what RMS normalization does I guess.
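
To be concrete, the operation I had in mind was something like this (illustrative plain Python, not what the script actually does internally):

```python
import math

def rms_normalize(samples, target_rms=0.1):
    """Multiply by one constant so that sqrt(mean(x^2)) equals target_rms."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return [s * (target_rms / rms) for s in samples]

x = [0.02, -0.05, 0.01, 0.03]
y = rms_normalize(x)
# After scaling, the RMS of the output is the target, up to float rounding:
print(math.sqrt(sum(s * s for s in y) / len(y)))  # ~0.1
```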

slhck commented 7 years ago

Peak normalization only adds (or subtracts) a constant gain to all samples, thus changing the volume, but not the dynamic range. In simple math terms, when you do peak normalization, you perform an additive / subtractive operation. For example, your input is:

To normalize it to a peak level of 1.0, you have to add 0.2, thus:

Shown graphically, this is what peak normalization does:

image

You take the peak value of the signal and add or subtract whatever you need to get to the new target peak value. All samples will be changed accordingly. Note that this curve is of course the volume curve of the signal, not the pure waveform.

What I guess you are thinking about is called compression / expansion, i.e. changing the dynamic range of the signal:

image

In this case you'd expand the dynamic range so that the minimum value is still the same, and the maximum value is now higher than the previous maximum. This is not what ffmpeg-normalize does and it is not what the ffmpeg volume filter does. You need the compand filter to accomplish that.
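
In dB terms, the constant gain shifts every passage by the same amount, so the loud/quiet difference stays fixed — a trivial sketch with made-up levels:

```python
quiet_db, loud_db = -40.0, -10.0  # levels of two passages, in dBFS
gain_db = 5.0                     # constant gain from peak/RMS normalization

# Both passages move up by the same amount...
quiet_db += gain_db
loud_db += gain_db
# ...so the dynamic range (their difference) is unchanged:
print(loud_db - quiet_db)  # still 30.0 dB
```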

michaelcrossland commented 7 years ago

RMS normalization just raises the audio in the file to a set dB level you give it. Say you have only 5 dB of headroom in your file — headroom being the overall maximum gain that can be applied before clipping happens. If your overall level is at -31 dB, you can only bring the file up to -26 dB before the upper waveform clips. EBU normalization targets a level much like RMS does, but it analyzes the full file first, so it knows where it should not apply gain, and upper/overhead clipping doesn't happen. In the spots where it can apply gain, it uses a form of compression so that the really loud parts end up closer in volume to the quieter parts around them. That's what most TV shows do to mix their soundtracks into the program. It's the local TV stations that apply a bias to raise the volume of the background soundtracks, so that a 2.0 soundtrack sounds like 5.1. That's why I use the EBU mode to remix the sound of the TV shows I get from BitTorrent.
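
The headroom arithmetic above, as a quick sketch (illustrative numbers only):

```python
peak_dbfs = -5.0            # highest peak in the file
overall_level_dbfs = -31.0  # overall (RMS-like) level of the file

headroom_db = 0.0 - peak_dbfs            # 5 dB until the peak clips at 0 dBFS
max_level = overall_level_dbfs + headroom_db
print(max_level)  # -26.0 dBFS: highest overall level before clipping
```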

greaber commented 7 years ago

But your chart is showing the log of the absolute value of the signal, right? You are not just adding a DC bias to the signal. And if you add to the log, that is multiplying the original signal. Or am I totally confused?

greaber commented 7 years ago

Or maybe the "volume curve" is something more complicated than this and that is my misunderstanding?

slhck commented 7 years ago

Eh, I see now where the confusion might be. In that chart I am totally oversimplifying, with the y-axis being just "volume". All the normalization happens in the dB realm, and this is what I've been talking about the entire time. You calculate the peak in terms of dBFS and the volume filter adds/subtracts volume in dB. This has nothing to do with what's actually done at a signal level, and of course signal modifications are log-transformed to dB.

greaber commented 7 years ago

What I was thinking was that these normalization techniques (or at least peak and RMS) just multiply the signal by a constant, and the difference between them is just how the constant is calculated. In the case of RMS it would be calculated so that the RMS of the signal itself would be fixed. But clearly this is wrong. In the case of peak, I had some different conjectures about how it would be calculated and still don't precisely understand (or even understand if I am right that normalization just multiplies by a constant).

slhck commented 7 years ago

I can't explain it any differently from what I tried above, and Wikipedia also says about audio normalization that it is:

the application of a constant amount of gain to an audio recording to bring the average or peak amplitude to a target level (the norm). Because the same amount of gain is applied across the entire recording, the signal-to-noise ratio and relative dynamics are unchanged.

Thus, the gain applied to your signal is a constant number of dB, added to or subtracted from its level; because of the logarithmic relationship between dBFS level and sample value, that constant dB offset corresponds to scaling the raw sample values by a constant linear factor.

The difference between peak and RMS normalization is not how that constant is calculated but which volume target you are trying to hit in the end. Here, RMS = "average" in Wikipedia terms.
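
As a quick reference for that logarithmic relationship, a dB gain converts to an equivalent linear sample factor like this (generic formula, not tool-specific code):

```python
def db_to_factor(gain_db):
    """Linear amplitude factor corresponding to a gain in dB."""
    return 10 ** (gain_db / 20)

print(db_to_factor(6.0))   # ~1.995: +6 dB roughly doubles the sample values
print(db_to_factor(-6.0))  # ~0.501: -6 dB roughly halves them
print(db_to_factor(0.0))   # 1.0: no change
```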

michaelcrossland commented 7 years ago

I believe he is looking for what is called a "sound gate" / "dynamic compressor", and yes, you could try to do it in something like Audacity, but he really needs to do it with a mixing board, because once the sound has been recorded it's really hard to apply that kind of conditioning after the fact. That's why movies use a total of 16 audio sources: each one captures a very specific part of the audio sound field. It's also why it takes so dang long to set up and shoot a scene — most of the time is taken up by audio setup and lighting. I have first-hand experience with that from trying to make my own films; I now know why there are so dang many sound houses listed on most movies.