Feature: Auto-normalize volume

mpogue2 commented 11 months ago

See #948 and #944 and #922 for discussions about how to handle variability of volume in music files. I also tried implementing ReplayGain a while ago, and removed it, because it did not work well (and it took extra time for the analysis).

I think the best approach we have come up with so far in our discussions:

is cheap to calculate
does a reasonable job of bumping up low-volume songs
avoids introducing clipping

has this implementation:

When the waveform is fully in memory at load time, scan to calculate PEAK and RMS (note that the PEAK part is already done now to calculate the max in each audio segment for the svgWaveformSlider's waveform display; we would just add a calculation on the way through for track RMS and for overall track PEAK)
calculate boost to get PEAK to 1.0 (call it PEAKBoost)
calculate boost to get RMS to 0dB (call it RMSBoost)
Scale the audio at playback time by min(PEAKboost, RMSBoost). This gets us as much boost as possible (of the RMSBoost), while ensuring that we don't introduce any clipping.
Scale the BgPixmap waveform by the same scaling factor to indicate to the user what's going on
Retire the Pan/EQ Compensation knob (no longer needed)
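
The steps above could be sketched roughly as follows. This is only an illustration of the proposal as written, not SquareDesk code: the names `analyze` and `proposed_gain` are hypothetical, samples are assumed to be floats in [-1.0, 1.0], and `target_rms` is an assumed parameter (1.0 corresponds to the "RMS to 0dB" wording).

```python
import math

def analyze(samples):
    """Single pass over the samples, returning (peak, rms)."""
    peak = 0.0
    sum_sq = 0.0
    for s in samples:
        peak = max(peak, abs(s))
        sum_sq += s * s
    rms = math.sqrt(sum_sq / len(samples)) if samples else 0.0
    return peak, rms

def proposed_gain(samples, target_rms=1.0):
    """min(PEAKBoost, RMSBoost) as described in the proposal (hypothetical helper)."""
    peak, rms = analyze(samples)
    if peak == 0.0:
        return 1.0  # silent track: nothing to scale
    peak_boost = 1.0 / peak        # gain that brings the peak to 1.0
    rms_boost = target_rms / rms   # gain that brings RMS to the target
    return min(peak_boost, rms_boost)

quiet = [0.5, -0.5, 0.25, -0.25]
print(proposed_gain(quiet))
```

Because the RMS of a signal never exceeds its peak, a target of 1.0 makes `rms_boost` at least as large as `peak_boost`, so the `min()` only becomes interesting for lower RMS targets.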

Gero5 commented 11 months ago

I would like that feature. Please keep #955 in mind as well.

Gero5 commented 11 months ago

Retire the Pan/EQ Compensation knob (no longer needed)

This is a separate issue for me. Even when normalization is implemented, a normalized song is still missing 3 dB of gain.

mpogue2 commented 11 months ago

@Gero5 I'm not sure I understand your last comment... If we normalize it with RMS, most songs will end up being 3dB higher than today, and it will match the volume of QuickTimePlayer.

Now that I think about it more, I don't think that RMS scaling to 0dB works (that will introduce clipping for sure, since most all of the peaks will exceed the RMS average value). It would have to be RMS scaling to something like -8dB, limited by Peak scaling so as to not exceed 1.0 full-scale.

So alternatives:
1) output = source * limit_so_that_peaks_don't_exceed_1.0(boost_to_RMS_-8dB)
2) output = source * limit_so_that_peaks_don't_exceed_1.0(always_add_+3dB)
3) output = source * limit_so_that_peaks_don't_exceed_1.0(Pan/EQ_compensation_boost_dial)

I don't think we can do 4) output = source * limit_so_that_peaks_don't_exceed_1.0(boost_to_RMS_-8dB) * Pan/EQ_compensation_boost_dial because that would always clip
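
A minimal sketch of the limiting idea, assuming gains are linear factors and the track peak is known. The function names are hypothetical stand-ins for the pseudocode above. It also shows why an extra boost applied after the limiter clips whenever the limiter is already holding the peaks at 1.0.

```python
def db_to_gain(db):
    """Convert a boost in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def limit_to_full_scale(requested_gain, peak):
    """Largest gain not above requested_gain that keeps peak * gain <= 1.0."""
    if peak <= 0.0:
        return requested_gain  # silent track: no limit needed
    return min(requested_gain, 1.0 / peak)

peak = 0.5                       # example track peak (assumed value)
requested = db_to_gain(12.0)     # example requested boost, larger than allowed
limited = limit_to_full_scale(requested, peak)

print(peak * limited)                     # peak after limiting
print(peak * limited * db_to_gain(3.0))   # extra +3 dB after the limiter exceeds 1.0
```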

Is that right? Or are you thinking of some other equation?

Gero5 commented 11 months ago

I'm not sure I understand your last comment...

Problem No. 1: source level (#954, #948, #922). Music files are recorded at different levels. Modern music is usually cranked up to the 0 dB limit as much as possible, with several compression cycles and limiters. Most music produced by square dance labels is not, so square dance music sits at a lower level. You have to adjust a lot on the mixer, or in my case at the loudspeaker, to keep an even level during a dance.

if we could normalize the low level records to a higher level as discussed that would solve problem No. 1.

Problem No. 2: output level (#821, #944). Even when playing songs that have a 0 dB level in the music file, the SquareDesk music player is 3 dB lower in output level than almost all other players. So when you are calling together with other callers/cuers/instructors and you pull the jack plug out of the other computer and put it into the Mac, you have to adjust the music level to double it, if that is even possible. This has to be done and redone every time you switch computers, which is annoying. I am also losing performance, e.g. when calling outdoors or in a tent. In those cases I need the full performance of my active box. 3 dB means doubling the sound pressure (am I right?) and makes a big difference in those cases. I could bring in another loudspeaker or an amplifier with higher performance, but from a technical standpoint that is not necessary. It is not the wrong amplifier, and the amplifier's gain is not too low; the cause is a too-low output signal from the Mac when using SquareDesk. Therefore I have used another player in those cases so far. From an engineering standpoint, transferring a signal at a -3 dB level is also bad for the signal-to-noise ratio, but that is more of a theoretical aspect.
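
As a reference for the 3 dB figure: under the standard decibel definitions, +3 dB is roughly a doubling of power and a factor of about 1.41 in amplitude (sound pressure), while doubling the amplitude takes about +6 dB. A small sketch with hypothetical helper names:

```python
import math

def db_to_amplitude_ratio(db):
    """Amplitude (pressure, voltage) ratio for a level change in dB."""
    return 10.0 ** (db / 20.0)

def db_to_power_ratio(db):
    """Power ratio for a level change in dB."""
    return 10.0 ** (db / 10.0)

def amplitude_ratio_to_db(ratio):
    """Level change in dB for a given amplitude ratio."""
    return 20.0 * math.log10(ratio)

print(db_to_amplitude_ratio(3.0))   # about 1.41
print(db_to_power_ratio(3.0))       # about 2.0
print(amplitude_ratio_to_db(2.0))   # about 6.02
```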

The Pan/EQ gain compensation will fix problem No. 2.

BUT fixing one problem does not fix the other. For problem No. 1, compare the loudness from song to song. For problem No. 2, compare the player's output to another player's output.

Gero5 commented 11 months ago

@mpogue2 I'm not sure I understand the calculation logic:

I assume that the PEAKBoost analysis and the RMSBoost analysis both return a constant gain factor for the whole music. it does not change at certain points of the song.

calculate boost to get PEAK to 1.0 (call it PEAKBoost)

so we are at a 100 % level, right?

calculate boost to get RMS to 0dB (call it RMSBoost)

So "RMS to 0 dB"/RMSBoost is the reciprocal value of RMS?

As far as I understand RMS: applied to sine wave with a peak at 0 dB / 1.0 / 100 %:

PEAK returns a value of 1.0
RMS is 0.707. "RMS to 0 dB" is a factor of 1.41.

By the nature of the "RMS to 0dB" function, it will always return a higher value than the PEAK boost. The RMS factor alone brings the music into clipping problems. So what is the benefit of that?
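
The sine-wave numbers can be checked numerically. The sketch below generates a full-scale sine over a whole number of cycles (the sample count and cycle count are arbitrary choices) and compares the two boosts; the helper name is hypothetical.

```python
import math

def sine_peak_and_rms(n=48000, cycles=100):
    """Peak and RMS of a full-scale sine sampled over whole cycles."""
    samples = [math.sin(2.0 * math.pi * cycles * i / n) for i in range(n)]
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return peak, rms

peak, rms = sine_peak_and_rms()
peak_boost = 1.0 / peak   # about 1.0
rms_boost = 1.0 / rms     # about 1.41 ("RMS to 0 dB")
print(peak, rms, peak_boost, rms_boost, min(peak_boost, rms_boost))
```

With an RMS target of 0 dB, `min()` picks the PEAK boost for this signal, which matches the concern raised here.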

then this factor goes on top? why? now I assume the song is at a 141 % peak level. or is the song not recalculated at all so far?

Scale the audio at playback time by min(PEAKboost, RMSBoost).

I thought scaling it once at load time and caching the scaled values instead of the original music would be enough. That would make it easier to draw and process the rest. I do not understand why the min() function is used here.

This gets us as much boost as possible (of the RMSBoost), while ensuring that we don't introduce any clipping.

Isn't the min function returning the minimum boost of the two choices? and why is that as much as possible?

doesn't the min(PEAKboost, RMSBoost) always return the PEAKBoost factor? to what degree is RMS helpful?

mpogue2 commented 11 months ago

Yes, I made a mistake when I said "RMS boost to 0dB", because RMS is always less than PEAK, so this would always cause clipping as well.

Let's use a concrete example:

Test audio file            LUFS            RMS             Peak
------------------------------------------------------------------------
Baby SOURCE (Youtube)     -6.18           -8.6             1.0  (0dB)
Baby QuickTimePlayer      -6.01 (+0.2dB)  -8.4  (+0.2dB)   1.0  (0dB)
Baby SquareDesk           -9.01 (-2.8dB) -11.4  (-2.8dB)   0.88 (-1.1dB)

Chirp SOURCE (Audacity)   +1.52           -3.0             1.0   (0dB)
Chirp QuickTimePlayer     +1.49 (0dB)     -3.0 (0dB)       1.0   (0dB)
Chirp SquareDesk          -1.52 (-3dB)    -6.0 (-3dB)      0.709 (-3dB)
------------------------------------------------------------------------

LUFS uses K-weighting curves to weight by frequency as per human ear
RMS treats all frequencies equally

numbers in (parentheses) are relative to the volume of the source material

Look at "Baby", the question is how much should it be boosted?

a) If we look at just RMS with a target of say -8dB, a boost of +3.4dB would be good. But, a +3.4dB boost would push PEAK over 1.0 (by 2.3dB), causing audible clipping.
b) If we look at just PEAK, +1.1dB would be the most we could do (to avoid clipping). But, that doesn't push RMS up high enough to match the source (RMS goes to -10.3dB instead of target -8dB).
c) If we look at both RMS and PEAK, we would push up by min(+1.1dB, +3.4dB) = +1.1dB, which puts the peaks right at 1.0. But, RMS is still too low.

And, if we add in an additional Pan/EQ compensation of 3dB, then: a) clips b) clips c) clips

The situation with Chirp is different, though!

a) RMS alone (target -8dB): push DOWN by 2dB, does NOT cause clipping.
b) PEAK alone: push UP by 3dB, does NOT cause clipping, but RMS is now -3dB, way over our target of -8
c) BOTH: min(-2.0,3.0) = -2, so we get correct RMS without clipping, but peaks are low.
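
The arithmetic for both files can be reproduced in dB using the SquareDesk rows of the table (RMS and peak in dB, target RMS of -8 dB). The function names are hypothetical; the input values are taken from the table above.

```python
def boosts_db(rms_db, peak_db, target_rms_db=-8.0):
    """Return (rms_boost, peak_boost, combined) in dB for options a), b), c)."""
    rms_boost = target_rms_db - rms_db   # a) boost needed to reach the RMS target
    peak_boost = 0.0 - peak_db           # b) boost that puts the peak at 0 dB (1.0)
    return rms_boost, peak_boost, min(rms_boost, peak_boost)  # c) both

def clips(peak_db, boost_db):
    """True if the boosted peak would exceed 0 dB full scale."""
    return peak_db + boost_db > 1e-9

baby = boosts_db(rms_db=-11.4, peak_db=-1.1)   # roughly (+3.4, +1.1, +1.1)
chirp = boosts_db(rms_db=-6.0, peak_db=-3.0)   # roughly (-2.0, +3.0, -2.0)

print([round(b, 1) for b in baby], [clips(-1.1, b) for b in baby])
print([round(b, 1) for b in chirp], [clips(-3.0, b) for b in chirp])
# Adding a further 3 dB of compensation on top of each Baby option:
print([clips(-1.1, b + 3.0) for b in baby])
```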

Why is Chirp that different from Baby? I don't know.

The Chirp has coherent L and R channels, while Baby does not?
EQ is not a pass through, when knobs are all at 0dB?

Gero5 commented 11 months ago

Please take into account that this issue is about source normalization. You are mixing that with the pan compensation, which is not helpful. From my view, the examples point to two other aspects:
1) SquareDesk has a 3 dB lower output than other players because of Pan/EQ.
2) The 0.88 peak (-1.1 dB) for Baby shows that something unexpected is happening in the audio processing, which is probably a bug that should be fixed.

if a song is normalized the peak output value on a normal player is 0 dB. played on SquareDesk the peak output is -3 dB if the audio is processed correctly.

You cannot compensate for 1) at the source level, because that would mean processing a song at +3 dB, and levels over 1.0 are not possible.

Look at "Baby", the question is how much should it be boosted?

As far as this issue (normalization) is concerned: not at all, because the source is already at a 0 dB level.

mpogue2 commented 11 months ago

I think I understand better now what you're saying, thanks for the explanation...

For 1, I am tempted to get rid of the Pan/EQ knob, and just move to a Balance control style of Pan. That should fix the -3dB problem, and it's simpler than having a knob.

For 2, I'll try it with EQ on, but Pan == Balance, and see what we get. I don't have any good alternatives for EQ, other than disabling it when the B/M/T knobs are at center position. QuickTimePlayer does not do EQ, so there will be some differences.

mpogue2 commented 11 months ago

And the funniest thing: I forgot that when I made the switch to DarkMode, I eliminated the Mix (Pan) control entirely, since nobody really uses it! There's no knob or slider for PAN in Dark Mode.

So, in old Light Mode, the Pan will now be Balance (no 3dB drop, if Pan is centered).
In the new Dark Mode, Pan is now explicitly disabled (even if somebody had set Pan for an Individual song), and so no 3dB drop due to Pan.
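
To illustrate the difference, here is a sketch that assumes the old Pan control used a constant-power (-3 dB) pan law, which is what the 3dB drop and the 0.709 Chirp peak in the table above suggest. This is an assumption about the behaviour, not a copy of SquareDesk's code, and both function names are hypothetical.

```python
import math

def constant_power_pan(pan):
    """pan in [-1, 1]; returns (left_gain, right_gain) under a -3 dB pan law."""
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

def balance(pan):
    """pan in [-1, 1]; balance-style control: unity gain on both sides at center."""
    left = 1.0 if pan <= 0.0 else 1.0 - pan
    right = 1.0 if pan >= 0.0 else 1.0 + pan
    return left, right

left, right = constant_power_pan(0.0)
print(left, right, 20.0 * math.log10(left))   # about 0.707 each, about -3 dB
print(balance(0.0))                           # unity gain at center
```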

Updated SquareDesk after commit 9dfa427234886707c9763bbdea8553b3f985eb2d :

and, LUFS/RMS/peak out of SquareDesk now match QuickTimePlayer when B/M/T EQ are all at zero.

mpogue2 commented 11 months ago

Leaving this one open, since I still have yet to implement auto-normalization.

mpogue2 commented 11 months ago

Initial implementation of auto-normalization of track peak to 0.0dB: e852db1aa6e7c951d496dfba515089e58823dde2 .

New persistent menu item: Music > Normalize Track Audio. When enabled, will find the highest peak in the file (positive or negative) and will normalize playback so that the peak is at 1.0 (or -1.0, if it's a negative peak). Audio data remains unchanged, both in the file, and in memory -- this is only a playback time effect. Waveform display is updated accordingly to show that normalization has occurred.

Note that the Music > Normalize Track Audio is a toggle, and you can toggle it back and forth during playback to hear (and see) what it does. There might be other GUI affordances we could use to indicate this (like a little "N" badge or something), but this was a simple approach to start out with, and I think it's the most intuitive for non-audio-geeks.

The track peak is calculated only once. Applying the normalizationFactor to the music is essentially zero additional CPU cycles (because we're already scaling for volume, pan, fade, etc, and this factor is just rolled into the single multiply that's done already).
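
A rough sketch of that idea, with hypothetical names (`normalization_factor`, `combined_gain`, `render`) and samples assumed to be floats in [-1.0, 1.0]. The factor is computed once from the track peak and folded into the gain that is applied at playback anyway; the stored samples are not modified.

```python
def normalization_factor(samples):
    """1 / (largest absolute sample), computed once per track."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 1.0 / peak if peak > 0.0 else 1.0

def combined_gain(volume, fade, normalize_enabled, norm_factor):
    """Fold normalization into the gain that is already applied at playback."""
    return volume * fade * (norm_factor if normalize_enabled else 1.0)

def render(samples, gain):
    """Playback-time scaling: one multiply per sample, source left untouched."""
    return [s * gain for s in samples]

track = [0.1, -0.25, 0.2]              # low-volume example with a negative peak
factor = normalization_factor(track)   # computed once at load time
print(render(track, combined_gain(1.0, 1.0, True, factor)))   # toggle on
print(render(track, combined_gain(1.0, 1.0, False, factor)))  # toggle off
```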

It seems to work great in my testing of an intentionally low-volume track that I made in Audacity. Note that the waveform scales dynamically, and also the VU Meter is dynamically showing high or low output volume accordingly.

[Image: TrackNormalizeDemo]

mpogue2 commented 11 months ago

Note: the Pan/EQ compensation knob is still in there, but I don't think it's needed, with both the 3dB Pan Law loss removed and the auto-peak-normalization implemented.

mpogue2 / SquareDesk

Feature: Auto-normalize volume #954