signal normalization seems to be inconsistent

tyiannak / pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Apache License 2.0

5.76k stars 1.18k forks

signal normalization seems to be inconsistent #311

Open tluocs opened 3 years ago

tluocs commented 3 years ago

In ShortTermFeatures.py, feature_extraction() has the following:

dc_offset = signal.mean()
signal_max = (np.abs(signal)).max()
signal = (signal - dc_offset) / (signal_max + 0.0000000001)

Still in ShortTermFeatures.py, spectrogram() has the following:

dc_offset = signal.mean()
maximum = (np.abs(signal)).max()
signal = (signal - dc_offset) / (maximum - dc_offset)

Why is the mean subtracted from the denominator in one case but not in the other? (Is that intentional or a mistake?)
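To make the inconsistency concrete, here is a minimal sketch that applies both quoted formulas to the same input. The wrapper names `normalize_feature_extraction` and `normalize_spectrogram` and the test signal are assumptions made for illustration; they are not functions of the library.

```python
import numpy as np

def normalize_feature_extraction(signal):
    """Normalization as quoted from feature_extraction() (hypothetical wrapper)."""
    dc_offset = signal.mean()
    signal_max = np.abs(signal).max()
    return (signal - dc_offset) / (signal_max + 1e-10)

def normalize_spectrogram(signal):
    """Normalization as quoted from spectrogram() (hypothetical wrapper)."""
    dc_offset = signal.mean()
    maximum = np.abs(signal).max()
    return (signal - dc_offset) / (maximum - dc_offset)

# Hypothetical test signal: a 5 Hz sine of amplitude 0.5 on a +0.25 DC offset.
t = np.arange(1000) / 1000.0
x = 0.5 * np.sin(2 * np.pi * 5 * t) + 0.25

a = normalize_feature_extraction(x)
b = normalize_spectrogram(x)

# The same input ends up with different peak levels:
print(np.abs(a).max())  # about 0.667
print(np.abs(b).max())  # about 1.0
```

For this particular input the second formula happens to reach a peak of 1, while the first one leaves the signal under-scaled. Other inputs behave differently, which is the point of the question.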

Tronic commented 3 years ago

FWIW, both seem flawed in that they may cause clipping. The first one divides by maximum amplitude (+ 1e-10) failing to consider dc_offset. The second attempts to subtract dc_offset even though it may just as well have been a negative value.
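A small sketch of the clipping described here, using a hypothetical four-sample signal chosen only because the arithmetic is easy to follow. It also shows that the second formula gives different peak levels for a signal and its sign-flipped copy.

```python
import numpy as np

# Hypothetical asymmetric signal: mostly positive, one large negative sample.
x = np.array([1.0, 1.0, 1.0, -1.0])

dc_offset = x.mean()      # 0.5
peak = np.abs(x).max()    # 1.0

# feature_extraction() variant: the peak is measured before the offset is removed.
first = (x - dc_offset) / (peak + 1e-10)

# spectrogram() variant: the denominator assumes a positive offset.
second = (x - dc_offset) / (peak - dc_offset)

print(first.min())   # about -1.5, outside [-1, 1]
print(second.min())  # -3.0, outside [-1, 1]

# The same formula applied to the sign-flipped signal (negative offset).
x_neg = -x
second_neg = (x_neg - x_neg.mean()) / (np.abs(x_neg).max() - x_neg.mean())
print(np.abs(second_neg).max())  # about 1.0, so the result depends on the offset sign
```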

I'm not particularly convinced of DC normalisation being of any use, but if you insist on doing it, do it before the scaling step (and perhaps it should be a power average rather than simple mean). Secondly, for most applications it makes more sense to adjust for nominal volume level, instead of maximum peak.
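One possible reading of "nominal volume level" is the RMS level of the signal. The sketch below scales to a target RMS instead of a target peak; the function name `normalize_rms` and the `target_rms` value are assumptions for illustration, not anything the library provides.

```python
import numpy as np

def normalize_rms(signal, target_rms=0.1):
    """Scale a signal so that its RMS level is approximately target_rms.

    target_rms is a hypothetical parameter chosen for this example.
    """
    signal = np.asarray(signal, dtype=np.float64)
    rms = np.sqrt(np.mean(signal ** 2))
    # The small constant avoids division by zero for pure silence.
    return signal * (target_rms / (rms + 1e-10))

# Two hypothetical recordings of the same tone at very different levels.
t = np.arange(8000) / 8000.0
quiet = 0.01 * np.sin(2 * np.pi * 440 * t)
loud = 0.8 * np.sin(2 * np.pi * 440 * t)

quiet_n = normalize_rms(quiet)
loud_n = normalize_rms(loud)

# Both now sit at roughly the same RMS level.
print(np.sqrt(np.mean(quiet_n ** 2)), np.sqrt(np.mean(loud_n ** 2)))
```

Unlike peak normalization, this does not bound the output to [-1, 1], so signals with strong transients may still need a limiter or extra headroom.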

Tronic commented 3 years ago

I believe that this would accomplish the original intention, with the caveat that the signal array is now modified in place.

signal -= signal.mean()              # Remove DC offset (by amplitude, not power)
signal /= abs(signal).max() + 1e-10  # Maximise, but avoid NaNs if only silence
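The same two steps can be wrapped in a function that leaves the caller's array untouched. This is only a sketch: the name `normalize_two_step` is hypothetical, and the conversion to float64 is an added assumption so that integer PCM input also works, since NumPy would typically refuse to store float results in place in an integer array.

```python
import numpy as np

def normalize_two_step(signal):
    """Remove the mean, then scale to a peak magnitude of about 1."""
    # Work on a float copy so the input is not modified and integer
    # input (e.g. int16 PCM) can be processed.
    out = np.array(signal, dtype=np.float64)
    out -= out.mean()                  # step 1: remove DC offset
    out /= np.abs(out).max() + 1e-10   # step 2: scale by the peak measured after step 1
    return out

# Hypothetical int16 samples with a mean of 1000.
pcm = np.array([1000, 3000, 1000, -1000], dtype=np.int16)
y = normalize_two_step(pcm)
print(y)  # approximately [0, 1, 0, -1]

# Pure silence stays finite thanks to the small constant in the denominator.
silence = normalize_two_step(np.zeros(4))
print(silence)
```

Because the peak is measured after the mean has been removed, the output stays within [-1, 1] regardless of the sign or size of the offset.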

tluocs commented 3 years ago

I do not see the point of doing signal -= signal.mean()

tyiannak commented 3 years ago

Thanks a lot for the comments, both! @Tronic is right: both normalizations are actually a bit wrong and can lead to clipped signals, depending on the initial signal range.

I will add a two-step normalization, as in the sample code above by @Tronic.

Regarding your question about why DC is important: non-zero-DC signals (which can arise for several reasons, such as sound acquisition hardware calibration) influence features such as the zero crossing rate.
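The effect on zero crossing rate can be seen with a small sketch. The `zero_crossing_rate` function below is a simplified definition written for this example and is not necessarily the exact formula used in the library; the signals are hypothetical.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ (simplified)."""
    signs = np.sign(frame)
    return np.mean(signs[1:] != signs[:-1])

# Hypothetical 50 Hz tone sampled at 1 kHz; the small phase avoids exact zeros.
t = np.arange(1000) / 1000.0
clean = np.sin(2 * np.pi * 50 * t + 0.1)

# The same tone with an offset larger than its amplitude never crosses zero.
shifted = clean + 1.5

zcr_clean = zero_crossing_rate(clean)
zcr_shifted = zero_crossing_rate(shifted)
zcr_restored = zero_crossing_rate(shifted - shifted.mean())

print(zcr_clean)     # roughly 0.1 (two crossings per 20-sample period)
print(zcr_shifted)   # 0.0
print(zcr_restored)  # same as zcr_clean once the mean is removed
```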

tluocs commented 3 years ago

If the DC component should always be 0 (given that calibration is perfect), then subtracting signal.mean() is correct. Otherwise, if the signal by its origin does have non-zero mean (not sure if there are examples for this), then we should not subtract it.

It might be hard to tell whether a non-zero mean is caused by calibration or is truly part of the signal itself. In that case, we may have to go with the most probable scenario.

Tronic commented 3 years ago

I have never seen true DC offset in recorded signals. That is simply not happening with any modern hardware due to hardware DC filtering. It could occur in poorly synthesised signals but frankly I've rarely seen that either. Maybe more importantly, very short signal fragments may appear to have DC offset due to low frequency tones that do not appear in multiple full wavelengths in a very short period of time. In any case, a low cut filter is a better way to go, as that gives more control over what precisely is being removed, and doesn't make analysis such as counting zero crossings depend on the timing of the signal frames.
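Both points can be sketched in a few lines. The first part shows a short frame of a low-frequency tone that appears to have a DC offset; the second applies a simple one-pole DC-blocking filter as one example of a low cut filter. The function name `dc_block`, the coefficient `r`, and the test signals are assumptions for illustration, and a properly designed high-pass filter would give more control over the cutoff.

```python
import numpy as np

sr = 8000  # hypothetical sample rate in Hz
t = np.arange(2 * sr) / sr

# Part 1: a 10 Hz tone has zero mean overall, but a 50 ms frame covers
# only half a cycle and therefore looks as if it had a large DC offset.
low_tone = np.sin(2 * np.pi * 10 * t)
frame = low_tone[:400]
print(low_tone.mean())  # close to 0
print(frame.mean())     # about 0.64

def dc_block(signal, r=0.995):
    """One-pole DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    x = np.asarray(signal, dtype=np.float64)
    y = np.zeros_like(x)
    prev_x = 0.0
    prev_y = 0.0
    for n in range(len(x)):
        y[n] = x[n] - prev_x + r * prev_y
        prev_x = x[n]
        prev_y = y[n]
    return y

# Part 2: a 440 Hz tone on a constant +0.3 offset.
offset_signal = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.3
filtered = dc_block(offset_signal)

# Skip the first second so the start-up transient has decayed.
tail = filtered[sr:]
print(tail.mean())          # close to 0: the offset is gone
print(np.abs(tail).max())   # close to 0.5: the tone is nearly unchanged
```

The filter works sample by sample, so its output does not depend on where frame boundaries happen to fall, unlike per-frame mean subtraction.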

tyiannak commented 3 years ago

@Tronic I've very rarely seen a non-zero DC offset, but I can remember the following cases:

Recording on a Raspberry Pi (I believe it was the 2nd Raspberry with an onboard soundcard): the result was a very strange, decreasing positive DC offset (for several seconds).
Call center data (I'm working on call center speech analytics): step-function-like DC offsets, again for several seconds (though I do not know if this was indeed a hardware issue or a file encoding/transcoding issue of the call center system itself).
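Offsets that change over time, like the two cases above, are not removed by subtracting a single global mean. The sketch below uses a synthetic step-like offset; all values are hypothetical and are not taken from the recordings mentioned.

```python
import numpy as np

sr = 8000  # hypothetical sample rate in Hz
t = np.arange(4 * sr) / sr
tone = 0.2 * np.sin(2 * np.pi * 200 * t)

# Hypothetical step-like offset: 0.0 for the first two seconds, 0.4 afterwards.
offset = np.where(t < 2.0, 0.0, 0.4)
x = tone + offset

# Subtracting the global mean (about 0.2) only splits the error between halves.
centered = x - x.mean()
first_half_mean = centered[: 2 * sr].mean()
second_half_mean = centered[2 * sr:].mean()

print(first_half_mean)   # about -0.2
print(second_half_mean)  # about +0.2
```

Each half is still offset by about 0.2 after the subtraction, so a time-varying approach such as a low cut filter would be needed for signals like these.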