Open tluocs opened 3 years ago
FWIW, both seem flawed in that they may cause clipping. The first divides by the maximum amplitude of the original signal (+ 1e-10), failing to consider that subtracting dc_offset can raise the peak above that maximum. The second subtracts dc_offset from the denominator even though dc_offset may just as well be a negative value.
I'm not particularly convinced that DC normalisation is of any use, but if you insist on doing it, do it before the scaling step (and perhaps it should be a power average rather than a simple mean). Secondly, for most applications it makes more sense to adjust for a nominal volume level instead of the maximum peak.
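Adjusting for a nominal level rather than the peak could be sketched as an RMS normalisation; the function name and target level below are my own choices, not anything from the library:

```python
import numpy as np

def normalize_rms(signal, target_rms=0.1):
    """Scale so the RMS level matches target_rms.

    Note: the peak may then exceed 1, so clip or limit afterwards if needed.
    """
    rms = np.sqrt(np.mean(signal ** 2))
    return signal * (target_rms / (rms + 1e-10))  # epsilon avoids division by zero on silence

# A 5 Hz tone with peak 0.5 (RMS = 0.5 / sqrt(2) ~ 0.354):
t = np.linspace(0, 1, 1000, endpoint=False)
x = 0.5 * np.sin(2 * np.pi * 5 * t)
y = normalize_rms(x)
```

The epsilon plays the same role as the 1e-10 in the library code: an all-zero input stays all-zero instead of producing NaNs.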
I believe that this would accomplish the original intention, with the caveat that the signal array is now modified in place:
```python
signal -= signal.mean()              # Remove DC offset (by amplitude, not power)
signal /= abs(signal).max() + 1e-10  # Maximise, but avoid NaNs if only silence
```
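Expanded into a self-contained sketch (the `normalize` wrapper and the test signal are mine, for illustration only):

```python
import numpy as np

def normalize(signal):
    """Two-step normalization: remove DC offset first, then scale the peak to ~1."""
    signal = signal - signal.mean()                 # remove DC offset (amplitude mean)
    return signal / (np.abs(signal).max() + 1e-10)  # epsilon avoids NaN on pure silence

# A 0.4-amplitude tone riding on a 0.5 DC offset:
t = np.linspace(0, 1, 1000, endpoint=False)
x = 0.5 + 0.4 * np.sin(2 * np.pi * 5 * t)
y = normalize(x)
```

Because the peak is taken *after* the offset is removed, the output is guaranteed to stay within [-1, 1], which is exactly what the two original one-step formulas fail to ensure.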
I do not see the point of doing signal -= signal.mean()
Thanks a lot for the comments, both! @Tronic is right: both normalizations are actually a bit wrong and can lead to clipped signals depending on the initial signal range.
I will add a two-step normalization, as in the sample code above by @Tronic.
Regarding your question about why DC matters: non-zero-DC signals (which can arise for several reasons, such as sound-acquisition hardware calibration) influence features such as the zero crossing rate.
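A quick illustration of that zero-crossing-rate effect (the `zcr` helper below is a toy of mine, not the pyAudioAnalysis implementation):

```python
import numpy as np

def zcr(x):
    """Count sign changes between consecutive samples."""
    s = np.signbit(x)
    return int(np.sum(s[1:] != s[:-1]))

# 10 Hz tone with peak 0.1, sampled at 1 kHz for 1 s:
t = np.linspace(0, 1, 1000, endpoint=False)
tone = 0.1 * np.sin(2 * np.pi * 10 * t)

zcr_clean = zcr(tone)         # roughly 2 crossings per cycle
zcr_offset = zcr(tone + 0.2)  # DC offset exceeds the amplitude: no crossings at all
```

The same tone goes from ~20 zero crossings to zero once the offset lifts it entirely above the axis, so any ZCR-based feature becomes meaningless.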
If the DC component should always be 0 (given that calibration is perfect), then subtracting signal.mean() is correct. Otherwise, if the signal by its origin genuinely has a non-zero mean (I'm not sure if there are examples of this), then we should not subtract it.
It might be hard to tell whether a non-zero mean is caused by calibration or is truly part of the signal itself. In that case, we may have to go with the most probable scenario.
I have never seen true DC offset in recorded signals. That is simply not happening with any modern hardware due to hardware DC filtering. It could occur in poorly synthesised signals but frankly I've rarely seen that either. Maybe more importantly, very short signal fragments may appear to have DC offset due to low frequency tones that do not appear in multiple full wavelengths in a very short period of time. In any case, a low cut filter is a better way to go, as that gives more control over what precisely is being removed, and doesn't make analysis such as counting zero crossings depend on the timing of the signal frames.
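Such a low cut could be the textbook first-order DC-blocking filter; the sketch below is my own, not library code, and `r` (close to 1) sets how low the cutoff sits:

```python
import numpy as np

def dc_block(x, r=0.995):
    """First-order DC-blocking (high-pass) filter: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    y = np.zeros(len(x), dtype=float)
    prev_x, prev_y = 0.0, 0.0
    for n, xn in enumerate(x):
        prev_y = xn - prev_x + r * prev_y
        prev_x = xn
        y[n] = prev_y
    return y

# 50 Hz tone with peak 0.5 riding on a 0.3 DC offset, 1 kHz sample rate:
t = np.linspace(0, 1, 1000, endpoint=False)
x = 0.3 + 0.5 * np.sin(2 * np.pi * 50 * t)
y = dc_block(x)
```

After the initial transient decays, the offset is gone while the 50 Hz tone passes essentially unattenuated, and unlike frame-wise mean subtraction the result does not depend on where the analysis frames happen to start.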
@Tronic I've very rarely seen a non-zero DC offset, but I can remember the following cases:
In ShortTermFeatures.py, feature_extraction() has the following:

```python
dc_offset = signal.mean()
signal_max = (np.abs(signal)).max()
signal = (signal - dc_offset) / (signal_max + 0.0000000001)
```

Still in ShortTermFeatures.py, spectrogram() has the following:

```python
dc_offset = signal.mean()
maximum = (np.abs(signal)).max()
signal = (signal - dc_offset) / (maximum - dc_offset)
```
Why does the denominator subtract the mean in one case but not in the other? (Is that intentional or a mistake?)
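For concreteness, here is a toy signal (my own example, not from the library) on which both formulas push samples outside [-1, 1]:

```python
import numpy as np

# Mostly positive signal, so the mean is large and positive.
signal = np.array([0.9, 0.9, 0.9, -0.5])
dc_offset = signal.mean()              # 0.55
signal_max = np.abs(signal).max()      # 0.9

# feature_extraction() variant: after removing the offset, the negative
# peak grows to -1.05, which exceeds the pre-offset maximum of 0.9.
v1 = (signal - dc_offset) / (signal_max + 1e-10)

# spectrogram() variant: the positive dc_offset shrinks the denominator
# to 0.35, so the same sample clips far harder (a negative dc_offset
# would instead inflate the denominator for no clear reason).
v2 = (signal - dc_offset) / (signal_max - dc_offset)
```

Both outputs land outside [-1, 1], which is the clipping @Tronic describes; only a denominator computed from the *offset-removed* signal avoids it.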