pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License
2.54k stars 652 forks source link

lfilter function numerical stability #676

Open jimchen90 opened 4 years ago

jimchen90 commented 4 years ago

🐛 Bug

When using lfilter function in torchaudio/functional.py, I input two groups of biquad coefficient parameters (one group is normalized by a0, and the other is unnormalized). The maximum error between two output waveforms is 6e-4.

To Reproduce

Steps to reproduce the behavior:

I have a group of coefficients calculated from the bass biquad filter:

[b0, b1, b2, a0, a1, a2] = 
[27.26210639724472 -37.770844011587144 14.561748502365816 20.54382514976342 -39.77708440115872 19.27378936027553]

After normalization by a0:

[b0/a0, b1/a0, b2/a0, a0/a0, a1/a0, a2/a0] = 
[1.3270219250069244 -1.8385497216920241 0.7088138842796521 1.0 -1.9362063350513272 0.9381791959272727]

The details of coefficient calculation are shown in Additional Test (list below). Then I compare the output waveforms and the number of elements that has error (absolute difference of output values) larger than 1e-4 is 12603. The maximum error between two waveforms is 6e-4.

filename = 'whitenoise.wav'
noise_filepath = common_utils.get_asset_path(filename)
waveform, sample_rate = torchaudio.load(noise_filepath, normalization=True)
device = waveform.device
dtype = waveform.dtype

output_waveform1 = F.lfilter(
waveform,
torch.tensor([a0, a1, a2], dtype=dtype, device=device),
torch.tensor([b0, b1, b2], dtype=dtype, device=device),
)

output_waveform2 = F.lfilter(
waveform,
torch.tensor([a0/a0, a1/a0, a2/a0], dtype=dtype, device=device),
torch.tensor([b0/a0, b1/a0, b2/a0], dtype=dtype, device=device),
)

atol = 1e-4
diff = abs(output_waveform2 - output_waveform1)
print((diff > atol).sum())
v, i = diff.topk(1)
print(v)

output result: 12603 6e-4

Expected behavior

There is negligible error between two output waveforms.

Environment

Additional context

The maximum error deceases when gain decreases. The method to calculate the coefficients of bass (biquad) filter:

def bass_biquad_coefficient(
        #waveform: Tensor,
        sample_rate: int,
        gain: float,
        central_freq: float = 100,
        Q: float = 1
) -> Tensor:
    r"""Design a bass tone-control effect.  Similar to SoX implementation.
    Args:
        waveform (Tensor): audio waveform of dimension of `(..., time)`
        sample_rate (int): sampling rate of the waveform, e.g. 44100 (Hz)
        gain (float): desired gain at the boost (or attenuation) in dB.
        central_freq (float, optional): central frequency (in Hz). (Default: ``100``)
        Q (float, optional): https://en.wikipedia.org/wiki/Q_factor (Default: ``0.707``).
    Returns:
        Tensor: Waveform of dimension of `(..., time)`
    References:
        http://sox.sourceforge.net/sox.html
        https://www.w3.org/2011/audio/audio-eq-cookbook.html#APF
    """
    w0 = 2 * math.pi * central_freq / sample_rate
    alpha = math.sin(w0) / 2 / Q
    A = math.exp(gain / 40 * math.log(10))

    temp1 = 2 * math.sqrt(A) * alpha
    temp2 = (A - 1) * math.cos(w0)
    temp3 = (A + 1) * math.cos(w0)

    b0 = A * ((A + 1) - temp2 + temp1)
    b1 = 2 * A * ((A - 1) - temp3)
    b2 = A * ((A + 1) - temp2 - temp1)
    a0 = (A + 1) + temp2 + temp1
    a1 = -2 * ((A - 1) + temp3)
    a2 = (A + 1) + temp2 - temp1

    return [b0, b1, b2, a0, a1, a2]

central_freq = 1000
s = 1
gain = 40
sample_rate = 44100
A = math.exp(gain / 40 * math.log(10))
Q = 1/math.sqrt(2 + (A + 1/A) * (1/s - 1)) 

[b0, b1, b2, a0, a1, a2] = bass_biquad_coefficient(sample_rate, gain, central_freq, Q)
cpuhrsch commented 4 years ago

sox might be using doubles - so this won't match the performance. However, float32 is used more widely and can get us better performance.

cpuhrsch commented 4 years ago

Should we consider returning normalized coefficients by default? If yes, how do we choose the normalization coefficient? What are the most common downstream tasks these coefficients are used for and do those dependent on the relative or absolute size of the coefficients?

Maybe we can just normalize them to be within -1 and 1 just like our waveforms and then add a flag to allow the user to turn this off. Do other libraries do this?

cpuhrsch commented 3 years ago

I'd like to quote @yoyololicon's comment here as reference for potential future work.

"""

Currently, the custom backward function is actually just filter the gradients in reversed direction with the same parameters. If there is numerical problems in forward pass, it's very likely to happen in backward.

The instability of lfilter in torchaudio can be a result of:

  1. Using float32 single precision (#676). This is unavoidable in PyTorch. I also only do gradcheck with double precision.
  2. Using Direct Form instead of Transposed Form II. Direct Form is simple to implement (just a convolution and a IIR filter), but is well known to have stability issues; Transposed Form II is good for real-time application, but implement it efficiently in offline application is non-trivial.
  3. Very high order filter (with order > 2). A solution is factorizing the filter into cascaded second-order filters, and is also why SciPy recommend using scipy.signal.sosfilt instead of scipy.signal.lfilter in their documentation.

In future I would recommend implementing a custom second-order filter using Transposed Form II, and factorize the filter coefficients using root finding algorithm when higher order filter is needed.

"""