iamycy / golf

A DDSP-based neural voice synthesiser.
https://iamycy.github.io/golf2-demo/
MIT License
109 stars · 8 forks

Batched conversion of cascaded biquads to high-order filter #5

Open SuperKogito opened 4 months ago

SuperKogito commented 4 months ago

Is there some documentation of this function / the math behind it? https://github.com/yoyololicon/golf/blob/52f50e7341f769d49e6bddbbe887c149c2b9a413/models/utils.py#L444-L460

I am trying to extend it to work on a batched input (num_batches, num_sections, num_biquad_coeffs=6) and I am not sure how to proceed.

yoyolicoris commented 4 months ago

Hi @SuperKogito,

Just transpose dims 0 and 1 so it becomes (num_sections, num_batches, num_coeffs), and then coeff_product can be applied to it. The output will be (num_batches, num_sections * (num_coeffs - 1) + 1).
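As a shape sanity check for this recipe (an editor's sketch, not part of the thread): the biquad tensor from the question, (num_batches, num_sections, 6), splits into numerator and denominator halves of 3 coefficients each, and combining the sections of one batch item is plain polynomial multiplication, which `np.polymul` does directly on coefficient arrays:

```python
import numpy as np

def cascade_to_high_order(biquads):
    """Reference (non-differentiable) conversion.

    biquads: (num_batches, num_sections, 6), each row [b0, b1, b2, a0, a1, a2].
    Returns numerator and denominator of shape
    (num_batches, num_sections * 2 + 1).
    """
    nums, dens = [], []
    for item in biquads:                      # loop over the batch
        b, a = item[0, :3], item[0, 3:]
        for section in item[1:]:              # polymul == coefficient convolution
            b = np.polymul(b, section[:3])
            a = np.polymul(a, section[3:])
        nums.append(b)
        dens.append(a)
    return np.stack(nums), np.stack(dens)

biquads = np.random.randn(4, 2, 6)            # (num_batches, num_sections, 6)
b, a = cascade_to_high_order(biquads)
print(b.shape, a.shape)                       # (4, 5) (4, 5)
```

Each product of two degree-2 polynomials has 5 coefficients, matching num_sections * (num_coeffs - 1) + 1 with num_coeffs = 3 per half.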

SuperKogito commented 4 months ago

Unfortunately, this fails at the conv1d call:

bx
Out[19]: 
tensor([[[ 1.0000, -1.6978,  0.7266],
         [ 1.0000, -1.8332,  0.8801]],

        [[ 1.0000, -1.6978,  0.7266],
         [ 1.0000, -1.8332,  0.8801]],

        [[ 1.0000, -1.6978,  0.7266],
         [ 1.0000, -1.8332,  0.8801]],

        [[ 1.0000, -1.6978,  0.7266],
         [ 1.0000, -1.8332,  0.8801]]])

def coeff_product(polynomials):
    n = len(polynomials)
    if n == 1:
        return polynomials

    c1 = coeff_product(polynomials[n // 2 :])
    c2 = coeff_product(polynomials[: n // 2])
    if c1.shape[1] > c2.shape[1]:
        c1, c2 = c2, c1
    weight = c1.unsqueeze(1).flip(2)
    prod = F.conv1d(
        c2.unsqueeze(0),
        weight,
        padding=weight.shape[2] - 1,
        groups=c2.shape[0],
    ).squeeze(0)
    return prod

bx.shape
Out[21]: torch.Size([4, 2, 3])

bx.transpose(0, 1).shape
Out[22]: torch.Size([2, 4, 3])

coeff_product(bx.transpose(0, 1))
Traceback (most recent call last):

  File "D:\Users\am\AppData\Local\Temp\ipykernel_23352\2398578349.py", line 1, in <module>
    coeff_product(bx.transpose(0, 1))

  File "D:\Users\am\AppData\Local\Temp\ipykernel_23352\657916472.py", line 11, in coeff_product
    prod = F.conv1d(

RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 1, 4, 3]

When instead using

    prod = F.conv1d(
        c2,
        weight,
        padding=weight.shape[2] - 1,
        groups=c2.shape[0],
    ).squeeze(0)

the code executes, but the result has neither the right shape nor the correct values.
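For what it's worth (an editorial sketch, not from the thread): the `[1, 1, 4, 3]` shape in the traceback points at the base case. `return polynomials` keeps the section dimension, so the recursion hands a 3-D tensor to `c2`, and `c2.unsqueeze(0)` becomes 4-D. If the base case returns `polynomials[0]` (dropping the section dim), the same transposed call goes through:

```python
import torch
import torch.nn.functional as F

def coeff_product_fixed(polynomials):
    # Divide-and-conquer product of polynomials stacked along dim 0.
    n = len(polynomials)
    if n == 1:
        return polynomials[0]          # drop the section dim, NOT `return polynomials`

    c1 = coeff_product_fixed(polynomials[n // 2 :])
    c2 = coeff_product_fixed(polynomials[: n // 2])
    if c1.shape[1] > c2.shape[1]:
        c1, c2 = c2, c1
    weight = c1.unsqueeze(1).flip(2)   # (batch, 1, len); flip turns conv1d's
                                       # cross-correlation into true convolution
    prod = F.conv1d(
        c2.unsqueeze(0),               # (1, batch, len)
        weight,
        padding=weight.shape[2] - 1,   # "full" convolution length
        groups=c2.shape[0],            # one kernel per batch item
    ).squeeze(0)
    return prod

bx = torch.randn(4, 2, 3)              # (num_batches, num_sections, num_coeffs)
out = coeff_product_fixed(bx.transpose(0, 1))
print(out.shape)                       # torch.Size([4, 5])
```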

yoyolicoris commented 4 months ago

It works on my end:

>>> x = torch.randn(2, 4, 3)
>>> x
tensor([[[-1.5509,  0.1190,  1.2193],
         [-1.7256, -0.8840,  0.7147],
         [ 1.1037,  0.4033, -0.9190],
         [-0.5180,  0.6319, -0.7792]],

        [[ 0.3549,  1.5161,  0.4884],
         [-1.5760,  0.6141, -0.0958],
         [-0.2039,  1.0868, -1.1043],
         [ 1.1191,  0.3513, -0.4821]]])
>>> coeff_product(x)
tensor([[-0.5504, -2.3091, -0.1443,  1.9067,  0.5955],
        [ 2.7196,  0.3335, -1.5039,  0.5236, -0.0685],
        [-0.2251,  1.1173, -0.5931, -1.4442,  1.0149],
        [-0.5797,  0.5252, -0.4004, -0.5783,  0.3756]])
SuperKogito commented 4 months ago

Which PyTorch and torchaudio versions are you using?