lucidrains / naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
MIT License

WaveNet #23

Closed yiwei0730 closed 1 year ago

yiwei0730 commented 1 year ago

@lucidrains I was reading the paper. Section 4.2 says: "Specifically, we use a FiLM layer [38] at every 3 WaveNet layers to fuse the condition information processed by the second Q-K-V attention in the prompting mechanism in the diffusion model."

but in your model the FiLM layer is applied in every layer:

```python
class WavenetResBlock(nn.Module):
    def __init__(
        self,
        dim,
        *,
        dilation,
        kernel_size = 3,
        skip_conv = False,
        dim_cond_mult = None
    ):
        super().__init__()

        self.cond = exists(dim_cond_mult)
        self.to_time_cond = None

        if self.cond:
            self.to_time_cond = nn.Linear(dim * dim_cond_mult, dim * 2)

        self.conv = CausalConv1d(dim, dim, kernel_size, dilation = dilation)
        self.res_conv = CausalConv1d(dim, dim, 1)
        self.skip_conv = CausalConv1d(dim, dim, 1) if skip_conv else None

    def forward(self, x, t = None):

        if self.cond:
            assert exists(t)
            t = self.to_time_cond(t)
            t = rearrange(t, 'b c -> b c 1')
            t_gamma, t_beta = t.chunk(2, dim = -2)

        res = self.res_conv(x)

        x = self.conv(x)

        if self.cond:  # shouldn't this only be applied every 3 layers (layer % 3), per the paper?
            x = x * t_gamma + t_beta

        x = x.tanh() * x.sigmoid()

        x = x + res

        skip = None
        if exists(self.skip_conv):
            skip = self.skip_conv(x)

        return x, skip
```
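
(For reference, the FiLM layer [38] the paper cites is just feature-wise affine modulation: a conditioning vector is projected to a scale `gamma` and a shift `beta`, and the features become `gamma * x + beta`, which is exactly what `x * t_gamma + t_beta` does above. A minimal standalone sketch, with the module name and shapes assumed for illustration rather than taken from this repo:)

```python
import torch
from torch import nn

class FiLM(nn.Module):
    # feature-wise linear modulation (Perez et al.): x -> gamma(cond) * x + beta(cond)
    def __init__(self, dim_cond, dim):
        super().__init__()
        self.to_gamma_beta = nn.Linear(dim_cond, dim * 2)

    def forward(self, x, cond):
        # x: (batch, dim, time), cond: (batch, dim_cond)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim = -1)
        gamma, beta = gamma.unsqueeze(-1), beta.unsqueeze(-1)
        return x * gamma + beta
```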

Is this a deliberate deviation, or a specific trick needed to make the WaveNet work?

lucidrains commented 1 year ago

more conditioning won't hurt, but i can make it a hyperparameter (how many layers per FiLM condition)
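
a rough sketch of how that hyperparameter could be wired, building on the `WavenetResBlock` above (the stack below and names like `films_every` are illustrative assumptions, not the actual code in this repo): only every Nth residual block receives the conditioning dims, so `films_every = 3` matches the paper and `films_every = 1` matches the current behavior

```python
from torch import nn

class Wavenet(nn.Module):
    def __init__(
        self,
        dim,
        *,
        layers = 8,
        dim_cond_mult = None,
        films_every = 3  # hypothetical hyperparameter: apply FiLM conditioning every N layers
    ):
        super().__init__()

        self.blocks = nn.ModuleList([
            WavenetResBlock(
                dim,
                dilation = 2 ** i,
                # only pass the conditioning dims to every Nth block,
                # so FiLM is skipped in the others (the paper uses every 3rd layer)
                dim_cond_mult = dim_cond_mult if (i % films_every) == 0 else None
            )
            for i in range(layers)
        ])

    def forward(self, x, t = None):
        skips = []

        for block in self.blocks:
            # blocks constructed without conditioning simply ignore t
            x, skip = block(x, t = t)

            if skip is not None:
                skips.append(skip)

        return x, skips
```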

yiwei0730 commented 1 year ago

Thanks for your reply. That may be part of why NaturalSpeech 2 works so well. Is this GitHub code complete enough to train with, or does it still need more programming?

lucidrains commented 1 year ago

@yiwei0730 it needs more work still