jaywalnut310 / glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search
MIT License
667 stars, 150 forks

A reshape problem in InvConvNear #46

Open shahuzi opened 3 years ago

shahuzi commented 3 years ago

Hi @jaywalnut310. I'm trying to understand glow-tts by reading the code, and I'm a little confused by this piece of code in InvConvNear.

https://github.com/jaywalnut310/glow-tts/blob/13e997689d643410f5d9f1f9a73877ae85e19bc2/modules.py#L214-L215

So if the goal is to reshape the input x from [b, c, t] to [b, self.n_split, c // self.n_split, t], what is the purpose of L214?

shahuzi commented 3 years ago

I now understand that https://github.com/jaywalnut310/glow-tts/blob/13e997689d643410f5d9f1f9a73877ae85e19bc2/modules.py#L214 performs some kind of channel shuffle, but I still don't understand why this step is required.

bear-boy commented 12 months ago

> I now understand that https://github.com/jaywalnut310/glow-tts/blob/13e997689d643410f5d9f1f9a73877ae85e19bc2/modules.py#L214 performs some kind of channel shuffle, but I still don't understand why this step is required.

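Yes, this is a channel shuffle operation. As for why a channel shuffle is needed, the authors mention it in the paper: "To allow channel mixing in each group, the same number of channels are extracted from one half of the feature map separated by coupling layers and the other half, respectively." Specifically, the invertible 1x1 conv in glow-tts is implemented as a grouped convolution, so it loses the channel-shuffle role that the invertible 1x1 conv plays in the original Glow model. The shuffle therefore has to be done "manually" here: the channels that the affine coupling layer leaves unchanged and the channels it transforms are regrouped along the channel dimension, which partially recovers what the invertible 1x1 conv does in the original Glow.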
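
To make the regrouping concrete, here is a minimal sketch that tracks which channel indices end up mixed together. It rests on several assumptions: the two referenced lines are taken to be a `view` to `[b, 2, c // n_split, n_split // 2, t]` followed by a `permute(0, 1, 3, 2, 4)` and a `view` to `[b, n_split, c // n_split, t]` (not checked against the pinned commit), the 1x1 conv is assumed to mix along axis 1, and NumPy's `reshape`/`transpose` stand in for PyTorch's `view`/`permute`. The helper names `mixing_groups` and `contiguous_groups` are hypothetical and exist only for this illustration.

```python
import numpy as np


def mixing_groups(c, n_split):
    """Channel indices that one n_split x n_split 1x1 conv would mix together."""
    b, t = 1, 1
    # Each channel stores its own index so it can be tracked through the reshapes.
    x = np.arange(c).reshape(b, c, t)
    # Assumed form of the two referenced lines (view -> reshape, permute -> transpose).
    x = x.reshape(b, 2, c // n_split, n_split // 2, t)
    x = x.transpose(0, 1, 3, 2, 4).reshape(b, n_split, c // n_split, t)
    # Assumption: the conv mixes along axis 1, so each index g on axis 2 is one group.
    return [sorted(x[0, :, g, 0].tolist()) for g in range(c // n_split)]


def contiguous_groups(c, n_split):
    """Baseline for comparison: groups made of contiguous channel blocks."""
    return [list(range(g * n_split, (g + 1) * n_split)) for g in range(c // n_split)]


print(mixing_groups(8, 4))      # [[0, 1, 4, 5], [2, 3, 6, 7]]
print(contiguous_groups(8, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With `c = 8` and `n_split = 4`, each group under the assumed reshape holds two channels from the first half (0 to 3) and two from the second half (4 to 7), which matches the paper's description. The contiguous baseline keeps every group inside a single half.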