Probably the inconsistency between paper and the code?

yusuke-ai commented 3 weeks ago

Hi,

Thank you for the awesome work!

I'm reading your paper and the code, and there may be an inconsistency between them. The paper says:

T-V encoder contains a few residual convolution blocks, but we employ Layer Normalization (LN) instead of IN to preserve temporal relationships in each instance

but the code linked below doesn't seem to use LN: https://github.com/winddori2002/DEX-TTS/blob/main/DEX-TTS/model/ref_encoder.py#L131

Should I add layer normalization to the code or is it ok to leave it without LN?

Thank you!

winddori2002 commented 3 weeks ago

Hi, thanks for your interest.

For the T-V encoder, replacing BN in the TVEncoderBlock with LN worked better. You can check the TVEncoderBlock and BasicConv classes.

yusuke-ai commented 3 weeks ago

Thank you for the reply! OK. I will check.
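
To make the IN versus LN distinction from the paper quote concrete, here is a minimal pure-Python sketch. It is not taken from the DEX-TTS repository: the function names, the toy input, and the epsilon value are all hypothetical. The thread does not say which axes LN covers in TVEncoderBlock, so the sketch assumes LN normalizes each instance over all channels and time steps together, which is only one possible reading.

```python
from statistics import fmean, pvariance

EPS = 1e-5  # small stabilizing constant (assumed value, not from the repository)


def _normalize(values, eps=EPS):
    """Shift and scale a flat list of floats to zero mean and near-unit variance."""
    mean = fmean(values)
    std = (pvariance(values) + eps) ** 0.5
    return [(v - mean) / std for v in values]


def instance_norm(x):
    """IN: normalize every channel on its own, over its time steps.

    x is a nested list shaped [channels][time].
    """
    return [_normalize(channel) for channel in x]


def layer_norm(x):
    """LN (assumed variant): normalize one instance over all channels and time steps jointly."""
    n_time = len(x[0])
    flat = _normalize([v for channel in x for v in channel])
    # Restore the [channels][time] layout.
    return [flat[i:i + n_time] for i in range(0, len(flat), n_time)]


# Toy instance: 2 channels x 3 time steps; channel 1 varies ten times more than channel 0.
x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
x_in = instance_norm(x)
x_ln = layer_norm(x)

# Temporal swing (last step minus first step) of each channel after LN.
swing_ch0 = x_ln[0][-1] - x_ln[0][0]
swing_ch1 = x_ln[1][-1] - x_ln[1][0]
```

In this toy case IN maps both channels to almost the same curve, so the difference in how strongly each channel varies over time is lost, while the assumed LN variant keeps channel 1's swing ten times larger than channel 0's. Whether this matches the behavior of the LN used in TVEncoderBlock and BasicConv should be checked against the repository code.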

winddori2002 / DEX-TTS

Probably the inconsistency between paper and the code? #7