Question about order of operations: nar_audio_prenet and nar_audio_position

lifeiteng / vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Apache License 2.0

1.99k stars, 320 forks

Closed: Misha24-10 closed this issue 5 months ago

Misha24-10 commented 7 months ago

y_pos = self.nar_audio_position(y_emb)
y_pos = self.nar_audio_prenet(y_pos)

I am uncertain about the order of operations here. Should nar_audio_prenet be applied before nar_audio_position in this context?

lifeiteng commented 5 months ago

You can try a different order, but the recommended configuration here is nn.Identity():

self.nar_audio_prenet = nn.Identity()
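
To illustrate why the order is irrelevant under the recommended setting, here is a minimal sketch comparing both orders. It is not the repository's code: `SinePositionalEmbedding` below is a simplified, hypothetical stand-in for `nar_audio_position` (assumed to add fixed sinusoidal positions), and the `Linear` + `ReLU` prenet is an invented example of a non-trivial prenet. With `nn.Identity()` both orders give the same tensor; with a learned prenet they generally do not, so the order becomes a real design choice.

```python
import math

import torch
import torch.nn as nn


class SinePositionalEmbedding(nn.Module):
    """Hypothetical stand-in for nar_audio_position (assumption, not the repo's class).

    Adds fixed sinusoidal position encodings to a (batch, time, dim) tensor.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.dim = dim  # assumed even

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        steps = torch.arange(x.size(1), dtype=torch.float32).unsqueeze(1)
        div = torch.exp(
            torch.arange(0, self.dim, 2, dtype=torch.float32)
            * (-math.log(10000.0) / self.dim)
        )
        pe = torch.zeros(x.size(1), self.dim)
        pe[:, 0::2] = torch.sin(steps * div)
        pe[:, 1::2] = torch.cos(steps * div)
        return x + pe.unsqueeze(0)


def orders_match(prenet: nn.Module, position: nn.Module, y_emb: torch.Tensor) -> bool:
    """Return True if position-then-prenet equals prenet-then-position."""
    with torch.no_grad():
        pos_then_prenet = prenet(position(y_emb))  # order used in the issue
        prenet_then_pos = position(prenet(y_emb))  # the alternative order
    return torch.allclose(pos_then_prenet, prenet_then_pos)


torch.manual_seed(0)
dim = 16
y_emb = torch.randn(2, 5, dim)  # dummy (batch, time, dim) embeddings
position = SinePositionalEmbedding(dim)

identity_prenet = nn.Identity()  # the recommended configuration
learned_prenet = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # hypothetical prenet

same_with_identity = orders_match(identity_prenet, position, y_emb)
same_with_learned = orders_match(learned_prenet, position, y_emb)
print(same_with_identity, same_with_learned)
```

If you do swap in a learned prenet, applying it before the position module keeps the positional signal unmodified by the prenet, while applying it after lets the prenet transform content and position together. Which works better is an empirical question.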