ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Discrepancy in the Number of Decoder Layers #226

Open shreeshailgan opened 6 months ago

shreeshailgan commented 6 months ago

In Section 3.1, under Model Configuration, the paper states that the decoder consists of 4 FFT Transformer blocks. However, the provided checkpoints (and the model.yaml configs) use 6 FFT Transformer blocks in the decoder. Why this discrepancy? Did you later observe improved performance with 6 decoder blocks instead of 4?
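
For context, this is roughly how I checked the config (a minimal sketch; the path and the `transformer.encoder_layer` / `transformer.decoder_layer` key names are assumed from the repo's model.yaml layout):

```python
import yaml

# Load the model configuration shipped alongside the pretrained checkpoints.
# Path and key names assumed from the repo's config layout (e.g. config/LJSpeech/model.yaml).
with open("config/LJSpeech/model.yaml", "r") as f:
    model_config = yaml.safe_load(f)

transformer_cfg = model_config["transformer"]
print("encoder FFT blocks:", transformer_cfg["encoder_layer"])  # paper reports 4
print("decoder FFT blocks:", transformer_cfg["decoder_layer"])  # paper reports 4, config shows 6
```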