In Section 3.1, under Model Configuration, the paper states that the decoder consists of 4 FFT Transformer blocks. However, the provided checkpoints (and the model.yaml configs) have 6 FFT Transformer blocks in the decoder.
Why this discrepancy? Did you later observe improved performance using 6 decoder blocks instead of 4?