ReferenceEncoder uses only one RNN layer, which differs from the paper https://arxiv.org/pdf/1812.04342.pdf, which says: "We use the same architecture and hyperparameters for reference encoder as Wang et al. [4] which consists of six 2-D convolutional layers followed by a GRU layer."
https://github.com/yanggeng1995/vae_tacotron/blob/11a062ffc5534c44e6963b2ab62fa5b503b2835b/models/modules.py#L30
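
To make the difference concrete, here is a rough sketch of the tensor shapes the quoted architecture would produce. It is plain Python and does not use the repo's code. The hyperparameters (3x3 kernels, stride 2, "same" padding, channels 32, 32, 64, 64, 128, 128, and a 128-unit GRU) are my recollection of Wang et al. [4] and should be treated as assumptions. The names `reference_encoder_shapes` and `gru_input_size` are hypothetical helpers, not functions from this repository.

```python
import math

# Assumed hyperparameters (recalled from Wang et al. [4], not read from this repo).
CONV_CHANNELS = [32, 32, 64, 64, 128, 128]  # output channels of the six conv layers
STRIDE = 2                                  # assumed stride in both time and frequency
GRU_UNITS = 128                             # assumed size of the final GRU layer


def reference_encoder_shapes(n_frames, n_mels):
    """Return (time, freq, channels) after each conv layer.

    Assumes 'same' padding, so each stride-2 layer maps a dimension
    of size n to ceil(n / 2).
    """
    shapes = []
    t, f = n_frames, n_mels
    for channels in CONV_CHANNELS:
        t = math.ceil(t / STRIDE)
        f = math.ceil(f / STRIDE)
        shapes.append((t, f, channels))
    return shapes


def gru_input_size(n_mels):
    """Feature size per time step once freq and channels are flattened for the GRU."""
    _, f, c = reference_encoder_shapes(1, n_mels)[-1]
    return f * c


if __name__ == "__main__":
    for i, shape in enumerate(reference_encoder_shapes(500, 80), start=1):
        print(f"conv{i}: (time, freq, channels) = {shape}")
    print("GRU input size per step:", gru_input_size(80))
    print("GRU units (assumed):", GRU_UNITS)
```

Under these assumptions, a 500-frame, 80-bin mel spectrogram is reduced by the conv stack before the GRU sees it, whereas the linked code appears to feed the RNN directly.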