Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (https://arxiv.org/pdf/2401.01498.pdf)
This project will be updated slowly.
I'm busy finishing my master's thesis, so this project may be paused for a month or two.
Hardware: 1 × A100 80GB
k2 installation guide: https://k2-fsa.github.io/k2/installation/index.html
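
The link above covers installing k2. Before running anything, a quick check like the one below can confirm that the dependencies are visible to Python. This is only a sketch: the helper name `missing_packages` is hypothetical and not part of this repository, and it assumes the packages are imported as `torch` and `k2`.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found as importable modules."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for PyTorch and k2.
    missing = missing_packages(["torch", "k2"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("torch and k2 are both importable.")
```

If `k2` is reported as missing, follow the installation guide and pick the build that matches your installed PyTorch and CUDA versions.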
Kim M, Jeong M, Choi B J, et al. Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction[J]. arXiv preprint arXiv:2401.01498, 2024.
Kim M, Jeong M, Choi B J, et al. Transduce and speak: Neural transducer for text-to-speech with semantic token prediction[C]//2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2023: 1-7.