tuan3w / cnn_vocoder

A fast cnn-based vocoder
MIT License
78 stars, 13 forks

how can you make it speak written text? #9

Open marshonhuckleberry opened 4 years ago

tuan3w commented 4 years ago

Hi @marshonhuckleberry, thanks for your interest in this work. This is just a vocoder, which converts audio features into sound, not a full text-to-speech system. I worked on this repo around 2018. At that time, vocoders (e.g. WaveNet) were too slow at generating sound. It's just a hobby project, and I'm no longer working on it. If you are interested in TTS, please use other repos such as mozilla/tts or espnet. Thanks.

marshonhuckleberry commented 4 years ago

Could you still explain how? I am putting together a collection of vocoders, old and new, to evaluate them, and your code might be useful to developers.

tuan3w commented 4 years ago

My vocoder takes the spectrogram of the audio as input, so you need to generate one somehow (e.g. train a neural network to predict the spectrogram from text). After that, it's easy to follow the guide to generate audio:

# generate a spectrogram from the audio
$ python gen_spec.py -i sample.wav -o out.npz

# synthesize audio from the spectrogram
$ python synthesis.py --model_path path/to/checkpoint \
                      --spec_path out.npz \
                      --out_path out.wav
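
For evaluating the vocoder over many files, the two commands above can be chained from Python. The following is only a minimal sketch, not part of the repo: the helper names (`gen_spec_cmd`, `synthesis_cmd`, `resynthesize`) and the `resynth` output directory are hypothetical, and it assumes the scripts are run from the repository root with exactly the flags shown above.

```python
import subprocess
from pathlib import Path


def gen_spec_cmd(wav_path, spec_path):
    """Build the step 1 command: extract a spectrogram from a wav file."""
    return ["python", "gen_spec.py", "-i", str(wav_path), "-o", str(spec_path)]


def synthesis_cmd(checkpoint, spec_path, out_path):
    """Build the step 2 command: synthesize audio from a spectrogram file."""
    return [
        "python", "synthesis.py",
        "--model_path", str(checkpoint),
        "--spec_path", str(spec_path),
        "--out_path", str(out_path),
    ]


def resynthesize(wav_path, checkpoint, out_dir="resynth"):
    """Run both steps for one wav file (hypothetical helper).

    Assumes the current working directory is the repository root, so that
    gen_spec.py and synthesis.py can be found.
    """
    wav_path = Path(wav_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    spec_path = out_dir / (wav_path.stem + ".npz")
    out_path = out_dir / (wav_path.stem + ".wav")

    # check=True raises CalledProcessError if either script fails.
    subprocess.run(gen_spec_cmd(wav_path, spec_path), check=True)
    subprocess.run(synthesis_cmd(checkpoint, spec_path, out_path), check=True)
    return out_path
```

Calling `resynthesize("sample.wav", "path/to/checkpoint")` would run both scripts in order and return the path of the generated wav. For text input, the `.npz` would instead have to come from a separate model that predicts spectrograms in the same format that `gen_spec.py` writes.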