rild / TIL

today i learn
0 stars 0 forks source link

EFFICIENTLY TRAINABLE TEXT-TO-SPEECH SYSTEM BASED ON DEEP CONVOLUTIONAL NETWORKS WITH GUIDED ATTENTION #31

Open rild opened 7 years ago

rild commented 7 years ago

https://arxiv.org/abs/1710.08969

ダウンロードした

rild commented 7 years ago

一晩で Attention ネットワークの学習を終えることができる

品質は完璧ではないので、ハイパーパラメータの調整や、DL のテクニックを使って品質の改善ができるかも

今後の展望 emotional/non-linguistic/personalized speech synthesis singing voice synthesis music synthesis etc.

rild commented 7 years ago

attention の学習に時間がかかる → Guided Attention Loss を計算する?ことで高速化

rild commented 7 years ago

Demo page

github

rild commented 7 years ago

RNN:

a vanilla encode-decoder model cannot encode too long se- quence into a fixed-length vector effectively