EFFICIENTLY TRAINABLE TEXT-TO-SPEECH SYSTEM BASED ON DEEP CONVOLUTIONAL NETWORKS WITH GUIDED ATTENTION

rild / TIL

today i learn

0 stars 0 forks source link

Open rild opened 7 years ago

rild commented 7 years ago

ダウンロードした

rild commented 7 years ago

一晩で Attention ネットワークの学習を終えることができる

品質は完璧ではないので、ハイパーパラメータの調整や、DL のテクニックを使って品質の改善ができるかも

今後の展望 emotional/non-linguistic/personalized speech synthesis singing voice synthesis music synthesis etc.

rild commented 7 years ago

attention の学習に時間がかかる → Guided Attention Loss を計算する？ことで高速化

rild commented 7 years ago

rild commented 7 years ago

RNN:

a vanilla encode-decoder model cannot encode too long se- quence into a fixed-length vector effectively