Closed rild closed 7 years ago
WaveNet の元の手法らしい
誤差関数が異なる. 二乗誤差ではなく, softmax に基づく, cross entropy を用いている.
- ガウス分布を仮定した平均ベクトルの誤差最小問題 が,
- 音声信号の振幅値の多クラス分類問題 に置き換えることができるようになる.
liked from http://sergeiturukin.com/2017/03/22/training-experiments.html
Ultimately, Deepvoice text-to-speech (TTS) model is very intriguing but since there is no publicly available implementation (yet) one can’t perform experiments with it.
that article also contain Speech Synthesis demo
Audio synthesis
The robotic nature of the voice comes from the pipeline structure and the phoneme model; the audio synthesis component alone generates much more natural clips. The following are clips using the audio synthesis module, but using features from the ground truth audio instead of the phoneme model.
We answer this question in the affirmative and demonstrate efficient, faster-than-real-time
WaveNet
inference kernels that produce high-quality 16 kHz audio and realize a400X speedup
over previous WaveNet inference implementations (Paine et al., 2016).
Hacker's News
ASJSpring2017
性能を良くしようと複雑なことをすると、ネットワーク自体の表現能力はあがりますが、最適化がむずかしくなるというトレードオフがある
where the convolution is applied along each row
Here we re-arrange the traditional cuboid order of computations in MD-LSTM in pyramidal fashion
PyraMiD-LSTM
Abst
deepvoice reference
静的音声合成とその弱点について
入力(text)出力(acoustic realization)をDNNで関連付ける
HMM が Text-To-Speech の確率分布を基にした決定木によって音声信号のマッピングを行うと考えると, 決定木はDNNに置き換えることが可能
In a typical system, there are normally around 50 different types of contexts [12]
- Abst
- Intro
Twitter で回ってきた NIPS2016 / Architectural Complexity Measures of Recurrent Neural Networks
くっ
日本音響学会春季大会
WaveNetについて 知らない単語が複数