rild / TIL

today i learn

Paper marathon, week 3 #6

Closed rild closed 7 years ago

rild commented 7 years ago

Acoustical Society of Japan (ASJ) Spring Meeting

About WaveNet: there are several terms I don't know.

rild commented 7 years ago

Conditional probability

softmax ref1
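
For reference, these two terms fit together in WaveNet roughly as follows: the joint probability of a waveform is factorized into per-sample conditional probabilities, and each conditional is given by a softmax over quantized amplitude values. This is only a sketch; the symbols T (number of samples), K (number of amplitude classes), and z (network outputs before the softmax) are notation introduced here, not taken from the notes.

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1}),
\qquad
p(x_t = k \mid x_1, \dots, x_{t-1}) = \frac{\exp(z_{t,k})}{\sum_{j=1}^{K} \exp(z_{t,j})}
```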

rild commented 7 years ago

Apparently this is the method WaveNet is based on.


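The loss function is different: instead of squared error, it uses cross entropy based on softmax.

  • The problem of minimizing the error of a mean vector under a Gaussian assumption
  • can thereby be replaced with a multi-class classification problem over the amplitude values of the speech signal.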
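
A minimal sketch contrasting the two loss formulations for a single audio sample. It assumes 8-bit mu-law quantization into 256 classes (the scheme described in the WaveNet paper). The function names, logits, and sample values are hypothetical and chosen only for illustration.

```python
import math


def mu_law_class(x, mu=255):
    """Map an amplitude in [-1, 1] to an integer class in [0, mu] via mu-law companding."""
    x = max(-1.0, min(1.0, x))  # clip to the valid amplitude range
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1.0) / 2.0 * mu + 0.5)  # rescale [-1, 1] to [0, mu] and round


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_class):
    """Classification loss: negative log-probability assigned to the true amplitude class."""
    return -math.log(softmax(logits)[target_class])


def squared_error(predicted_mean, target):
    """Regression loss: squared distance between a predicted mean and the true amplitude."""
    return (predicted_mean - target) ** 2


# Hypothetical example values.
target_amplitude = 0.3
target_class = mu_law_class(target_amplitude)

# Regression view: the model outputs one number (a mean).
print("squared error:", squared_error(0.25, target_amplitude))

# Classification view: the model outputs one score per amplitude class.
logits = [0.0] * 256
logits[target_class] = 5.0  # pretend the model favors the correct class
print("cross entropy:", cross_entropy(logits, target_class))
```

In the classification view the model can represent any distribution over the 256 amplitude classes, including multimodal ones, rather than only the single mean that the Gaussian assumption gives.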

rild commented 7 years ago

Linked from http://sergeiturukin.com/2017/03/22/training-experiments.html

Ultimately, Deepvoice text-to-speech (TTS) model is very intriguing but since there is no publicly available implementation (yet) one can’t perform experiments with it.

That article also contains a speech synthesis demo.

Audio synthesis


demo link

The robotic nature of the voice comes from the pipeline structure and the phoneme model; the audio synthesis component alone generates much more natural clips. The following are clips using the audio synthesis module, but using features from the ground truth audio instead of the phoneme model.


We answer this question in the affirmative and demonstrate efficient, faster-than-real-time WaveNet inference kernels that produce high-quality 16 kHz audio and realize a 400X speedup over previous WaveNet inference implementations (Paine et al., 2016).


Hacker News

rild commented 7 years ago

ASJSpring2017

rild commented 7 years ago

where the convolution is applied along each row

Cornell University Library


Here we re-arrange the traditional cuboid order of computations in MD-LSTM in pyramidal fashion

PyraMiD-LSTM

Abst

rild commented 7 years ago

ASJ journal No. 4 2017 p210


A survey of the sample concatenation (waveform synthesis) approach, which is already available as a service.

rild commented 7 years ago

deepvoice reference



In a typical system, there are normally around 50 different types of contexts [12]

  • Abst
  • Intro

rild commented 7 years ago

Came across this on Twitter: NIPS 2016 / Architectural Complexity Measures of Recurrent Neural Networks

rild commented 7 years ago

Ugh.