Paper Marathon, Week 3

rild commented 7 years ago

3-1 JP: Evaluation of Speech Waveform Quantization Methods in WaveNet. ◎橘健太郎 (NICT), 戸田智基 (Nagoya Univ./NICT), 志賀芳則, 河井恒 (NICT)

Acoustical Society of Japan (ASJ) Spring Meeting

There were several WaveNet-related terms I didn't know.

Conditional probability

softmax ref1
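For reference, here is how the two terms fit together in the WaveNet formulation, written as a small math block. It assumes a waveform x = (x_1, ..., x_T) quantized to K classes, with network outputs (logits) z_t at each step:

```latex
\begin{aligned}
p(\mathbf{x}) &= \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1}) \\
p(x_t = k \mid x_1, \ldots, x_{t-1}) &= \operatorname{softmax}(\mathbf{z}_t)_k
  = \frac{\exp(z_{t,k})}{\sum_{j=1}^{K} \exp(z_{t,j})}
\end{aligned}
```

The first line is where conditional probability comes in: each sample is predicted from all previous samples. The second line turns the network output into a distribution over the K amplitude classes.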

3-2 https://arxiv.org/abs/1601.06759 Pixel Recurrent Neural Networks

Apparently this is the method WaveNet is based on.

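The loss function is different: instead of squared error, it uses a softmax-based cross entropy.

This turns the problem of minimizing the error of a mean vector under a Gaussian assumption into a multi-class classification problem over the amplitude values of the speech signal.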
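A minimal NumPy sketch of that reformulation, assuming 8-bit μ-law companding (μ = 255, 256 classes), which is the quantization used in the original WaveNet paper. The function names are hypothetical and the logits are random placeholders, not the output of a trained network:

```python
import numpy as np

MU = 255  # mu-law parameter: 8-bit quantization into MU + 1 = 256 classes

def mu_law_encode(x, mu=MU):
    """Compand amplitudes in [-1, 1] and quantize them to integer labels 0..mu."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # still in [-1, 1]
    return np.floor((y + 1.0) / 2.0 * mu + 0.5).astype(np.int64)

def mu_law_decode(labels, mu=MU):
    """Approximate inverse of mu_law_encode (exact up to quantization error)."""
    y = 2.0 * np.asarray(labels, dtype=np.float64) / mu - 1.0
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

def softmax(logits):
    """Row-wise softmax; subtracting the max keeps exp() from overflowing."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean negative log-probability of the true class under softmax(logits)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Usage: a toy waveform becomes a sequence of class labels (the targets).
x = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))
labels = mu_law_encode(x)
logits = np.random.default_rng(0).normal(size=(len(x), MU + 1))  # placeholder outputs
loss = cross_entropy(logits, labels)
```

Instead of regressing a real-valued amplitude, the model outputs 256 scores per time step and is penalized by the log-probability it assigns to the correct class.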

Tries to improve on RNNs and CNNs
A probabilistic generative model for modeling larger, more complex data
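A model of this kind generates data by sampling one value at a time from the conditionals and feeding each sample back in as context. A minimal sketch of that loop follows. It assumes a hypothetical `toy_conditional` standing in for a trained PixelRNN/WaveNet, which simply prefers classes close to the previous value:

```python
import numpy as np

K = 256  # number of quantized amplitude classes

def toy_conditional(history, k=K):
    """Hypothetical stand-in for a trained network: returns p(x_t | x_<t)
    as a length-k probability vector peaked around the previous sample."""
    prev = history[-1] if history else k // 2
    logits = -0.01 * (np.arange(k) - prev) ** 2
    p = np.exp(logits - logits.max())
    return p / p.sum()

def sample_autoregressive(conditional, length, seed=0):
    """Ancestral sampling: draw x_1, x_2, ... one at a time, feeding each
    sampled value back in as context for the next step."""
    rng = np.random.default_rng(seed)
    history = []
    for _ in range(length):
        probs = conditional(history)
        history.append(int(rng.choice(len(probs), p=probs)))
    return history
```

For example, `sample_autoregressive(toy_conditional, 50)` returns 50 class labels. The sequential loop is also why naive autoregressive generation is slow: step t cannot start until step t - 1 has been sampled.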

3-3 Deepvoice https://arxiv.org/pdf/1702.07825.pdf

Linked from http://sergeiturukin.com/2017/03/22/training-experiments.html

Ultimately, Deepvoice text-to-speech (TTS) model is very intriguing but since there is no publicly available implementation (yet) one can’t perform experiments with it.

That article also contains a speech synthesis demo.

Audio synthesis

demo link

The robotic nature of the voice comes from the pipeline structure and the phoneme model; the audio synthesis component alone generates much more natural clips. The following are clips using the audio synthesis module, but using features from the ground truth audio instead of the phoneme model.

We answer this question in the affirmative and demonstrate efficient, faster-than-real-time WaveNet inference kernels that produce high-quality 16 kHz audio and realize a 400X speedup over previous WaveNet inference implementations (Paine et al., 2016).

Hacker News

https://news.ycombinator.com/item?id=13756489

3-4 JP: A Study of Speech Synthesis for Hearing-Impaired People Using DNNs. ☆北村毅 (Kobe Univ.), 滝口哲也 (Kobe Univ./JST PRESTO), 有木康雄 (Kobe Univ.)

ASJSpring2017

3-5 ResNet Cornell Uni Lib

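Uses skip connections so that each block's output starts from its input (an identity mapping), which makes degradation less likely even as the network gets deeper?
- Couldn't find any drawbacks (read only the Abstract and Intro)
It was already known that deeper networks tend to perform better, but very deep networks ran into vanishing and exploding gradients. The paper notes these are largely handled by normalized initialization and normalization layers; the remaining obstacle is the degradation problem, where adding layers to an already deep model raises training error.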
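A minimal NumPy sketch of a residual block. It assumes fully connected layers; ResNet itself uses convolutions plus batch normalization, both omitted here. The names `residual_block` and `plain_block` are hypothetical. When the residual branch is set to zero, a stack of residual blocks passes its (non-negative) input through unchanged. The same stack without skip connections outputs zeros. This is the sense in which extra depth is easy to make harmless:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x) with residual branch F(x) = w2 @ relu(w1 @ x).
    The '+ x' term is the identity skip connection."""
    return relu(w2 @ relu(w1 @ x) + x)

def plain_block(x, w1, w2):
    """The same layers without the skip connection, for comparison."""
    return relu(w2 @ relu(w1 @ x))

d = 8
x = np.abs(np.random.default_rng(0).normal(size=d))  # non-negative, like a previous ReLU output
zero = np.zeros((d, d))
y = x
for _ in range(50):  # 50 residual blocks whose branches are all zero
    y = residual_block(y, zero, zero)
# y is still x; a single plain block with zero weights already outputs all zeros
```

A deeper residual network can therefore fall back to a shallower one by driving F toward zero. A plain network would have to learn an exact identity mapping through its weights instead.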

[Survey]Identity Mappings in Deep Residual Networks

There is a trade-off: making the network more complex to improve performance increases its representational power, but makes optimization harder.

3-6 The first type is the Row LSTM layer

where the convolution is applied along each row
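A simplified NumPy sketch of the Row LSTM recurrence. It assumes the gate equations from the PixelRNN paper: [o, f, i, g] come from a k×1 convolution of the previous row's hidden state plus a k×1 convolution of the current input row. Only the spatial masking of the input-to-state kernel is kept, meaning taps to the right of the current pixel are zeroed. The paper's channel masks (A/B), layer stacking, and output softmax are omitted, and all names are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_row(row, kernel):
    """'Same'-padded 1-D convolution along one image row.
    row: (W, C_in), kernel: (k, C_in, C_out) with odd k -> (W, C_out)."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(row, ((pad, pad), (0, 0)))
    return np.stack([np.einsum("tc,tco->o", padded[j:j + k], kernel)
                     for j in range(row.shape[0])])

def row_lstm(x, k_is, k_ss):
    """Simplified Row LSTM over x: (H, W, C_in).
    k_is: (k, C_in, 4*C_h) input-to-state kernel,
    k_ss: (k, C_h, 4*C_h) state-to-state kernel. Returns (H, W, C_h)."""
    H, W, _ = x.shape
    c_h = k_ss.shape[1]
    k_is = k_is.copy()
    k_is[k_is.shape[0] // 2 + 1:] = 0.0  # mask taps to the right of the current pixel
    h = np.zeros((W, c_h))
    c = np.zeros((W, c_h))
    out = np.zeros((H, W, c_h))
    for i in range(H):  # sequential over rows...
        # ...but every column of row i is computed at once from row i - 1
        gates = conv_row(h, k_ss) + conv_row(x[i], k_is)
        o, f, g_in, g = np.split(gates, 4, axis=-1)
        c = sigmoid(f) * c + sigmoid(g_in) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out[i] = h
    return out

rng = np.random.default_rng(0)
H, W, C_IN, C_H, K = 6, 8, 1, 4, 3
x = rng.normal(size=(H, W, C_IN))
k_is = rng.normal(scale=0.5, size=(K, C_IN, 4 * C_H))
k_ss = rng.normal(scale=0.5, size=(K, C_H, 4 * C_H))
out = row_lstm(x, k_is, k_ss)
```

Changing pixels in later rows, or to the right of column j in the same row, leaves the earlier outputs unchanged. This is the causal structure the layer needs.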

Cornell Uni Lib

Here we re-arrange the traditional cuboid order of computations in MD-LSTM in pyramidal fashion

PyraMiD-LSTM

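CNNs had a narrow convolution range (small receptive field)
MD-RNNs could sweep up information from a wider area, which led to better performance
- but they had the weakness of being hard to parallelize on GPUs
Proposes a function that is easy to compute in parallel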
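A minimal sketch that makes the GPU issue concrete. It assumes a scalar 2D MD-RNN h[i, j] = tanh(x[i, j] + a·h[i-1, j] + b·h[i, j-1]), using plain tanh cells rather than PyraMiD-LSTM itself; the names are hypothetical. Each cell depends on its top and left neighbors, so at best one anti-diagonal can be updated at a time. That means H + W - 1 sequential steps instead of H·W, and the diagonals have varying lengths, which is awkward for batched GPU kernels. As I read the abstract, PyraMiD-LSTM's reordering is meant to avoid this by computing each whole plane from the previous one:

```python
import numpy as np

def mdrnn_sequential(x, a, b):
    """h[i, j] = tanh(x[i, j] + a*h[i-1, j] + b*h[i, j-1]), zero boundary."""
    H, W = x.shape
    h = np.zeros((H + 1, W + 1))  # row 0 and column 0 hold the zero boundary
    for i in range(1, H + 1):
        for j in range(1, W + 1):
            h[i, j] = np.tanh(x[i - 1, j - 1] + a * h[i - 1, j] + b * h[i, j - 1])
    return h[1:, 1:]

def mdrnn_wavefront(x, a, b):
    """Same recurrence, but all cells on an anti-diagonal i + j = d are updated
    in one vectorized step, since their top and left neighbors lie on d - 1."""
    H, W = x.shape
    h = np.zeros((H + 1, W + 1))
    for d in range(2, H + W + 1):  # padded indices with 1 <= i <= H, 1 <= j <= W
        i = np.arange(max(1, d - W), min(H, d - 1) + 1)
        j = d - i
        h[i, j] = np.tanh(x[i - 1, j - 1] + a * h[i - 1, j] + b * h[i, j - 1])
    return h[1:, 1:]

x = np.random.default_rng(0).normal(size=(5, 7))
seq = mdrnn_sequential(x, 0.5, -0.3)
wave = mdrnn_wavefront(x, 0.5, -0.3)
```

Both functions compute the same values. The wavefront version needs 11 sequential steps for a 5×7 grid instead of 35, but no schedule can do better while both neighbor dependencies remain.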

Abst

3-7 JP-EN ss old(2007):: VOCALOID – Commercial singing synthesizer based on sample concatenation pdf link

ASJ journal No. 4 2017 p210

A look at sample concatenation (waveform-based synthesis), an approach that is available as a commercial service

Abst
1. Synthesis Engine
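As a generic illustration of sample concatenation, here is a minimal sketch that joins recorded units with a linear crossfade. This is not VOCALOID's actual engine, whose details are in the paper. `concatenate_units` and `overlap` are hypothetical names:

```python
import numpy as np

def concatenate_units(units, overlap):
    """Join 1-D sample arrays, linearly crossfading the last `overlap` samples
    of each unit into the first `overlap` samples of the next one."""
    if overlap < 1:
        raise ValueError("overlap must be at least 1 sample")
    if any(len(u) < overlap for u in units):
        raise ValueError("every unit must be at least `overlap` samples long")
    out = np.asarray(units[0], dtype=np.float64)
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in  # fade_in + fade_out == 1 at every sample
    for unit in units[1:]:
        unit = np.asarray(unit, dtype=np.float64)
        blended = out[-overlap:] * fade_out + unit[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], blended, unit[overlap:]])
    return out
```

Because the two fade curves sum to 1, joining two identical constant signals leaves them unchanged. The output is `overlap` samples shorter per joint than the summed unit lengths.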

3-8 STATISTICAL PARAMETRIC SPEECH SYNTHESIS USING DEEP NEURAL NETWORKS pdf link

Cited by the Deep Voice paper

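On statistical parametric speech synthesis and its weaknesses
- Modeling with decision-tree-clustered probability densities cannot efficiently model complex context dependencies.
Relate the input (text) and the output (acoustic realization) with a DNN
If we view the HMM-based system as mapping text to speech signals through decision trees built on probability distributions, the decision trees can be replaced by a DNN
- provided the hidden layers have enough units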
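A minimal sketch of the DNN side of this mapping, under several assumptions. The sizes are hypothetical: 120 context features in, three hidden layers of 256 units, and 40 acoustic features out. The sketch also assumes sigmoid hidden units, a linear output layer, and random untrained weights. Training, for example minimizing squared error against extracted acoustic features, is omitted. Unlike a decision tree, which assigns each context to a single leaf's distribution, every weight here is shared across all contexts:

```python
import numpy as np

def init_dnn(sizes, seed=0):
    """Random (untrained) weights for a feed-forward net with sizes [in, h1, ..., out]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def dnn_forward(params, contexts):
    """contexts: (frames, n_context) -> acoustic features (frames, n_acoustic).
    Sigmoid hidden layers, linear output layer."""
    h = contexts
    for w, b in params[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ w + b)))
    w, b = params[-1]
    return h @ w + b

params = init_dnn([120, 256, 256, 256, 40])  # hypothetical layer sizes
contexts = np.random.default_rng(1).integers(0, 2, size=(10, 120)).astype(np.float64)
acoustic = dnn_forward(params, contexts)  # one acoustic feature vector per frame
```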

In a typical system, there are normally around 50 different types of contexts [12]

Abst

Intro

XX Seaweed? We can handle it! http://www.nature.com/nature/journal/v464/n7290/abs/nature08937.html

3-9 Architectural Complexity Measures of Recurrent Neural Networks

Came across this on Twitter: NIPS2016 / Architectural Complexity Measures of Recurrent Neural Networks

Ugh.

rild / TIL

Paper Marathon, Week 3 #6
