reazon-research / ReazonSpeech

Massive open Japanese speech corpus

https://research.reazon.jp/projects/ReazonSpeech/

Apache License 2.0

Training a streaming ASR model #10

Open fujimotos opened 1 year ago

fujimotos commented 1 year ago

Goal of this ticket

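Inference currently uses a Conformer-Transformer model, whose computational cost grows quadratically with the duration of the audio.
As a result, even a GPU with 64 GB of memory can process only roughly a few minutes of audio in a single inference pass.
We will evaluate models that can handle longer audio and add support for inference on longer recordings.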
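
To illustrate the quadratic growth mentioned above, here is a rough back-of-the-envelope sketch. It only estimates the size of the self-attention score matrix for a single layer. The frame rate (25 encoder frames per second), the head count (8), and float32 storage are illustrative assumptions, not values taken from the ReazonSpeech model, and `attention_score_bytes` is a hypothetical helper name.

```python
def attention_score_bytes(duration_sec, frames_per_sec=25.0,
                          num_heads=8, bytes_per_value=4):
    """Estimate bytes for one layer's attention scores (heads x T x T).

    All defaults are assumptions for illustration only:
    - frames_per_sec: encoder frames per second after subsampling
    - num_heads: number of attention heads
    - bytes_per_value: 4 for float32
    """
    num_frames = int(duration_sec * frames_per_sec)
    # Each head stores a T x T score matrix, hence the quadratic term.
    return num_heads * num_frames * num_frames * bytes_per_value


if __name__ == "__main__":
    for minutes in (1, 2, 4, 8):
        gib = attention_score_bytes(minutes * 60) / 2**30
        print(f"{minutes:>2} min: {gib:.3f} GiB per layer")
```

Under these assumptions, doubling the audio length quadruples the estimate, which is why full-context attention stops fitting in memory after a few minutes and chunk-based (streaming) or recurrent models become attractive.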

References

ESPnet2 real streaming Transformer demonstration https://espnet.github.io/espnet/notebook/espnet2_streaming_asr_demo.html
RNN-based CSJ training recipe https://github.com/espnet/espnet/blob/master/egs2/csj/asr1/conf/tuning/train_asr_rnn.yaml