k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

Add contextual-biasing for transducer models #407

Closed pkufool closed 1 year ago

pkufool commented 1 year ago

Contextual-biasing results for non-streaming transducer model.

Decode with context phrases

python sherpa/bin/offline_transducer_asr.py --tokens exp-zh/data/lang_char/tokens.txt --modeling-unit char --nn-model exp-zh/exp/cpu_jit_epoch_4_avg_1_torch.1.11.0.pt --context-score 2 --decoding-method modified_beam_search --contexts $'\346\226\207\346\243\256\347\211\271\345\215\241\347\264\242/\350\222\213\345\217\213\345\215\232/\346\234\261\347\253\213\346\245\240/\345\221\250\346\234\233\345\206\233' exp-zh/test_wavs/3.wav exp-zh/test_wavs/4.wav exp-zh/test_wavs/5.wav exp-zh/test_wavs/6.wav
2023-06-19 19:22:31,743 INFO [offline_transducer_asr.py:414] {'nn_model': 'exp-zh/exp/cpu_jit_epoch_4_avg_1_torch.1.11.0.pt', 'tokens': 'exp-zh/data/lang_char/tokens.txt', 'sample_rate': 16000, 'feat_dim': 80, 'decoding_method': 'modified_beam_search', 'num_active_paths': 4, 'bpe_model': '', 'modeling_unit': 'char', 'contexts': '文森特卡索/蒋友博/朱立楠/周望军', 'context_score': 2.0, 'max_contexts': 8, 'max_states': 64, 'allow_partial': True, 'LG': '', 'ngram_lm_scale': 0.01, 'beam': 4, 'use_gpu': False, 'num_threads': 1, 'sound_files': ['exp-zh/test_wavs/3.wav', 'exp-zh/test_wavs/4.wav', 'exp-zh/test_wavs/5.wav', 'exp-zh/test_wavs/6.wav']}
[I] /star-kw/kangwei/code/sherpa/sherpa/cpp_api/offline-recognizer-transducer-impl.h:137:void sherpa::OfflineRecognizerTransducerImpl::WarmUp() 2023-06-19 19:22:34.042 WarmUp begins
[I] /star-kw/kangwei/code/sherpa/sherpa/cpp_api/offline-recognizer-transducer-impl.h:150:void sherpa::OfflineRecognizerTransducerImpl::WarmUp() 2023-06-19 19:22:34.347 WarmUp ended
Contexts list: ['文森特卡索', '蒋友博', '朱立楠', '周望军']
exp-zh/test_wavs/3.wav
{"text":"文森特卡索是全球知名的法国性格派演员","timestamps":"[0.00,0.16,0.80,1.32,1.76,2.04,2.64,2.88,3.28,3.56,3.92,4.44,4.68,5.16,6.00,6.32,6.96,7.20]","tokens":["文","森","特","卡","索","是","全","球","知","名","的","法","国","性","格","派","演","员"]}
exp-zh/test_wavs/4.wav
{"text":"蒋友博被拍到带着女儿出游","timestamps":"[0.00,0.20,0.96,1.52,1.84,2.16,2.40,2.68,2.92,3.12,3.44,3.72]","tokens":["蒋","友","博","被","拍","到","带","着","女","儿","出","游"]}
exp-zh/test_wavs/5.wav
{"text":"周望军就落实控股价了","timestamps":"[0.00,0.12,0.88,1.24,1.60,1.92,2.84,3.16,3.32,3.56]","tokens":["周","望","军","就","落","实","控","股","价","了"]}
exp-zh/test_wavs/6.wav
{"text":"朱立楠在上市见面会上表示","timestamps":"[0.00,0.12,0.80,1.20,1.52,1.76,2.00,2.16,2.32,2.60,2.80,3.04]","tokens":["朱","立","楠","在","上","市","见","面","会","上","表","示"]}
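
The `--contexts` value in the command above is written with bash ANSI-C quoting (`$'...'`), where each `\NNN` is one byte in octal. The following is a minimal sketch, not part of sherpa, that decodes those bytes as UTF-8 and splits on `/`, which appears to be how the phrases are separated judging from the `Contexts list:` line in the log. The helper name `decode_octal_escapes` is made up for illustration.

```python
import re

# The --contexts value exactly as it appears inside $'...' in the command.
ESCAPED_CONTEXTS = (
    r"\346\226\207\346\243\256\347\211\271\345\215\241\347\264\242/"
    r"\350\222\213\345\217\213\345\215\232/"
    r"\346\234\261\347\253\213\346\245\240/"
    r"\345\221\250\346\234\233\345\206\233"
)


def decode_octal_escapes(escaped: str) -> str:
    """Turn \\NNN octal byte escapes into the UTF-8 text they encode.

    Hypothetical helper: it handles only three-digit octal escapes and
    literal characters, which is all the command above uses.
    """
    raw = bytearray()
    for octal, literal in re.findall(r"\\([0-7]{3})|(.)", escaped, flags=re.DOTALL):
        if octal:
            raw.append(int(octal, 8))  # one escaped byte
        else:
            raw.extend(literal.encode("utf-8"))  # e.g. the "/" separator
    return raw.decode("utf-8")


# Assumption: phrases are separated by "/", as the log output suggests.
contexts = decode_octal_escapes(ESCAPED_CONTEXTS).split("/")
print(contexts)
```

The printed list should match the `Contexts list:` line logged by the script.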

Decode without context phrases

python sherpa/bin/offline_transducer_asr.py --tokens exp-zh/data/lang_char/tokens.txt --modeling-unit char --nn-model exp-zh/exp/cpu_jit_epoch_4_avg_1_torch.1.11.0.pt --decoding-method modified_beam_search exp-zh/test_wavs/3.wav exp-zh/test_wavs/4.wav exp-zh/test_wavs/5.wav exp-zh/test_wavs/6.wav
2023-06-19 19:22:05,053 INFO [offline_transducer_asr.py:414] {'nn_model': 'exp-zh/exp/cpu_jit_epoch_4_avg_1_torch.1.11.0.pt', 'tokens': 'exp-zh/data/lang_char/tokens.txt', 'sample_rate': 16000, 'feat_dim': 80, 'decoding_method': 'modified_beam_search', 'num_active_paths': 4, 'bpe_model': '', 'modeling_unit': 'char', 'contexts': '', 'context_score': 1.5, 'max_contexts': 8, 'max_states': 64, 'allow_partial': True, 'LG': '', 'ngram_lm_scale': 0.01, 'beam': 4, 'use_gpu': False, 'num_threads': 1, 'sound_files': ['exp-zh/test_wavs/3.wav', 'exp-zh/test_wavs/4.wav', 'exp-zh/test_wavs/5.wav', 'exp-zh/test_wavs/6.wav']}
[I] /star-kw/kangwei/code/sherpa/sherpa/cpp_api/offline-recognizer-transducer-impl.h:137:void sherpa::OfflineRecognizerTransducerImpl::WarmUp() 2023-06-19 19:22:07.751 WarmUp begins
[I] /star-kw/kangwei/code/sherpa/sherpa/cpp_api/offline-recognizer-transducer-impl.h:150:void sherpa::OfflineRecognizerTransducerImpl::WarmUp() 2023-06-19 19:22:08.162 WarmUp ended
exp-zh/test_wavs/3.wav
{"text":"文森特卡所是全球知名的法国性格派演员","timestamps":"[0.00,0.16,1.04,1.32,1.76,2.04,2.64,2.88,3.28,3.56,3.92,4.44,4.68,5.16,6.00,6.32,6.96,7.20]","tokens":["文","森","特","卡","所","是","全","球","知","名","的","法","国","性","格","派","演","员"]}
exp-zh/test_wavs/4.wav
{"text":"蒋永伯被拍到带着女儿出游","timestamps":"[0.00,0.20,0.96,1.52,1.88,2.16,2.40,2.68,2.92,3.12,3.44,3.72]","tokens":["蒋","永","伯","被","拍","到","带","着","女","儿","出","游"]}
exp-zh/test_wavs/5.wav
{"text":"周望君就落实控股价了","timestamps":"[0.00,0.16,0.88,1.28,1.64,1.92,2.84,3.16,3.32,3.56]","tokens":["周","望","君","就","落","实","控","股","价","了"]}
exp-zh/test_wavs/6.wav
{"text":"朱立南在上市见面会上表示","timestamps":"[0.00,0.16,0.84,1.24,1.52,1.76,2.00,2.16,2.32,2.60,2.80,3.04]","tokens":["朱","立","南","在","上","市","见","面","会","上","表","示"]}
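
To make the effect on the non-streaming model easier to see, here is a small sketch that checks which context phrases appear verbatim in each transcript. The `text` values are copied from the two runs above; the helper names `phrases_found` and `count_hits` are illustrative only and are not part of sherpa.

```python
contexts = ["文森特卡索", "蒋友博", "朱立楠", "周望军"]

# "text" fields copied from the two non-streaming runs.
with_contexts = {
    "3.wav": "文森特卡索是全球知名的法国性格派演员",
    "4.wav": "蒋友博被拍到带着女儿出游",
    "5.wav": "周望军就落实控股价了",
    "6.wav": "朱立楠在上市见面会上表示",
}
without_contexts = {
    "3.wav": "文森特卡所是全球知名的法国性格派演员",
    "4.wav": "蒋永伯被拍到带着女儿出游",
    "5.wav": "周望君就落实控股价了",
    "6.wav": "朱立南在上市见面会上表示",
}


def phrases_found(text, phrases):
    """Return the context phrases that occur verbatim in a transcript."""
    return [phrase for phrase in phrases if phrase in text]


def count_hits(results, phrases):
    """Count verbatim phrase occurrences over all transcripts of one run."""
    return sum(len(phrases_found(text, phrases)) for text in results.values())


for wav in with_contexts:
    before = phrases_found(without_contexts[wav], contexts)
    after = phrases_found(with_contexts[wav], contexts)
    print(f"{wav}: without={before} with={after}")

print("hits without contexts:", count_hits(without_contexts, contexts))
print("hits with contexts:", count_hits(with_contexts, contexts))
```

With these four test waves, each phrase is matched only in the run that was given `--contexts`.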
pkufool commented 1 year ago

Contextual-biasing results for streaming transducer model

Decode without context phrases

python sherpa/bin/online_transducer_asr.py --tokens exp-zh-zip/data/lang_char/tokens.txt --modeling-unit char --nn-model exp-zh-zip/zipformer/exp_L_causal_context_2/jit_script_chunk_16_left_128.pt --decoding-method modified_beam_search exp-zh/test_wavs/3.wav exp-zh/test_wavs/4.wav exp-zh/test_wavs/5.wav exp-zh/test_wavs/6.wav
2023-06-20 16:59:57,945 INFO [online_transducer_asr.py:408] {'nn_model': 'exp-zh-zip/zipformer/exp_L_causal_context_2/jit_script_chunk_16_left_128.pt', 'tokens': 'exp-zh-zip/data/lang_char/tokens.txt', 'sample_rate': 16000, 'feat_dim': 80, 'decoding_method': 'modified_beam_search', 'num_active_paths': 4, 'bpe_model': '', 'modeling_unit': 'char', 'contexts': '', 'context_score': 1.5, 'max_contexts': 8, 'max_states': 64, 'allow_partial': True, 'LG': '', 'ngram_lm_scale': 0.01, 'beam': 4, 'use_gpu': False, 'num_threads': 1, 'sound_files': ['exp-zh/test_wavs/3.wav', 'exp-zh/test_wavs/4.wav', 'exp-zh/test_wavs/5.wav', 'exp-zh/test_wavs/6.wav']}
[I] /star-kw/kangwei/code/sherpa/sherpa/cpp_api/online-recognizer.cc:477:void sherpa::OnlineRecognizer::OnlineRecognizerImpl::WarmUp() 2023-06-20 17:00:00.236 WarmUp begins
[W BinaryOps.cpp:601] Warning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (function operator())
[I] /star-kw/kangwei/code/sherpa/sherpa/cpp_api/online-recognizer.cc:500:void sherpa::OnlineRecognizer::OnlineRecognizerImpl::WarmUp() 2023-06-20 17:00:00.469 WarmUp ended
----------
exp-zh/test_wavs/3.wav
吴文森特考阿所是全球知名的法国性格派演员
----------
exp-zh/test_wavs/4.wav
蒋有伯被拍到带着女儿出游
----------
exp-zh/test_wavs/5.wav
周望君就落实控股价
----------
exp-zh/test_wavs/6.wav
朱莉楠在上市见面会上表示

Decode with context phrases

python sherpa/bin/online_transducer_asr.py --tokens exp-zh-zip/data/lang_char/tokens.txt --modeling-unit char --nn-model exp-zh-zip/zipformer/exp_L_causal_context_2/jit_script_chunk_16_left_128.pt --context-score 2 --contexts $'\346\226\207\346\243\256\347\211\271\345\215\241\347\264\242/\350\222\213\345\217\213\345\215\232/\346\234\261\347\253\213\346\245\240/\345\221\250\346\234\233\345\206\233' --decoding-method modified_beam_search exp-zh/test_wavs/3.wav exp-zh/test_wavs/4.wav exp-zh/test_wavs/5.wav exp-zh/test_wavs/6.wav
2023-06-20 17:00:07,568 INFO [online_transducer_asr.py:408] {'nn_model': 'exp-zh-zip/zipformer/exp_L_causal_context_2/jit_script_chunk_16_left_128.pt', 'tokens': 'exp-zh-zip/data/lang_char/tokens.txt', 'sample_rate': 16000, 'feat_dim': 80, 'decoding_method': 'modified_beam_search', 'num_active_paths': 4, 'bpe_model': '', 'modeling_unit': 'char', 'contexts': '文森特卡索/蒋友博/朱立楠/周望军', 'context_score': 2.0, 'max_contexts': 8, 'max_states': 64, 'allow_partial': True, 'LG': '', 'ngram_lm_scale': 0.01, 'beam': 4, 'use_gpu': False, 'num_threads': 1, 'sound_files': ['exp-zh/test_wavs/3.wav', 'exp-zh/test_wavs/4.wav', 'exp-zh/test_wavs/5.wav', 'exp-zh/test_wavs/6.wav']}
[I] /star-kw/kangwei/code/sherpa/sherpa/cpp_api/online-recognizer.cc:477:void sherpa::OnlineRecognizer::OnlineRecognizerImpl::WarmUp() 2023-06-20 17:00:10.455 WarmUp begins
[W BinaryOps.cpp:601] Warning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (function operator())
[I] /star-kw/kangwei/code/sherpa/sherpa/cpp_api/online-recognizer.cc:500:void sherpa::OnlineRecognizer::OnlineRecognizerImpl::WarmUp() 2023-06-20 17:00:10.711 WarmUp ended
Contexts list: ['文森特卡索', '蒋友博', '朱立楠', '周望军']
----------
exp-zh/test_wavs/3.wav
吴文森特卡索是全球知名的法国性格派演员
----------
exp-zh/test_wavs/4.wav
蒋友博被拍到带着女儿出游
----------
exp-zh/test_wavs/5.wav
周望军就落实控股价
----------
exp-zh/test_wavs/6.wav
朱立楠在上市见面会上表示
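
For the streaming model, a character-level diff shows exactly which spans changed once the context phrases were supplied. This is only a sketch using Python's standard `difflib`; the transcripts are copied from the two streaming runs above, and `changed_spans` is a hypothetical helper name, not something sherpa provides.

```python
import difflib

# Transcripts copied from the two streaming runs.
without_contexts = {
    "3.wav": "吴文森特考阿所是全球知名的法国性格派演员",
    "4.wav": "蒋有伯被拍到带着女儿出游",
    "5.wav": "周望君就落实控股价",
    "6.wav": "朱莉楠在上市见面会上表示",
}
with_contexts = {
    "3.wav": "吴文森特卡索是全球知名的法国性格派演员",
    "4.wav": "蒋友博被拍到带着女儿出游",
    "5.wav": "周望军就落实控股价",
    "6.wav": "朱立楠在上市见面会上表示",
}


def changed_spans(before, after):
    """Return (old, new) pairs for every span that differs between two strings."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    return [
        (before[i1:i2], after[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for wav, before in without_contexts.items():
    for old, new in changed_spans(before, with_contexts[wav]):
        print(f"{wav}: {old!r} -> {new!r}")
```

For these four files the only differing spans fall inside the names; the rest of each transcript, including the leading 吴 in 3.wav, is identical in both runs.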