k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0
483 stars, 97 forks

Support byte level bpe (bbpe) models. #462

Closed: pkufool closed this 10 months ago

pkufool commented 10 months ago

This PR adds support for transducer models trained with byte-level BPE (bbpe) units; see https://github.com/k2-fsa/icefall/pull/986 and https://github.com/k2-fsa/icefall/pull/1033 . When you use a model trained with byte-level BPE, you just need to pass --use-bbpe=True and everything else works as before. Note: only models trained with icefall are supported, because the mapping table is hard-coded in the source code. The decoding log is below. Enjoy!
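To illustrate why a mapping table is needed at all, here is a minimal sketch of the byte-level idea: text is turned into UTF-8 bytes, each byte is mapped to one printable character, and BPE is then trained on those characters. The mapping used here (`chr(0x100 + b)`) is a hypothetical stand-in chosen for simplicity; it is NOT the table hard-coded in icefall or sherpa, and the helper names `byte_encode` and `byte_decode` are illustrative only.

```python
# Hypothetical byte <-> printable-character mapping, for illustration only.
# The real icefall table differs; this just shows the round trip.
OFFSET = 0x100  # assumption: byte b is represented by the code point 0x100 + b


def byte_encode(text: str) -> str:
    """Map every UTF-8 byte of `text` to one printable character."""
    return "".join(chr(OFFSET + b) for b in text.encode("utf-8"))


def byte_decode(pieces: list[str]) -> str:
    """Concatenate token pieces, map characters back to bytes, decode UTF-8.

    Pieces must be joined before decoding, because a single character
    may be split across several tokens.
    """
    joined = "".join(pieces)
    data = bytes(ord(c) - OFFSET for c in joined)
    # errors="replace" keeps decoding robust if the bytes are incomplete.
    return data.decode("utf-8", errors="replace")


encoded = byte_encode("虽然")
# Each of these two Chinese characters takes 3 bytes in UTF-8.
assert len(encoded) == 6

# Simulate a tokenizer that splits the first character across three tokens.
pieces = [encoded[0], encoded[1], encoded[2], encoded[3:]]
restored = byte_decode(pieces)
```

A decoder therefore has to know the exact table used at training time, which is why models trained with a different mapping would not decode correctly.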

The log below uses this model: https://huggingface.co/pkufool/icefall_asr_aishell_pruned_transducer_stateless7_bbpe

python sherpa/bin/offline_transducer_asr.py --nn-model exp-zh-bbpe/exp/cpu_jit.pt --use-bbpe true --use-gpu true --tokens exp-zh-bbpe/data/lang_bbpe_500/tokens.txt --decoding-method greedy_search exp-zh-bbpe/test_waves/BAC009S0764W0122.wav exp-zh-bbpe/test_waves/BAC009S0764W0121.wav
2023-08-30 12:29:00,392 INFO [offline_transducer_asr.py:437] {'nn_model': 'exp-zh-bbpe/exp/cpu_jit.pt', 'tokens': 'exp-zh-bbpe/data/lang_bbpe_500/tokens.txt', 'sample_rate': 16000, 'feat_dim': 80, 'use_bbpe': True, 'decoding_method': 'greedy_search', 'num_active_paths': 4, 'bpe_model': '', 'modeling_unit': 'char', 'contexts': '', 'context_score': 1.5, 'temperature': 1.0, 'max_contexts': 8, 'max_states': 64, 'allow_partial': True, 'LG': '', 'ngram_lm_scale': 0.01, 'beam': 4, 'use_gpu': True, 'num_threads': 1, 'sound_files': ['exp-zh-bbpe/test_waves/BAC009S0764W0122.wav', 'exp-zh-bbpe/test_waves/BAC009S0764W0121.wav']}
[I] /star-kw/kangwei/code/sherpa/sherpa/cpp_api/offline-recognizer-transducer-impl.h:151:void sherpa::OfflineRecognizerTransducerImpl::WarmUp() 2023-08-30 12:29:09.846 WarmUp begins
[I] /star-kw/kangwei/code/sherpa/sherpa/cpp_api/offline-recognizer-transducer-impl.h:164:void sherpa::OfflineRecognizerTransducerImpl::WarmUp() 2023-08-30 12:29:10.009 WarmUp ended
exp-zh-bbpe/test_waves/BAC009S0764W0122.wav                                                                                                                    
{"text":" 一 二 线 城 市 虽 然 也 处 于 调 整 中","timestamps":"[0.00,0.52,0.80,1.12,1.16,1.60,1.64,1.76,1.88,2.12,2.32,2.52,2.76,3.00,3.32]","tokens":[" ƋŞġ"," ƋŠĭ"," ƎŠť"," ƌńį"," ƌŞģ"," Ə","ļ","ţ"," ƎĥŜ"," Ƌşń"," ƌŊĥ"," ƋŠį"," ƏŖĤ"," ƍĸŚ"," ƋŞœ"]}
exp-zh-bbpe/test_waves/BAC009S0764W0121.wav                                    
{"text":" 甚 至 出 现 交 易 几 乎 停 滞 的 情 况","timestamps":"[0.00,0.04,0.32,0.52,0.88,1.12,1.44,1.64,1.84,2.00,2.04,2.12,2.20,2.36,2.44,2.48,2.64,2.68,2.76,2.88,3.08,3.36,3.52,3.56]","tokens":[" Ǝ","ķ","Ľ"," ƏĨř"," ƌĨŠ"," ƎįŖ"," ƋŠŊ"," ƍĻĶ"," ƌ","Ĩ","Ņ"," Ƌş","į"," ƌ","Ģ","Ł"," ƍ","š","Ń"," ƎĽĥ"," ƍĤĦ"," ƌ","ħ","ś"]}
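
The result lines in the log are JSON objects in which `timestamps` is itself a JSON-encoded string, with one timestamp per token rather than per character. The sketch below pairs each token with its timestamp, assuming the output always has the shape shown in the log above; `align_tokens` is a hypothetical helper, not part of sherpa.

```python
import json


def align_tokens(result_json: str) -> list[tuple[float, str]]:
    """Pair each byte-level BPE token with its start timestamp."""
    result = json.loads(result_json)
    # In the log, "timestamps" is a string such as "[0.00,0.52,...]",
    # so it has to be parsed a second time.
    times = json.loads(result["timestamps"])
    tokens = result["tokens"]
    if len(times) != len(tokens):
        raise ValueError("expected one timestamp per token")
    return list(zip(times, tokens))


# First result line from the log above, copied verbatim.
sample = (
    '{"text":" 一 二 线 城 市 虽 然 也 处 于 调 整 中",'
    '"timestamps":"[0.00,0.52,0.80,1.12,1.16,1.60,1.64,1.76,1.88,'
    '2.12,2.32,2.52,2.76,3.00,3.32]",'
    '"tokens":[" ƋŞġ"," ƋŠĭ"," ƎŠť"," ƌńį"," ƌŞģ"," Ə","ļ","ţ",'
    '" ƎĥŜ"," Ƌşń"," ƌŊĥ"," ƋŠį"," ƏŖĤ"," ƍĸŚ"," ƋŞœ"]}'
)

pairs = align_tokens(sample)
for start, token in pairs:
    print(f"{start:5.2f}s {token!r}")
```

Note that the sample has 15 tokens for 13 characters: one character is split across three tokens, so token timestamps cannot be mapped one-to-one onto characters of `text`.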
pkufool commented 10 months ago

@csukuangfj There are too many black errors. I don't think they are related to this PR. What should I do, reformat them?

csukuangfj commented 10 months ago

> @csukuangfj Too many black errors, I think they don't relate to this PR, what should I do, reformat them?

I think you can ignore the Python style issues for now.