Please refer to this: https://github.com/Slyne/ctc_decoder#adding-language-model
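(Editor's note: for reference, a minimal sketch of what the linked instructions amount to. The alpha/beta values, paths, and toy inputs below are placeholders, and the batch-decoder call follows the usage seen in the wenet GPU scoring model; the exact signature may differ between versions, so treat the README as authoritative.)

```python
# Sketch: attach a KenLM .arpa to CTC beam search via swig_decoders.
import math
import multiprocessing

from swig_decoders import (PathTrie, Scorer, TrieVector,
                           ctc_beam_search_decoder_batch)

# Symbol table: one "token id" pair per line.
vocab_list = [line.split()[0] for line in open("words.txt", encoding="utf-8")]

# alpha = LM weight, beta = word-insertion bonus (placeholder values).
scorer = Scorer(0.5, 0.5, "test.arpa", vocab_list)

# Top-k log-probs per frame plus their token ids (toy: 1 utt, 1 frame, k=2).
batch_log_probs = [[[math.log(0.6), math.log(0.4)]]]
batch_log_probs_idx = [[[2, 0]]]

batch_root = TrieVector()
batch_root.append(PathTrie())   # one prefix trie per utterance
batch_start = [True]            # True = first chunk of each stream

hyps = ctc_beam_search_decoder_batch(
    batch_log_probs, batch_log_probs_idx, batch_root, batch_start,
    10,                                   # beam size
    min(multiprocessing.cpu_count(), 1),  # worker processes
    blank_id=0,
    space_id=-1,                          # -1: no space token (char/bpe)
    cutoff_prob=0.9999,
    ext_scorer=scorer)
print(hyps)                               # scored hypotheses per utterance
```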
OK, let me try!
Hello, in the x86_GPU deployment I added the LM in convert_start_server.sh as follows:

```bash
onnx_model_dir=/ws/onnx_model
model_repo=/ws/model_repo
lm_path=/workspace/ctc_decoder/swig/kenlm/lm/test.arpa

python3 scripts/convert.py --config=$onnx_model_dir/train.yaml \
    --vocab=$onnx_model_dir/words.txt \
    --model_repo=$model_repo \
    --onnx_model_dir=$onnx_model_dir \
    --lm_path=$lm_path

tritonserver --model-repository=/ws/model_repo \
    --pinned-memory-pool-byte-size=1024000000 \
    --cuda-memory-pool-byte-size=0:1024000000
```
Then I started the server and the client, and this error occurred during recognition. Can you take a look?
server:

```
I0727 05:46:31.098252 152 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001
I0727 05:46:31.098804 152 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000
I0727 05:46:31.140517 152 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
E0727 05:46:43.709127 152 python.cc:1968] Stub process is unhealthy and it will be restarted.
Loading the LM will be faster if you build a binary file.
Reading /workspace/ctc_decoder/swig/kenlm/lm/test.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Successfully load language model! Initialized Rescoring!
E0727 05:46:59.511372 152 python.cc:1968] Stub process is unhealthy and it will be restarted.
Loading the LM will be faster if you build a binary file.
Reading /workspace/ctc_decoder/swig/kenlm/lm/test.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Successfully load language model! Initialized Rescoring!
```

client:

```
root@localhost:/ws/client/test# python3 client.py --audio_file=/ws/test_data/5.wav --url=localhost:8001
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results
    task = get()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 1 required positional argument: 'msg'
```
@yuekaizhang
This error looks like the client can't establish a connection with the server. Could you first verify that it works without the LM? When you start the server and client containers, make sure you have done the port mapping, e.g. --net host or -p 8001:8001.

Or you could test in the same container. Install the following in your server container:

```
apt-get install -y libsndfile1
pip3 install soundfile grpcio-tools tritonclient
```
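(Editor's note: once those are installed, a quick in-container sanity check could look like the sketch below; the host/port are placeholders for your deployment.)

```python
# Verify the server is reachable and see which models actually loaded;
# a Python backend model stuck in stub restarts will show up here.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
print("server ready:", client.is_server_ready())
for m in client.get_model_repository_index().models:
    print(m.name, m.state)
```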
Triton works well without the LM and successfully returns the recognition result, but the above error is reported after adding the language model! @yuekaizhang
Could you print some information to debug? For example, in scoring/model.py, before calling ctc_prefix_beam_search, you should get the same encoder_out results with or without the LM (see the sketch below). If you can narrow the error down to ctc_decode, you could modify this script https://github.com/Slyne/ctc_decoder/blob/master/swig/test/test_zh.py to see whether it is related to your arpa file.
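(Editor's note: a throwaway debug helper for that comparison might look like this; the call site named in the comment is a hypothetical placement, not a line from the repo.)

```python
import numpy as np

def debug_dump(name, tensor):
    """Print a cheap fingerprint of a tensor so two runs can be compared."""
    arr = np.asarray(tensor)
    print(f"{name}: shape={arr.shape} sum={float(arr.sum()):.6f}")

# Hypothetical call site in model_repo/scoring/1/model.py, just before the
# beam-search call:
#     debug_dump("encoder_out", encoder_out)
# The output should be identical with and without --lm_path.
```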
Hi @yuekaizhang, I have tried adding an LM with the Triton decoder; it loads successfully but I get no output or weird output.
I have tried two types of LM:
- tokenized LM: converted the text corpus to token sequences and trained an n-gram model with the KenLM tool, using the training units.txt with the LM. It decoded weird output.
- word LM: a traditional word-based n-gram model, with the training units.txt and words.txt as well. It decoded no output.
Can you please help me with how to use an LM with the Triton decoder? Thanks in advance.
Hi, could you offer more details about the weird output?
For an English model, if you are using bpe, it should work the same way as a char modeling unit. Please don't set the space_id; you may want to try testing with space_id = -1 (see the sketch below): https://github.com/Slyne/ctc_decoder/blob/master/swig/test/test_zh.py#L50
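(Editor's note: an illustrative sketch of how that space id is typically chosen; the `<space>` token name and the lookup are assumptions for illustration, cf. the linked test script.)

```python
# -1 means "no explicit space token", which is what char/bpe units want;
# word-level units should instead look the space symbol up in the vocab.
vocab_list = [line.split()[0] for line in open("words.txt", encoding="utf-8")]
space_id = vocab_list.index("<space>") if "<space>" in vocab_list else -1
```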
Hi, sharing results from both LMs. I have trained a Hindi model and tried space_id = -1 as well, but it didn't work.
without LM - working well:

```
Took:1.472s Get response from 1th chunk:
Took:0.023s Get response from 2th chunk: ▁सहयोग
Took:0.054s Get response from 3th chunk: ▁सहयोग▁बीमार▁अमृत
Took:0.039s Get response from 4th chunk: ▁सहयोग▁बीमार▁अमृत▁को▁नहीं
Took:0.023s Get response from 5th chunk: ▁सहयोग▁बीमार▁अमृत▁को▁नहीं▁छोड▁पाता
Took:0.024s Get response from 6th chunk: ▁सहयोग▁बीमार▁अमृत▁को▁नहीं▁छोड▁पाता▁जबकि
Took:0.022s Get response from 7th chunk: ▁सहयोग▁बीमार▁अमृत▁को▁नहीं▁छोड▁पाता▁जबकि▁डॉक्टर▁उसे
Took:0.101s Get response from 8th chunk: ▁सहयोग▁बीमार▁अमृत▁को▁नहीं▁छोड▁पाता▁जबकि▁डॉक्टर▁उसे▁करुणा
Took:0.020s Get response from 9th chunk: ▁सहयोग▁बीमार▁अमृत▁को▁नहीं▁छोड▁पाता▁जबकि▁डॉक्टर▁उसे▁करुणा▁का▁भय▁भी
Took:0.069s Get response from 10th chunk: ▁सहयोग▁बीमार▁अमृत▁को▁नहीं▁छोड▁पाता▁जबकि▁डॉक्टर▁उसे▁करुणा▁का▁भय▁भी▁दिखाता▁है
```

with tokenized LM - weird output:

```
Took:1.492s Get response from 1th chunk:
Took:0.024s Get response from 2th chunk: य
Took:0.024s Get response from 3th chunk: यमू
Took:0.024s Get response from 4th chunk: यमू
Took:0.023s Get response from 5th chunk: यमूचप
Took:0.024s Get response from 6th chunk: यमूचप
Took:0.023s Get response from 7th chunk: यमूचपाे
Took:0.025s Get response from 8th chunk: यमूचपाे
Took:0.020s Get response from 9th chunk: यमूचप
Took:0.035s Get response from 10th chunk: यमूचपतखए
```

with word LM - no output:

```
Took:0.255s Get response from 1th chunk:
Took:0.025s Get response from 2th chunk:
Took:0.028s Get response from 3th chunk:
Took:0.032s Get response from 4th chunk:
Took:0.067s Get response from 5th chunk:
Took:0.043s Get response from 6th chunk:
Took:0.025s Get response from 7th chunk:
Took:0.028s Get response from 8th chunk:
Took:0.027s Get response from 9th chunk:
Took:0.035s Get response from 10th chunk:
```
Hi @yuekaizhang, the default value of space_id is -1 only when the space symbol is not in the symbol table. https://github.com/wenet-e2e/wenet/blob/main/runtime/gpu/model_repo_stateful/wenet/1/wenet_onnx_model.py#L34
My words.txt file format:

```
<blank> 0
<unk> 1
अ 2
अँकल 3
अँकी 4
अँकुर 5
अँकुराने 6
...
ॠषिकेष 164750
ॲँधेरे 164751
ॲडीटीवज 164752
<sos/eos> 164753
```
LM format:

```
\data\
ngram 1=164755
ngram 2=2619689
ngram 3=7567245

\1-grams:
-6.4466453 <unk> 0
0 <s> -1.7118986
-1.8495009 </s> 0
-6.3155007 वसाओँ -0.1331459
-2.835205 तथा -0.38555816
-5.2239923 तेलों -0.33262685
-1.898256 के -0.9816179
-6.3155007 हाइड्रोजनीकरण -0.1331459
-3.72396 हेतु -0.31991008
...
-2.7174118 <s> गोदरेज प्रॉपर्टीज़
-0.5138279 स्पीकर्ज़ ऐंड ट्वीटर्स
-1.322996 बनवारी लाल कंछल
-0.20808947 डाकघर चौराहा धोबीघाट
-0.20948076 शिवकुटी में अपट्रान
\end\
```
Please let me know if I need to make any changes to words.txt or the LM.
What modeling unit are you using? If you are using words, please set the space id according to your vocab. If you are using bpe or char, setting space_id to -1 is fine. Also, the n-gram LM should use the same modeling unit as the ASR model.
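(Editor's note: one way to check that unit mismatch directly is to score the same text in both forms with the KenLM Python bindings: `pip3 install kenlm`. The path is a placeholder and the example strings are taken from the no-LM output above; the form that matches the LM's training units should get a much better, i.e. less negative, score.)

```python
# Sanity-check which modeling unit the .arpa was trained on.
import kenlm

lm = kenlm.Model("test.arpa")                     # placeholder path
word_form = "सहयोग बीमार अमृत को नहीं छोड पाता"        # word units
bpe_form = "▁सहयोग ▁बीमार ▁अमृत ▁को ▁नहीं ▁छोड ▁पाता"  # bpe-piece units
print("word-unit score:", lm.score(word_form))
print("bpe-unit score :", lm.score(bpe_form))
```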
@yuekaizhang, I used bpe units while training. I have tried using the same bpe units with the word LM, and the bpe tokenized LM with space_id=-1, but no luck. Later I tried word units, the same as in words.txt, with TLG.fst.
I also have a question about using a word-based LM: since the units are in bpe format and the LM has words, how will the bpe indexes match up with the LM word indexes? Please help me get clarity on this as well.
Hi @yuekaizhang, can you please suggest how to fix it?
This issue has been automatically closed due to inactivity.
I see "Add language model: set --lm_path in the convert_start_server.sh. Notice the path of your language model is the path in docker." in the x86_GPU guide doc。 please how can i do it!