wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

streaming triton error #1303

Closed · murphypei closed this 2 years ago

murphypei commented 2 years ago

Describe the bug: The decoder ONNX model's output tensor is empty, which causes a dlpack error. @Slyne

[screenshot of the error]
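
For context, the failure surfaces where the scorer converts the decoder's response tensor via dlpack. Below is only a minimal sketch of such a conversion with an empty-output guard; the tensor name "best_index" and the helper are illustrative, not the actual wenet model_repo code:

```python
# Illustrative sketch only: guard the dlpack conversion of a Triton
# Python-backend decoder response. "best_index" is a hypothetical
# output tensor name, not necessarily the one wenet uses.
import triton_python_backend_utils as pb_utils
from torch.utils.dlpack import from_dlpack

def read_decoder_output(inference_response, name="best_index"):
    out = pb_utils.get_output_tensor_by_name(inference_response, name)
    if out is None:
        # the decoder produced no tensor under this name
        raise RuntimeError(f"decoder returned no output tensor {name!r}")
    tensor = from_dlpack(out.to_dlpack())  # dlpack error surfaces here
    if tensor.numel() == 0:
        raise RuntimeError("decoder output tensor is empty")
    return tensor
```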

To Reproduce: Just run the streaming Triton server as described in the runtime/GPU README; no error occurs at server startup.

Client test:

```sh
python3 client.py --audio_file=test.wav --url=localhost:8001 --model_name=streaming_wenet --streaming
```

Expected behavior: Correct output.

Screenshots:

[two screenshots attached]


Slyne commented 2 years ago

Do all the test audios cause the same error?

Could you check the server side and share the log/screen output there? I'm not sure the decoder has been set up correctly.

Please also provide a test audio file if possible.

murphypei commented 2 years ago

> Do all the test audios cause the same error?
>
> Could you check the server side and share the log/screen output there? I'm not sure the decoder has been set up correctly.
>
> Please also provide a test audio file if possible.

  1. Yes.
  2. Hmm, I also think that's quite possible, so could you have a look at my server log? Thanks. https://drive.google.com/file/d/1_arhDIkYAM3ZjND0ClhWuFRjWjMZFh9n/view?usp=sharing

yuekaizhang commented 2 years ago

> 1. Yes.
> 2. Hmm, I also think that's quite possible, so could you have a look at my server log? Thanks. https://drive.google.com/file/d/1_arhDIkYAM3ZjND0ClhWuFRjWjMZFh9n/view?usp=sharing

Could you trace the error back as early as possible? Maybe print the inputs of the self.batch_rescoring call and check whether score_hyps is empty.

murphypei commented 2 years ago

> Could you trace the error back as early as possible? Maybe print the inputs of the self.batch_rescoring call and check whether score_hyps is empty.

Thanks for your reply. score_hyps has data:

[screenshot of the printed values]

Print code:

print("=> rescore_hyps:\n", rescore_hyps)
print("=> rescore_encoder_hist:\n", rescore_encoder_hist)
print("=> rescore_encoder_lens:\n", rescore_encoder_lens)
print("=> max_length:\n", max_length)
best_index = self.batch_rescoring(rescore_hyps, rescore_encoder_hist,
                                  rescore_encoder_lens, max_length)
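
As an extra sanity check independent of Triton, the exported decoder can also be inspected with a bare onnxruntime session. This is only a sketch; the decoder.onnx path is a placeholder for wherever the models were exported:

```python
# Sketch: print the decoder's input/output signature with onnxruntime.
# <model_dir> is a placeholder; adjust to the actual export directory.
import onnxruntime as ort

sess = ort.InferenceSession("<model_dir>/onnx_gpu/decoder.onnx",
                            providers=["CUDAExecutionProvider"])
for i in sess.get_inputs():
    print("input :", i.name, i.shape, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.shape, o.type)
```
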
yuekaizhang commented 2 years ago

> Thanks for your reply. score_hyps has data: […]

Thanks. Could you please give me the minimal essentials to reproduce the issue? Maybe your Dockerfile and model_repo configs (if you modified any of them); including one test audio would be even better.

murphypei commented 2 years ago

> Thanks. Could you please give me the minimal essentials to reproduce the issue? Maybe your Dockerfile and model_repo configs (if you modified any of them); including one test audio would be even better.

OK, I will prepare the data today, thank you.

murphypei commented 2 years ago

@yuekaizhang I reproduced this problem with the following steps:

  1. Build and run the Dockerfile (both server and client in one container):

```dockerfile
FROM nvcr.io/nvidia/tritonserver:22.03-py3
LABEL maintainer="NVIDIA"
LABEL repository="tritonserver"

# fix public key error
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
RUN dpkg -i cuda-keyring_1.0-1_all.deb

RUN apt-get update
RUN apt-get install -fy cmake make swig
RUN pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
RUN pip3 install -v kaldifeat
RUN pip3 install pyyaml onnx onnxruntime-gpu typeguard

# for client
RUN apt-get install -y libsndfile1
RUN pip3 install soundfile grpcio-tools tritonclient

WORKDIR /workspace
RUN git clone https://github.com/Slyne/ctc_decoder.git && cd ctc_decoder/swig && bash setup.sh
COPY ./scripts scripts
```

Run the Docker container:

```sh
docker run --name wenet-triton --gpus '"device=3"' -it --shm-size=1g --ulimit memlock=-1 -v <wenet-dir>:/ws/wenet -v <20210618_u2pp_conformer_exp dir>:/ws/model wenet-tritonserver:22.03-py3 /bin/bash
```

The 20210618_u2pp_conformer_exp model was downloaded from the wenet GitHub repo.

  2. Inside the container:

```sh
export model_dir=/ws/model
export PYTHONPATH=$PYTHONPATH:/ws/wenet
```

Export the ONNX models:

```sh
cd /ws/wenet/
python3 wenet/bin/export_onnx_gpu.py --config=$model_dir/train.yaml --checkpoint=$model_dir/final.pt --cmvn_file=$model_dir/global_cmvn --ctc_weight=0.3 --reverse_weight=0.3 --output_onnx_dir=$model_dir/onnx_gpu --streaming
```

Convert the config files:

```sh
cd /ws/wenet/runtime/GPU
python3 scripts/convert.py --config=$model_dir/train.yaml --vocab=$model_dir/words.txt --model_repo=/ws/wenet/runtime/GPU/model_repo_stateful/ --onnx_model_dir=$model_dir/onnx_gpu
```

Start the server:

```sh
cd /ws/wenet/runtime/GPU
tritonserver --model-repository=/ws/wenet/runtime/GPU/model_repo_stateful/ --pinned-memory-pool-byte-size=1024000000 --cuda-memory-pool-byte-size=0:1024000000
```

Run the client test in another terminal:

```sh
cd /ws/wenet/runtime/GPU
python3 client.py --audio_file=chinese_test.wav --url=localhost:8001 --model_name=streaming_wenet --streaming
```
Audio file: https://drive.google.com/file/d/1xsodix-CW9dDHZn0pFEnpVMSfyngFbh1/view?usp=sharing

If you encounter any problems during these steps, please contact me. Thank you.

yuekaizhang commented 2 years ago

> @yuekaizhang I reproduced this problem with the following steps: […]

Reproduced it, thanks. I will debug it tomorrow. In the meantime, I recommend using --fp16 when exporting to ONNX; that works for me.
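
For reference, that means appending --fp16 to the export command from the repro steps above (a sketch, assuming the flag combines with --streaming as-is):

```sh
cd /ws/wenet/
python3 wenet/bin/export_onnx_gpu.py --config=$model_dir/train.yaml --checkpoint=$model_dir/final.pt --cmvn_file=$model_dir/global_cmvn --ctc_weight=0.3 --reverse_weight=0.3 --output_onnx_dir=$model_dir/onnx_gpu --streaming --fp16
```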

yuekaizhang commented 2 years ago

It should work now. #1314