modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Client request to the Triton server fails #704

Closed standyyyy closed 1 year ago

standyyyy commented 1 year ago

The client request to Triton fails; below is a brief summary of the configuration.

OS: linux
Python/C++ Version: client environment is Python 3.9
Package Version: pytorch, torchaudio, modelscope, funasr (pip list)
Model: infer_pipeline
Command:
python3 client/decode_manifest_triton.py \
    --server-addr $serveraddr \
    --compute-cer \
    --model-name infer_pipeline \
    --num-tasks $num_task \
    --manifest-filename $manifest_path
Details: none; I followed the documentation at https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu
Error log:
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] in ensemble 'infer_pipeline', Failed to process the request(s) for model instance 'scoring_0_0', message: Failed to open the cudaIpcHandle. error: invalid resource handle

On the server side, I set everything up following the commands in the documentation, and the ports are mapped out of the container. (screenshot omitted)

My local machine runs Windows, and Triton is started via Docker. Below is the local CUDA environment. (screenshots omitted)

LauraGPT commented 1 year ago

@yuekaizhang Please help to solve this issue.

yuekaizhang commented 1 year ago

Failed to open the cudaIpcHandle. error: invalid resource handle

tritonserver --model-repository /workspace/model_repo_paraformer_large_offline \
    --pinned-memory-pool-byte-size=512000000

Please remove the memory pool size option and try again.

https://github.com/triton-inference-server/server/issues/5798
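For reference, the suggestion above amounts to launching the server with its default memory-pool settings; a sketch of the command, reusing the model repository path from the report above:

```shell
# Launch Triton without an explicit pinned-memory pool size,
# letting the server fall back to its defaults.
tritonserver --model-repository /workspace/model_repo_paraformer_large_offline
```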

standyyyy commented 1 year ago

I tried that, but it still fails with the same error as above.

standyyyy commented 1 year ago

I wonder if it is because the request is initiated from Windows; the link below says CUDA shared memory is not supported on Windows:
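A minimal sketch of how a client could guard against this at runtime: Triton's CUDA shared-memory transport relies on the cudaIpc* APIs, which are Linux-only, so a Windows client should fall back to sending tensors over plain gRPC. The function and variable names here are illustrative, not part of the FunASR client.

```python
import platform

def cuda_shm_supported() -> bool:
    """Return True only on Linux, where Triton's CUDA shared-memory
    transport (based on cudaIpcOpenMemHandle) is available."""
    return platform.system() == "Linux"

# Hypothetical usage: pick the tensor-transfer mode before building requests.
transfer_mode = "cuda_shm" if cuda_shm_supported() else "grpc"
```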

standyyyy commented 1 year ago

Tested it: with an enterprise-grade GPU, the request goes through.