k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0
474 stars 97 forks

Error when running triton server with whisper model #522

Open jackNhat opened 6 months ago

jackNhat commented 6 months ago

When I ran client.py, I got this error message: tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] in ensemble 'whisper', Failed to process the request(s) for model instance 'scorer_0', message: AssertionError: <EMPTY MESSAGE>. How can I fix it? I ran the Triton server with the Whisper large-v2 model.

jwkyeongzz commented 5 months ago

I got the same issue, but it works properly in another environment.

csukuangfj commented 5 months ago

@yuekaizhang

Could you have a look at this issue?

yuekaizhang commented 5 months ago

> I got the same issue, but it works properly in another environment.
>
> • Error env: Windows (Ubuntu 20.04 VM) workstation (Intel Xeon Gold 6246 / RTX 3090)
> • Success PC: CentOS 7.9 server (Intel Xeon Gold 5218 / V100)
> • Up to 7 channels can be operated simultaneously. (V100 32 GB)

@jwkyeongzz You mean it works fine with the V100, and the issue only happens on the RTX 3090 GPU?

yuekaizhang commented 5 months ago

> When I ran client.py, I got this error message: tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] in ensemble 'whisper', Failed to process the request(s) for model instance 'scorer_0', message: AssertionError: <EMPTY MESSAGE>. How can I fix it? I ran the Triton server with the Whisper large-v2 model.

@jackNhat May I ask which GPU you are using? Also, would you mind attaching more details, e.g. how to reproduce the error?
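To help gather the details requested above, here is a minimal stdlib-only sketch that probes the Triton server's standard HTTP endpoints (Triton implements the KServe v2 REST protocol; the default HTTP port 8000, the model name `whisper`, and the helper name `probe_triton` are assumptions for illustration):

```python
import json
import urllib.error
import urllib.request


def probe_triton(host="localhost", port=8000, model="whisper"):
    """Query Triton's KServe v2 HTTP endpoints and report what is reachable."""
    base = f"http://{host}:{port}"
    report = {}
    for name, path in [
        ("server_ready", "/v2/health/ready"),   # 200 when the server is ready
        ("model_config", f"/v2/models/{model}/config"),  # JSON model config
    ]:
        try:
            with urllib.request.urlopen(base + path, timeout=5) as resp:
                report[name] = resp.read().decode() or "OK"
        except (urllib.error.URLError, OSError) as exc:
            report[name] = f"unreachable: {exc}"
    return report


if __name__ == "__main__":
    print(json.dumps(probe_triton(), indent=2))
```

Attaching the output of such a probe (server readiness plus the ensemble's model config) alongside the client-side traceback usually makes reproduction much easier.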

jwkyeongzz commented 5 months ago

> I got the same issue, but it works properly in another environment.
>
> • Error env: Windows (Ubuntu 20.04 VM) workstation (Intel Xeon Gold 6246 / RTX 3090)
> • Success PC: CentOS 7.9 server (Intel Xeon Gold 5218 / V100)
> • Up to 7 channels can be operated simultaneously. (V100 32 GB)
>
> @jwkyeongzz You mean it works fine with the V100, and the issue only happens on the RTX 3090 GPU?

I thought the test environment might be the problem. At first, since the error environment was Ubuntu 20.04 running in a virtual machine on Windows, I assumed there was a problem with CUDA memory allocation. It may also have been caused by insufficient memory on the RTX 3090. So the RTX 3090 itself is not necessarily the problem.
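Since insufficient GPU memory is one of the suspects (the RTX 3090 has 24 GB versus the V100's 32 GB, and Whisper large-v2 is memory-hungry), a quick way to compare the two environments is to read free memory from nvidia-smi. A small sketch, assuming nvidia-smi is on PATH (the helper name `gpu_free_memory_mib` is illustrative; it returns None when no NVIDIA driver is available):

```python
import subprocess


def gpu_free_memory_mib():
    """Return free memory in MiB for each GPU via nvidia-smi, or None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.free",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # no NVIDIA driver, or nvidia-smi not on PATH
    return [int(line) for line in out.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print(gpu_free_memory_mib())
```

Running this inside the container on both machines, right before launching the client, would show whether the 3090 box is actually running out of memory or whether the Windows/VM layer is the real difference.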