StephennFernandes opened this issue 2 months ago
@yuekaizhang Could you have a look?
The engine plan file is not compatible with this version of TensorRT, expecting library version 9.2.0.5 got 9.3.0.1, please rebuild.
@StephennFernandes It seems you build the engines and run them in different environments. Would you mind building and running in the same Docker container, e.g. soar97/triton-whisper:24.01.complete?
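A quick way to confirm the environments match is to print the library versions inside both the build container and the serving container; the serialized plan file only loads when they agree. A minimal sketch, assuming `tensorrt` and `tensorrt_llm` are importable in both:

```python
# Run this in the container where you build the engine AND in the one where
# you serve it; both must report the same versions for the plan file to load.
import tensorrt
import tensorrt_llm

print("TensorRT:", tensorrt.__version__)
print("TensorRT-LLM:", tensorrt_llm.__version__)
```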
@yuekaizhang I got it working, thanks a ton for your assistance. I also noticed that we cannot do inference for audio files longer than 30s.
@StephennFernandes Since Whisper can only process audio shorter than 30s, you need to implement a VAD segmenter like this project: https://github.com/shashikg/WhisperS2T/tree/main. Welcome to contribute :D
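For anyone landing here, a minimal silence-based segmenter sketch of the idea (this is not the WhisperS2T VAD itself; `librosa.effects.split`, the 16 kHz sample rate, and the 30 dB threshold are illustrative assumptions to tune for your data):

```python
import librosa
import numpy as np

def segment(path, max_len_s=30.0, top_db=30):
    """Split audio at quiet regions so each chunk fits Whisper's 30 s window."""
    wav, sr = librosa.load(path, sr=16000)
    chunks, cur, cur_len = [], [], 0.0
    for start, end in librosa.effects.split(wav, top_db=top_db):
        piece = wav[start:end]
        # Start a new chunk once adding this piece would exceed the window.
        # (A single non-silent stretch longer than 30 s still needs a hard cut.)
        if cur and cur_len + len(piece) / sr > max_len_s:
            chunks.append(np.concatenate(cur))
            cur, cur_len = [], 0.0
        cur.append(piece)
        cur_len += len(piece) / sr
    if cur:
        chunks.append(np.concatenate(cur))
    return chunks, sr  # transcribe each chunk, then join the texts
```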
@yuekaizhang Thanks for the heads up, already on it.
@yuekaizhang Hey, it's not that this error has broken the deployment; as far as I can tell my Triton deployment works fine. But this weird error log pops up when I deploy my model. Do you know what it means?
I0412 08:38:16.910492 1416 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x76d5da000000' with size 2048000000
I0412 08:38:16.911524 1416 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 4096000000
I0412 08:38:16.915451 1416 model_lifecycle.cc:469] loading: whisper:1
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024040200
free(): invalid pointer
[user-DSA7TGX-424R:01427] *** Process received signal ***
[user-DSA7TGX-424R:01427] Signal: Aborted (6)
[user-DSA7TGX-424R:01427] Signal code: (-6)
[user-DSA7TGX-424R:01427] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7b8114a16520]
[user-DSA7TGX-424R:01427] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7b8114a6a9fc]
[user-DSA7TGX-424R:01427] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7b8114a16476]
[user-DSA7TGX-424R:01427] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7b81149fc7f3]
[user-DSA7TGX-424R:01427] [ 4] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x89676)[0x7b8114a5d676]
[user-DSA7TGX-424R:01427] [ 5] /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa0cfc)[0x7b8114a74cfc]
[user-DSA7TGX-424R:01427] [ 6] /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa2a44)[0x7b8114a76a44]
[user-DSA7TGX-424R:01427] [ 7] /usr/lib/x86_64-linux-gnu/libc.so.6(free+0x73)[0x7b8114a79453]
[user-DSA7TGX-424R:01427] [ 8] /opt/tritonserver/backends/python/triton_python_backend_stub(+0x70064)[0x64e9f6599064]
[user-DSA7TGX-424R:01427] [ 9] /opt/tritonserver/backends/python/triton_python_backend_stub(+0x25e13)[0x64e9f654ee13]
[user-DSA7TGX-424R:01427] [10] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7b81149fdd90]
[user-DSA7TGX-424R:01427] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7b81149fde40]
[user-DSA7TGX-424R:01427] [12] /opt/tritonserver/backends/python/triton_python_backend_stub(+0x26b75)[0x64e9f654fb75]
[user-DSA7TGX-424R:01427] *** End of error message ***
I0412 08:38:22.776815 1416 python_be.cc:2381] TRITONBACKEND_ModelInstanceInitialize: whisper_0_0 (CPU device 0)
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024040200
I0412 08:38:30.125002 1416 model_lifecycle.cc:835] successfully loaded 'whisper'
I0412 08:38:30.125259 1416 server.cc:607]
Hi there, I have been fine-tuning Whisper models using Hugging Face. To convert the model to TensorRT-LLM format, I use an HF script that converts the model from its HF format to the original OpenAI format. I then follow your instructions and convert the OpenAI model to TensorRT-LLM format, which succeeds.
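(A quick way to sanity-check the converted checkpoint before building engines is to load it with the openai-whisper package, since `whisper.load_model` also accepts a checkpoint path. The file names here are hypothetical:

```python
# A successful transcription here confirms the HF -> OpenAI conversion worked
# before any TensorRT-LLM engine is built. Paths are placeholders.
import whisper

model = whisper.load_model("converted.pt")
print(model.transcribe("sample.wav")["text"])
```
)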
However, when I follow the subsequent steps and launch the Triton inference server using the launch_server.sh script, I get the error above; that log is the full stack trace from launching the bash script.
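A minimal gRPC client sketch for smoke-testing the server once it is up. The tensor names (WAV, TEXT_PREFIX, TRANSCRIPTS) are assumptions taken from the recipe's client example; verify them against your model's config.pbtxt before relying on this:

```python
import numpy as np
import soundfile as sf
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
wav, sr = sf.read("sample.wav", dtype="float32")  # expects 16 kHz mono audio

wav_in = grpcclient.InferInput("WAV", [1, len(wav)], "FP32")
wav_in.set_data_from_numpy(wav[np.newaxis, :].astype(np.float32))

# Decoding prompt; BYTES inputs are passed as an object-dtype numpy array.
prefix = np.array([["<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"]],
                  dtype=object)
prefix_in = grpcclient.InferInput("TEXT_PREFIX", [1, 1], "BYTES")
prefix_in.set_data_from_numpy(prefix)

result = client.infer(model_name="whisper", inputs=[wav_in, prefix_in])
print(result.as_numpy("TRANSCRIPTS"))
```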