Closed evanxqs closed 1 month ago
There is a problem with the latest TensorRT-LLM code used by sherpa's Triton whisper setup: the `build.py` script needed to build the large-v3/v2 models is missing, while older versions did not have this issue. Can anyone help solve it?
1. Commit as below:
2. README in the project: https://github.com/k2-fsa/sherpa/tree/master/triton/whisper
```shell
cd /workspace/TensorRT-LLM/examples/whisper

# Download the large-v3 checkpoint.
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt

# Build the model with plugins enabled.
python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --enable_context_fmha
```
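As an aside, the download URL above embeds the checkpoint's SHA-256 digest in its path (the long hex segment before `large-v3.pt`), so the file can be verified before building. A minimal sketch (the helper name and the `assets/large-v3.pt` location are assumptions, not part of the repository):

```python
import hashlib
from pathlib import Path

# Hex digest copied from the download URL; OpenAI's Whisper release URLs
# carry the checkpoint's SHA-256 in the path.
EXPECTED_SHA256 = "e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb"

def sha256_of(path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

ckpt = Path("assets/large-v3.pt")
if ckpt.exists():
    ok = sha256_of(ckpt) == EXPECTED_SHA256
    print("checkpoint OK" if ok else "checksum mismatch; re-download the file")
```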
@yuekaizhang Could you have a look?
@evanxqs Thanks, I will update it to the latest `trtllm-build` code today.
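For reference, recent TensorRT-LLM releases replaced the per-example `build.py` with a two-step flow: convert the checkpoint, then build engines with the `trtllm-build` CLI. The sketch below is a guess at that flow for whisper; the exact script names, flags, and values vary across TensorRT-LLM releases, so check the `examples/whisper` README shipped inside your container before running it.

```shell
# Hypothetical sketch of the newer trtllm-build flow (NOT the confirmed
# commands for this repo; flag names differ between TensorRT-LLM versions).
cd /workspace/TensorRT-LLM/examples/whisper

# 1. Convert the OpenAI checkpoint into TensorRT-LLM checkpoint format.
python3 convert_checkpoint.py --output_dir whisper_large_v3_ckpt

# 2. Build engines with trtllm-build instead of the removed build.py.
trtllm-build --checkpoint_dir whisper_large_v3_ckpt \
             --output_dir whisper_large_v3 \
             --gemm_plugin float16
```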
```shell
# We already have a clone of TensorRT-LLM inside the container, so no need to clone it.
cd /workspace/TensorRT-LLM/examples/whisper

# Take the large-v3 model as an example.
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt

# Build the large-v3 model using a single GPU with plugins.
python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --enable_context_fmha
```