Closed evanxqs closed 1 month ago
There is a problem with the latest TensorRT-LLM code used by sherpa's Triton whisper setup: the `build.py` script needed to build the large-v3/v2 models is missing, while older versions did not have this issue. Can anyone help solve it?
1. Commit as below:
2. README in the project: https://github.com/k2-fsa/sherpa/tree/master/triton/whisper
```shell
cd /workspace/TensorRT-LLM/examples/whisper

# Download the large-v3 checkpoint.
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt

# Build the model with plugins enabled.
python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --enable_context_fmha
```
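As an aside, the download URL above embeds the checkpoint's SHA-256 digest in its path (the long hex segment before `large-v3.pt`), so the file can be verified before building. A minimal sketch (the helper name and the `assets/large-v3.pt` location are assumptions, not part of the repository):

```python
import hashlib
from pathlib import Path

# Hex digest copied from the download URL; OpenAI's Whisper release URLs
# carry the checkpoint's SHA-256 in the path.
EXPECTED_SHA256 = "e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb"

def sha256_of(path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

ckpt = Path("assets/large-v3.pt")
if ckpt.exists():
    ok = sha256_of(ckpt) == EXPECTED_SHA256
    print("checkpoint OK" if ok else "checksum mismatch; re-download the file")
```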
@yuekaizhang Could you have a look?
@evanxqs Thanks, I will update it to the latest `trtllm-build` code today.
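For reference, recent TensorRT-LLM releases replaced the per-example `build.py` with a two-step flow: convert the checkpoint, then build engines with the `trtllm-build` CLI. The sketch below is a guess at that flow for whisper; the exact script names, flags, and values vary across TensorRT-LLM releases, so check the `examples/whisper` README shipped inside your container before running it.

```shell
# Hypothetical sketch of the newer trtllm-build flow (NOT the confirmed
# commands for this repo; flag names differ between TensorRT-LLM versions).
cd /workspace/TensorRT-LLM/examples/whisper

# 1. Convert the OpenAI checkpoint into TensorRT-LLM checkpoint format.
python3 convert_checkpoint.py --output_dir whisper_large_v3_ckpt

# 2. Build engines with trtllm-build instead of the removed build.py.
trtllm-build --checkpoint_dir whisper_large_v3_ckpt \
             --output_dir whisper_large_v3 \
             --gemm_plugin float16
```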
```shell
# We already have a clone of TensorRT-LLM inside the container, so no need to clone it.
cd /workspace/TensorRT-LLM/examples/whisper

# Take the large-v3 model as an example.
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt

# Build the large-v3 model using a single GPU with plugins.
python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --enable_context_fmha
```