TRT-LLM handles Whisper with two different `Module` subclasses: `WhisperEncoder` and `DecoderModel`. We save their engines in two subfolders, `encoder/engines` and `decoder/engines`, hence some refactors in `hub.py` and `runtime.py` to accommodate multiple-engine models.
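As a rough sketch of what "accommodating multiple-engine models" can mean on disk, a loader may need to discover one `engines/` folder per component instead of a single flat engine directory. The helper below is purely illustrative (its name and signature are not from the actual `hub.py`/`runtime.py`); it only assumes the `encoder/engines` and `decoder/engines` layout described above:

```python
from pathlib import Path


def find_engine_dirs(model_dir: str) -> dict[str, Path]:
    """Map each component name (e.g. 'encoder', 'decoder') to its engines/ folder.

    Hypothetical helper, not the real optimum/TRT-LLM API: it simply walks the
    top-level subfolders of a model directory and keeps those that contain an
    'engines' subdirectory, matching the encoder/engines + decoder/engines layout.
    """
    root = Path(model_dir)
    return {
        sub.name: sub / "engines"
        for sub in root.iterdir()
        if sub.is_dir() and (sub / "engines").is_dir()
    }
```

A single-engine model (one `engines/` folder at the top level) would simply yield an empty or one-entry mapping, which is why the surrounding hub/runtime code has to branch on how many components it finds.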
Can be tested with TP=1, PP=1 using the example in `examples/automatic-speech-recognition/`.
There remain warning logs from TRT-LLM, since `dtype` is sometimes not passed in the TRT-LLM codebase: https://github.com/NVIDIA/TensorRT-LLM/blob/66ca3378c61efa3154ed34a48cfc362351405eef/tensorrt_llm/models/enc_dec/model.py#L1644-L1645