huggingface / optimum-nvidia


Add back the ability to build Whisper from Transformers checkpoints #101

Closed fxmarty closed 6 months ago

fxmarty commented 6 months ago

Can be tested with TP=1, PP=1 with the example in examples/automatic-speech-recognition/:

CUDA_VISIBLE_DEVICES=0 python3 whisper.py openai/whisper-tiny.en tiny_whisper

There remain warning logs from TRT-LLM, such as:

[03/22/2024-03:31:19] [TRT-LLM] [W] Parameter dtype is None, using default dtype: DataType.FLOAT, it is recommended to always specify dtype explicitly

since dtype is sometimes not passed in the TRT-LLM codebase: https://github.com/NVIDIA/TensorRT-LLM/blob/66ca3378c61efa3154ed34a48cfc362351405eef/tensorrt_llm/models/enc_dec/model.py#L1644-L1645
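The pattern behind that warning can be sketched generically: when a parameter is created without an explicit dtype, TRT-LLM falls back to float32 and logs the warning above. A minimal standalone illustration (this `Parameter` class is a toy stand-in, not the actual `tensorrt_llm` implementation):

```python
import warnings

import numpy as np


class Parameter:
    """Toy stand-in for a TRT-LLM-style parameter holder (illustrative only)."""

    def __init__(self, shape, dtype=None):
        if dtype is None:
            # Mirrors the TRT-LLM behavior: dtype not passed, fall back to float32
            # and warn that the caller should specify it explicitly.
            warnings.warn(
                "Parameter dtype is None, using default dtype: float32, "
                "it is recommended to always specify dtype explicitly"
            )
            dtype = np.float32
        self.value = np.zeros(shape, dtype=dtype)


# Passing dtype explicitly avoids the warning entirely.
p = Parameter((2, 2), dtype=np.float16)
```

Silencing the warning for good would require patching the call sites in the TRT-LLM codebase to always forward a dtype, which is out of scope for this PR.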

TRT-LLM handles Whisper with two different Module subclasses, WhisperEncoder and DecoderModel. We save their engines in two subfolders, encoder/engines and decoder/engines, hence some refactoring in hub.py and runtime.py to accommodate multi-engine models.

fxmarty commented 6 months ago

One test runner hit an OOM during the engine build.