huggingface / optimum-nvidia


Ability to build Whisper encoder/decoder TRT engine #70

Closed: fxmarty closed this 6 months ago

fxmarty commented 7 months ago

This PR adds support for building Whisper encoder/decoder TRT engines from Transformers checkpoints.
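For reference, a minimal sketch (using only the public transformers API, not this PR's conversion code) of how the encoder and decoder submodules can be pulled out of a Transformers Whisper checkpoint before engine conversion:

import torch
from transformers import WhisperForConditionalGeneration

# Load the Transformers checkpoint; the encoder and decoder live under `.model`.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
encoder = model.model.encoder  # log-mel features -> encoder hidden states
decoder = model.model.decoder  # token ids + encoder hidden states -> decoder hidden states

# Dummy input matching Whisper's expected feature shape (num_mel_bins x 3000 frames).
features = torch.randn(1, model.config.num_mel_bins, 3000)
encoder_hidden = encoder(features).last_hidden_state  # shape (1, 1500, hidden_size)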

CUDA_VISIBLE_DEVICES=0 python llama.py meta-llama/Llama-2-7b-chat-hf new_llama --hub-token mytoken

still works as before.

CUDA_VISIBLE_DEVICES=0 python whisper.py openai/whisper-tiny.en whisper_trt

for now only builds the encoder and decoder engines.

Left to do:

Implementing the runtime itself is deferred to a follow-up PR.
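For context on the deferred runtime: it will have to reproduce the standard encoder-decoder generation loop (encoder runs once over the audio features, decoder runs autoregressively). A rough illustrative sketch of that loop with the PyTorch reference model, not the planned TensorRT runtime code:

import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en").eval()

features = torch.randn(1, model.config.num_mel_bins, 3000)
with torch.no_grad():
    # Encoder runs once over the audio features.
    encoder_hidden = model.model.encoder(features).last_hidden_state

    # Decoder runs token by token (greedy decoding here, no KV cache for brevity).
    tokens = torch.tensor([[model.config.decoder_start_token_id]])
    for _ in range(32):
        decoder_hidden = model.model.decoder(
            input_ids=tokens, encoder_hidden_states=encoder_hidden
        ).last_hidden_state
        # Project the last decoder hidden state to vocabulary logits with the LM head.
        next_token = model.proj_out(decoder_hidden[:, -1]).argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=-1)
        if next_token.item() == model.config.eos_token_id:
            break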

fxmarty commented 7 months ago

@mfuntowicz Let me know if you prefer the runtime, logits matching tests, etc. to be implemented in this PR as well.
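On logits matching: the usual pattern is to feed the same inputs to the built engine and to the Transformers reference and compare the outputs within a tolerance. A hedged sketch, where trt_encoder_run is a hypothetical placeholder for however the engine ends up being invoked (not an existing optimum-nvidia API):

import torch
from transformers import WhisperForConditionalGeneration

reference = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en").eval()
features = torch.randn(1, reference.config.num_mel_bins, 3000)

with torch.no_grad():
    ref_hidden = reference.model.encoder(features).last_hidden_state

# trt_encoder_run is a placeholder; the real call depends on the follow-up runtime PR.
trt_hidden = trt_encoder_run(features)

torch.testing.assert_close(torch.as_tensor(trt_hidden), ref_hidden, atol=1e-3, rtol=1e-3)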