Closed fxmarty closed 6 months ago
This PR adds support for building Whisper encoder/decoder TRT engines from Transformers checkpoints.
```
CUDA_VISIBLE_DEVICES=0 python llama.py meta-llama/Llama-2-7b-chat-hf new_llama --hub-token mytoken
```

works as it used to.
```
CUDA_VISIBLE_DEVICES=0 python whisper.py openai/whisper-tiny.en whisper_trt
```

for now only builds the encoder/decoder engines.
Left to do:

- ~~`TensorRTEngineBuilder`~~, replaced by `TensorRTForCausalLMEngineBuilder` (…, etc.)
- `# TODO`s, mostly comprehension/refactorization needed
Implementing the runtime itself is left to a followup PR.
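As a rough sketch of the builder split mentioned in the list above (only `TensorRTEngineBuilder` and `TensorRTForCausalLMEngineBuilder` come from this PR; the seq2seq class name, method names, and bodies are hypothetical, with the actual TensorRT calls stubbed out):

```python
from dataclasses import dataclass, field

@dataclass
class TensorRTEngineBuilder:
    """Generic builder: one TRT engine per exported model component (stubbed)."""
    model_id: str
    engines: list = field(default_factory=list)

    def build_engine(self, component: str) -> str:
        # In the real PR this would export the component to ONNX and invoke
        # the TensorRT builder; here we only record the resulting plan name.
        plan = f"{self.model_id.split('/')[-1]}-{component}.plan"
        self.engines.append(plan)
        return plan


class TensorRTForCausalLMEngineBuilder(TensorRTEngineBuilder):
    """Specialization for decoder-only (causal LM) checkpoints, e.g. Llama."""

    def build(self) -> list:
        # A causal LM needs only a decoder engine.
        return [self.build_engine("decoder")]


class TensorRTForSpeechSeq2SeqEngineBuilder(TensorRTEngineBuilder):
    """Hypothetical specialization for encoder/decoder models such as Whisper."""

    def build(self) -> list:
        # Whisper needs separate encoder and decoder engines.
        return [self.build_engine("encoder"), self.build_engine("decoder")]


builder = TensorRTForSpeechSeq2SeqEngineBuilder("openai/whisper-tiny.en")
print(builder.build())  # ['whisper-tiny.en-encoder.plan', 'whisper-tiny.en-decoder.plan']
```

The point of the split is that per-architecture subclasses decide which components become engines, while the base class owns the shared export-and-build machinery.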
@mfuntowicz Let me know if you prefer the runtime, logits matching tests, etc. to be implemented in this PR as well.