Closed fxmarty closed 6 months ago
This PR adds support for building Whisper encoder/decoder TRT engines from Transformers checkpoints.
```
CUDA_VISIBLE_DEVICES=0 python llama.py meta-llama/Llama-2-7b-chat-hf new_llama --hub-token mytoken
```

works as it used to.
```
CUDA_VISIBLE_DEVICES=0 python whisper.py openai/whisper-tiny.en whisper_trt
```

for now only builds the encoder/decoder engines.
Left to do:

- ~~`TensorRTEngineBuilder`~~, replaced by `TensorRTForCausalLMEngineBuilder` (…, etc.)
- `# TODO`s, mostly comprehension/refactorization needed
Implementing the runtime itself is left to a followup PR.
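As a rough sketch of the builder split mentioned in the list above (only `TensorRTEngineBuilder` and `TensorRTForCausalLMEngineBuilder` come from this PR; the seq2seq class name, method names, and bodies are hypothetical, with the actual TensorRT calls stubbed out):

```python
from dataclasses import dataclass, field

@dataclass
class TensorRTEngineBuilder:
    """Generic builder: one TRT engine per exported model component (stubbed)."""
    model_id: str
    engines: list = field(default_factory=list)

    def build_engine(self, component: str) -> str:
        # In the real PR this would export the component to ONNX and invoke
        # the TensorRT builder; here we only record the resulting plan name.
        plan = f"{self.model_id.split('/')[-1]}-{component}.plan"
        self.engines.append(plan)
        return plan


class TensorRTForCausalLMEngineBuilder(TensorRTEngineBuilder):
    """Specialization for decoder-only (causal LM) checkpoints, e.g. Llama."""

    def build(self) -> list:
        # A causal LM needs only a decoder engine.
        return [self.build_engine("decoder")]


class TensorRTForSpeechSeq2SeqEngineBuilder(TensorRTEngineBuilder):
    """Hypothetical specialization for encoder/decoder models such as Whisper."""

    def build(self) -> list:
        # Whisper needs separate encoder and decoder engines.
        return [self.build_engine("encoder"), self.build_engine("decoder")]


builder = TensorRTForSpeechSeq2SeqEngineBuilder("openai/whisper-tiny.en")
print(builder.build())  # ['whisper-tiny.en-encoder.plan', 'whisper-tiny.en-decoder.plan']
```

The point of the split is that per-architecture subclasses decide which components become engines, while the base class owns the shared export-and-build machinery.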
@mfuntowicz Let me know if you prefer the runtime, logits matching tests, etc. to be implemented in this PR as well.