huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Auto-TensorRT engine compilation, or improved documentation for it #842

Open · fxmarty opened this issue 1 year ago

fxmarty commented 1 year ago

Feature request

For decoder models that use a key/value cache, manually compiling the TensorRT engine can be painful because ONNX Runtime does not expose options to specify input shapes. The engine build could perhaps be done automatically.

The current documentation covers only use_cache=False, which is not very useful in practice. It could be improved to show how to pre-build the TensorRT engine with use_cache=True.

References:
https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/gpu#tensorrt-engine-build-and-warmup
https://github.com/microsoft/onnxruntime/issues/13559
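
For reference, the flow in the linked guide looks roughly like the sketch below (use_cache=False, the only case currently documented). This is a sketch, not the exact doc snippet: the model name and cache path are placeholders, and the provider options are standard ORT TensorRT EP options.

```python
# Sketch of the currently documented warmup-based engine build
# (use_cache=False only); model name and cache path are placeholders.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model = ORTModelForCausalLM.from_pretrained(
    "gpt2",
    export=True,      # export to ONNX on the fly
    use_cache=False,  # the only configuration the current doc covers
    provider="TensorrtExecutionProvider",
    provider_options={
        "trt_engine_cache_enable": True,       # persist compiled engines
        "trt_engine_cache_path": "trt_cache",  # reused on subsequent runs
    },
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Hello, my dog is", return_tensors="pt").to("cuda")
# The first call triggers the slow TensorRT engine compilation; later calls
# with shapes inside the built profile reuse the cached engine.
model.generate(**inputs, max_new_tokens=10)
```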

Motivation

TensorRT is fast

Your contribution

I will work on this at some point.

fxmarty commented 1 year ago

https://github.com/microsoft/onnxruntime/issues/13851 will make this much easier

puyuanOT commented 1 year ago

Is there any update on this? I also struggle to use TensorRT as the execution provider after running optimum-cli optimization.

chilo-ms commented 1 year ago

Hi,

ORT TRT in ONNX Runtime 1.15 supports explicit input shapes, meaning users can provide a shape range for every dynamic-shape input. Please see the PR as well as the documentation for usage details.

Let us know if you have further questions or other feedback on ORT TRT. We want to make ORT TRT easier to use.
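
For illustration, a minimal sketch of what the explicit shape-range options look like at the raw InferenceSession level (option names per the ORT TRT docs for 1.15+; the model path and ranges below are illustrative, and a decoder with cache would also need entries for its past_key_values.* inputs):

```python
# Sketch: explicit shape ranges for the TensorRT EP (ONNX Runtime >= 1.15).
# The model path and ranges are illustrative; a decoder with cache would
# also need entries for its past_key_values.* inputs.
import onnxruntime as ort

provider_options = {
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "trt_cache",
    # One "name:dims" entry per dynamic input, dims separated by 'x'.
    "trt_profile_min_shapes": "input_ids:1x1,attention_mask:1x1",
    "trt_profile_opt_shapes": "input_ids:1x128,attention_mask:1x128",
    "trt_profile_max_shapes": "input_ids:4x512,attention_mask:4x512",
}

session = ort.InferenceSession(
    "decoder_model.onnx",  # placeholder path to the exported decoder
    providers=[
        ("TensorrtExecutionProvider", provider_options),
        "CUDAExecutionProvider",  # fallback for unsupported nodes
    ],
)
```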

fxmarty commented 1 year ago

@chilo-ms Thanks a lot, that looks great! It is not at the top of our to-do list for now, but we welcome community contributions to properly interface TensorrtExecutionProvider with the ORTModel classes!