fxmarty opened this issue 1 year ago
https://github.com/microsoft/onnxruntime/issues/13851 will make this much easier
Is there any update on this? I also find it difficult to use TensorRT as the provider after running `optimum-cli` optimization.
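For reference, loading an exported model with the TensorRT provider currently looks roughly like the sketch below; the `gpt2_onnx/` directory is a hypothetical output of `optimum-cli export onnx --model gpt2 gpt2_onnx/`. Note that graphs fused by `optimum-cli onnxruntime optimize` can contain ORT-specific operators that TensorRT cannot consume, which may be the difficulty described above.

```python
from optimum.onnxruntime import ORTModelForCausalLM

# Hypothetical path: "gpt2_onnx/" would be produced beforehand with
# `optimum-cli export onnx --model gpt2 gpt2_onnx/`.
model = ORTModelForCausalLM.from_pretrained(
    "gpt2_onnx",
    provider="TensorrtExecutionProvider",
)
```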
@chilo-ms Thanks a lot, that looks great! It is not at the top of our todo list for now, but we welcome community contributions to better interface TensorrtExecutionProvider with the ORTModel classes!
Feature request
For decoder models with cache, manually building the TensorRT engine can be painful, as ONNX Runtime does not expose options to specify input shape profiles. The engine build could perhaps be done automatically (a sketch of the current workaround follows).
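As an illustration of the current state: the TensorRT execution provider builds its engine lazily from the shapes of the first inputs it sees, and there is no way to declare min/opt/max shape profiles up front (which is what microsoft/onnxruntime#13851 should address). Engine caching at least persists the result across runs. A minimal sketch with onnxruntime, assuming a hypothetical local `decoder_model.onnx`:

```python
import onnxruntime as ort

# Sketch: no shape profiles can be passed to the TensorRT EP today;
# engine caching (options below) avoids rebuilding on every process start.
# The model path is hypothetical.
session = ort.InferenceSession(
    "decoder_model.onnx",
    providers=[
        ("TensorrtExecutionProvider", {
            "trt_engine_cache_enable": True,
            "trt_engine_cache_path": "trt_cache",
        }),
        "CUDAExecutionProvider",  # fallback for nodes TensorRT does not support
    ],
)
```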
The current documentation only covers `use_cache=False`, which is not the interesting case. It could be improved to show how to pre-build the TensorRT engine with `use_cache=True` (see the sketch below).

References:
https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/gpu#tensorrt-engine-build-and-warmup
https://github.com/microsoft/onnxruntime/issues/13559
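A possible shape for such a doc example, as a hedged sketch: run one short generation as a warmup, so that TensorRT builds engines for both decoder subgraphs (without and with past key/values). The model directory and cache path are hypothetical, and the first call can take several minutes while the engines are built.

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "gpt2_onnx",  # hypothetical local ONNX export
    use_cache=True,
    provider="TensorrtExecutionProvider",
    provider_options={
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": "trt_cache",
    },
)
tokenizer = AutoTokenizer.from_pretrained("gpt2_onnx")

# Warmup: the first token goes through the decoder without past, the
# following tokens through the decoder with past, so a single generate()
# call triggers the TensorRT engine build for both subgraphs.
inputs = tokenizer("A representative prompt", return_tensors="pt").to("cuda")
model.generate(**inputs, max_new_tokens=8)
```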
Motivation
TensorRT inference is fast; making the engine build easier would let more users benefit from that speed.
Your contribution
I will work on it at some point.