Open · juntaosun opened 4 months ago
@liqunfu
I completely agree with @juntaosun. For example, LivePortrait currently cannot support ONNX because 5D grid_sample is not supported on GPU :(
@tianleiwu @liqunfu
I completely agree with @cleardusk. Are there any plans to improve the performance and speed of grid_sample in onnxruntime-gpu?
@liqunfu, is there a plan to add this support in the 1.20 release?
If not, I suggest that others who are interested pick up where you left off and submit a pull request. What do you think?
Agreed. On onnxruntime 1.17.0 + CUDA 11.8 + opset 20, grid_sample at 1080p takes 70 ms on CPU, while GPU mode is much slower at around 140 ms, roughly double. By comparison, inference with the torch implementation takes only 0.01 ms. That is a really big difference.
Looking forward to the ONNX team supporting and optimizing the 4D/5D grid_sample op on GPU, thanks.
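For anyone who wants to reproduce the comparison, below is a minimal benchmark sketch (not the exact script behind the numbers above): it exports a 4D grid_sample to ONNX and times onnxruntime against eager PyTorch. The shapes, opset, and provider list are assumptions for illustration.

```python
# Minimal benchmark sketch: export a 4D grid_sample to ONNX and time
# onnxruntime against eager PyTorch. Shapes, the opset, and the provider
# list are illustrative assumptions; opset 16 is the first with GridSample
# (the comment above used opset 20).
import time

import torch
import torch.nn.functional as F
import onnxruntime as ort


class GridSampleModel(torch.nn.Module):
    def forward(self, x, grid):
        return F.grid_sample(x, grid, mode="bilinear", align_corners=False)


x = torch.randn(1, 3, 1080, 1920)            # roughly a "1080p" input
grid = torch.rand(1, 1080, 1920, 2) * 2 - 1  # normalized sampling grid in [-1, 1]

torch.onnx.export(GridSampleModel(), (x, grid), "grid_sample.onnx",
                  opset_version=16, input_names=["x", "grid"], output_names=["y"])

# Request CUDA; depending on the ORT version, GridSample may run on the
# CUDA EP or silently fall back to the CPU EP.
sess = ort.InferenceSession("grid_sample.onnx",
                            providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inputs = {"x": x.numpy(), "grid": grid.numpy()}

for name, fn in [("torch (eager)", lambda: F.grid_sample(x, grid, align_corners=False)),
                 ("onnxruntime", lambda: sess.run(None, inputs))]:
    fn()  # warm-up
    t0 = time.perf_counter()
    for _ in range(10):
        fn()
    print(f"{name}: {(time.perf_counter() - t0) / 10 * 1000:.2f} ms/iter")
```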
I hope you can pay attention to this. It is used by more and more models, but grid_sample in onnxruntime is dozens of times slower than in torch.
I added/updated the GridSample CPU implementation when the op was added/updated in ONNX, as part of the ONNX integration with ORT. The implementation was inherited from an existing contrib op. I do not see a quick way to improve its performance by dozens of times. Usually GridSample is preceded by an AffineGrid; in that case the two ops can be fused, and the implementation can be greatly improved. I wonder if this is the use case here? I expect someone else to take over this work, because I am on another task now.
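For context, the affine_grid + grid_sample pattern mentioned above looks roughly like this in PyTorch; the shapes and theta values are illustrative assumptions, not taken from any particular model:

```python
# Rough sketch of the affine_grid -> grid_sample pattern. A fused
# AffineGrid+GridSample kernel could avoid materializing the full
# (N, H, W, 2) grid tensor in memory.
import torch
import torch.nn.functional as F


class WarpModel(torch.nn.Module):
    def forward(self, x, theta):
        # theta: (N, 2, 3) affine matrices; grid: (N, H, W, 2) sampling coords
        grid = F.affine_grid(theta, list(x.shape), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)


x = torch.randn(1, 3, 256, 256)
theta = torch.tensor([[[1.0, 0.0, 0.1],
                       [0.0, 1.0, 0.2]]])  # identity plus a small translation
y = WarpModel()(x, theta)
print(y.shape)  # torch.Size([1, 3, 256, 256])
```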
Describe the feature request
Many models now use 5D grid_sample computations, but the ONNX export does not seem to be fully supported yet. It currently runs only on the CPU, which makes inference very slow compared to the original torch.nn.functional.grid_sample. Searching the issues shows this has been raised many times in the past. As of 2024-07-17, the latest onnxruntime still does not support it. In addition, I have seen an implementation in this branch:
https://github.com/microsoft/onnxruntime/commit/7c0ae44ebb3e38fd7d1ebb6886301eaa2feff204
I hope it can be supported as soon as possible; I think it would be a great help for many developers.
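For reference, here is a minimal sketch of the 5D (volumetric) case this request is about; the shapes are illustrative assumptions, and the export call at the end is only a probe whose success depends on the installed torch and onnxruntime versions:

```python
# Minimal sketch of the 5D (volumetric) case: NCDHW input with a
# (N, D, H, W, 3) grid. Shapes are illustrative assumptions; whether the
# resulting GridSample node then runs on the CUDA EP or falls back to CPU
# is exactly the gap this issue asks about.
import torch
import torch.nn.functional as F


class VolumeWarp(torch.nn.Module):
    def forward(self, x, grid):
        return F.grid_sample(x, grid, mode="bilinear", align_corners=False)


x = torch.randn(1, 32, 16, 64, 64)           # (N, C, D, H, W)
grid = torch.rand(1, 16, 64, 64, 3) * 2 - 1  # (N, D, H, W, 3), values in [-1, 1]

y = VolumeWarp()(x, grid)
print(y.shape)  # torch.Size([1, 32, 16, 64, 64])

# Export probe: the ONNX GridSample spec was only generalized beyond 4D
# inputs in opset 20, so this may fail on older torch/onnxruntime versions.
torch.onnx.export(VolumeWarp(), (x, grid), "grid_sample_5d.onnx", opset_version=20)
```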
Describe scenario use case
I believe many people need this (CUDA). Thank you for your efforts and excellent work. ❤️