microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Feature Request] Request grid_sample 5D support 🌟 #21382

Open juntaosun opened 4 months ago

juntaosun commented 4 months ago

Describe the feature request

Many models now use 5D grid_sample, but ONNX export does not seem to support it on GPU yet. It currently runs on the CPU, which makes inference very slow compared to the original torch.nn.functional.grid_sample. Past issues have raised this many times. As of 2024-07-17, the latest onnxruntime still does not support it. In addition, I have seen an implementation in a branch:

https://github.com/microsoft/onnxruntime/commit/7c0ae44ebb3e38fd7d1ebb6886301eaa2feff204

Hope to support it as soon as possible. I think it will be great for most developers.

Describe scenario use case

I believe that many people need it ( Cuda ). Thank you for your efforts and excellent work. ❤️
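For reference, here is what a 5D grid_sample computes, written in plain NumPy (nearest mode, align_corners=True). This is only a sketch of the op's semantics to clarify the feature being requested, not ORT's or torch's implementation; the function name is illustrative:

```python
import numpy as np

def grid_sample_5d_nearest(inp, grid):
    """Reference 5D grid_sample (mode='nearest', align_corners=True).

    inp:  (N, C, D, H, W) volume
    grid: (N, D_out, H_out, W_out, 3) sampling locations in [-1, 1],
          ordered (x, y, z) as in torch.nn.functional.grid_sample.
    Returns (N, C, D_out, H_out, W_out).
    """
    N, C, D, H, W = inp.shape
    _, Do, Ho, Wo, _ = grid.shape
    out = np.zeros((N, C, Do, Ho, Wo), dtype=inp.dtype)
    for n in range(N):
        for d in range(Do):
            for h in range(Ho):
                for w in range(Wo):
                    x, y, z = grid[n, d, h, w]
                    # align_corners=True maps -1 -> 0 and +1 -> size-1
                    ix = int(round((x + 1) / 2 * (W - 1)))
                    iy = int(round((y + 1) / 2 * (H - 1)))
                    iz = int(round((z + 1) / 2 * (D - 1)))
                    if 0 <= ix < W and 0 <= iy < H and 0 <= iz < D:
                        out[n, :, d, h, w] = inp[n, :, iz, iy, ix]
    return out
```

The GPU request is essentially to run this sampling loop as a CUDA kernel instead of the CPU fallback.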

tianleiwu commented 4 months ago

@liqunfu

cleardusk commented 3 months ago

I completely agree with @juntaosun. For example, LivePortrait currently cannot run through ONNX because 5D grid_sample is not supported on GPU :(

@tianleiwu @liqunfu

juntaosun commented 2 months ago

I completely agree with @cleardusk. Are there any plans to improve the performance and speed of grid_sample in onnxruntime-gpu? @tianleiwu @liqunfu

tianleiwu commented 2 months ago

@liqunfu, is there a plan to add this support in the 1.20 release?

If not, I suggest that others who are interested continue from your work and submit a pull request. What do you think?

fedral commented 2 months ago

Agreed. With onnxruntime 1.17.0 + CUDA 11.8 + opset 20, grid_sample at 1080p output takes 70 ms on CPU, while GPU mode is much slower than CPU mode, around 140 ms (double). Compared with the torch implementation, where inference takes only about 0.01 ms, that is a really big difference.

Looking forward to the ONNX Runtime team supporting and optimizing the 4D/5D grid_sample op on GPU, thanks.

juntaosun commented 1 month ago

I hope you can pay attention to this. More and more models use the op, but grid_sample in onnxruntime is dozens of times slower than torch.

liqunfu commented 1 month ago

I added/updated the GridSample CPU implementation when the op was added/updated in ONNX, as part of the ONNX integration with ORT. The implementation was inherited from an existing contrib op. I do not see a quick way to improve its performance by dozens of times. Usually GridSample is preceded by an AffineGrid; in that case the two ops can be fused, and the implementation can be greatly improved. I wonder if this is the use case? I expect someone to take over this work, because I am on another task now.
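The fusion idea mentioned above can be sketched in NumPy. Instead of materializing the full grid tensor with AffineGrid and then sampling it with GridSample, a fused kernel computes each output pixel's source coordinate directly from the affine matrix theta. This is a 2D/nearest-neighbor sketch under those assumptions, with illustrative names, not the proposed ORT kernel:

```python
import numpy as np

def affine_grid_2d(theta, H, W):
    """Build an (H, W, 2) grid of (x, y) coords in [-1, 1] from a 2x3
    affine matrix, like torch.nn.functional.affine_grid
    (align_corners=True, batch dimension omitted)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing='ij')
    base = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return base @ theta.T                                 # (H, W, 2)

def sample_nearest(img, grid):
    """Nearest-neighbor grid_sample on an (H, W) image."""
    H, W = img.shape
    ix = np.clip(np.round((grid[..., 0] + 1) / 2 * (W - 1)).astype(int), 0, W - 1)
    iy = np.clip(np.round((grid[..., 1] + 1) / 2 * (H - 1)).astype(int), 0, H - 1)
    return img[iy, ix]

def fused_affine_sample(img, theta, H, W):
    """Fused AffineGrid + GridSample: each output pixel's source
    coordinate comes straight from theta, so the intermediate grid
    tensor is never written to memory."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing='ij')
    gx = theta[0, 0] * xs + theta[0, 1] * ys + theta[0, 2]
    gy = theta[1, 0] * xs + theta[1, 1] * ys + theta[1, 2]
    Hi, Wi = img.shape
    ix = np.clip(np.round((gx + 1) / 2 * (Wi - 1)).astype(int), 0, Wi - 1)
    iy = np.clip(np.round((gy + 1) / 2 * (Hi - 1)).astype(int), 0, Hi - 1)
    return img[iy, ix]
```

The fused form saves one full read and write of the grid tensor, which is where most of the speedup would come from on GPU.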