microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Feature Request] SpaceToDepth & DepthToSpace integer implementations #21287

Open · mcollinswisc opened this issue 4 months ago

mcollinswisc commented 4 months ago

Describe the feature request

The DepthToSpace and SpaceToDepth ops support integer types per the ONNX spec:
https://github.com/onnx/onnx/blob/38afbd31ac9a585abb7463dcaaae121651e0a2d7/docs/Operators.md#DepthToSpace
https://github.com/onnx/onnx/blob/38afbd31ac9a585abb7463dcaaae121651e0a2d7/docs/Operators.md#SpaceToDepth

There are currently only implementations for float and double on CPU:
https://github.com/microsoft/onnxruntime/blob/4c3c809bdbcde4ea96f0a31a242ca6877a10c40a/onnxruntime/core/providers/cpu/tensor/space_depth_ops.cc
and for float16 on CUDA:
https://github.com/microsoft/onnxruntime/blob/4c3c809bdbcde4ea96f0a31a242ca6877a10c40a/onnxruntime/core/providers/cuda/tensor/space_depth_ops.cc

Describe scenario use case

Running quantized models, where these ops need to operate directly on uint8/int8 tensors.

skottmckay commented 3 months ago

Should be relatively simple to try out given the implementations are templatized.

You could extend the list of supported types in the type constraints for the latest opset and add a new branch in the Compute method (a rough sketch follows the links below).

https://github.com/microsoft/onnxruntime/blob/4c3c809bdbcde4ea96f0a31a242ca6877a10c40a/onnxruntime/core/providers/cpu/tensor/space_depth_ops.cc#L28-L29
https://github.com/microsoft/onnxruntime/blob/4c3c809bdbcde4ea96f0a31a242ca6877a10c40a/onnxruntime/core/providers/cpu/tensor/space_depth_ops.cc#L53-L54
https://github.com/microsoft/onnxruntime/blob/4c3c809bdbcde4ea96f0a31a242ca6877a10c40a/onnxruntime/core/providers/cpu/tensor/space_depth_ops.cc#L135
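Very roughly, the CPU change might look like the following. This is only a sketch, not the actual file contents: the registration macro, the opset version, and the `SpaceDepthOpCpuImpl` helper name are placeholders for whatever the existing CPU kernel actually uses.

```cpp
// Sketch: extend the "T" type constraint of the latest-opset CPU registration
// with the 8-bit tensor types (macro name and opset number are illustrative).
ONNX_CPU_OPERATOR_KERNEL(
    DepthToSpace,
    13,
    KernelDefBuilder().TypeConstraint("T", {DataTypeImpl::GetTensorType<float>(),
                                            DataTypeImpl::GetTensorType<double>(),
                                            DataTypeImpl::GetTensorType<uint8_t>(),
                                            DataTypeImpl::GetTensorType<int8_t>()}),
    DepthToSpace);

// Sketch: dispatch on the new types in Compute, reusing the existing templated
// implementation (SpaceDepthOpCpuImpl is a stand-in name for that helper).
Status DepthToSpace::Compute(OpKernelContext* context) const {
  const Tensor& input = *context->Input<Tensor>(0);
  if (input.IsDataType<float>()) {
    // existing float path
  } else if (input.IsDataType<double>()) {
    // existing double path
  } else if (input.IsDataType<uint8_t>() || input.IsDataType<int8_t>()) {
    // new 8-bit path; see the note below about sharing one instantiation
  } else {
    return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Unsupported input data type.");
  }
  return Status::OK();
}
```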

Ideally uint8 and int8 would be handled in the same branch given they're the same data size (i.e. we don't want to pay the binary size cost of two implementations moving 8-bit data around).
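Continuing the sketch above, the shared branch could route both 8-bit types through a single uint8_t instantiation of the (hypothetical) templated helper, since the op only rearranges elements and never interprets their values:

```cpp
// Sketch: the op only moves 8-bit elements around, so one uint8_t instantiation
// can serve both int8 and uint8 inputs, keeping a single copy of the code in
// the binary. SpaceDepthOpCpuImpl is a stand-in for the existing helper.
if (input.IsDataType<uint8_t>() || input.IsDataType<int8_t>()) {
  ORT_RETURN_IF_ERROR(SpaceDepthOpCpuImpl<uint8_t>(input, output /*, ...same args as the float path */));
}
```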

The CUDA implementation seems to be pretty generic already and may just need the addition of the data types in the type constraints.
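If so, the CUDA side might only be a registration change along these lines (again a sketch; the exact macro form, opset number, and existing type list should be taken from the file itself):

```cpp
// Sketch: add the 8-bit tensor types to the CUDA EP's "T" type constraint.
ONNX_OPERATOR_KERNEL_EX(
    DepthToSpace,
    kOnnxDomain,
    13,
    kCudaExecutionProvider,
    (*KernelDefBuilder::Create())
        .TypeConstraint("T", {DataTypeImpl::GetTensorType<float>(),
                              DataTypeImpl::GetTensorType<double>(),
                              DataTypeImpl::GetTensorType<MLFloat16>(),
                              DataTypeImpl::GetTensorType<uint8_t>(),   // added
                              DataTypeImpl::GetTensorType<int8_t>()}),  // added
    DepthToSpace);
```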

The kernel registrations in the EPs aren't typed (i.e. the kernel implementation handles the different supported data types internally), so you shouldn't need to do anything there.