Describe the feature request
Custom ops may have inputs and outputs in different memory locations, for example some on the device and some on the host. Internally, a kernel can declare the memory type of each input and output by calling KernelDefBuilder::InputMemoryType() or KernelDefBuilder::OutputMemoryType() when it is registered. It would be nice if the custom op API supported the same thing. The aim is to avoid unnecessary memory copies between device and host.
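To make the request concrete, here is a minimal, self-contained sketch of the per-index pattern described above. It is not ONNX Runtime code: `SketchKernelDef`, `MemType`, `InputMemoryTypeOf`, `OutputMemoryTypeOf`, and `MakeBeamSearchDef` are hypothetical names that only mimic the chaining style of the internal builder, and the input/output layout of the beam search op is assumed for illustration.

```cpp
#include <map>

// Hypothetical stand-in for a memory-type enum; not the real ONNX Runtime type.
enum class MemType { Default, CpuInput, CpuOutput };

// Hypothetical, simplified builder that imitates the per-index calls
// InputMemoryType(type, index) / OutputMemoryType(type, index).
class SketchKernelDef {
 public:
  SketchKernelDef& InputMemoryType(MemType type, int input_index) {
    input_mem_[input_index] = type;
    return *this;
  }
  SketchKernelDef& OutputMemoryType(MemType type, int output_index) {
    output_mem_[output_index] = type;
    return *this;
  }
  // Indices with no explicit entry fall back to the default (device) memory.
  MemType InputMemoryTypeOf(int index) const { return Lookup(input_mem_, index); }
  MemType OutputMemoryTypeOf(int index) const { return Lookup(output_mem_, index); }

 private:
  static MemType Lookup(const std::map<int, MemType>& m, int index) {
    auto it = m.find(index);
    return it == m.end() ? MemType::Default : it->second;
  }
  std::map<int, MemType> input_mem_;
  std::map<int, MemType> output_mem_;
};

// Assumed layout for a custom beam search op (illustrative only):
// input 0 stays on the device, input 1 is a small host-side parameter,
// and output 0 is read on the host.
inline SketchKernelDef MakeBeamSearchDef() {
  SketchKernelDef def;
  def.InputMemoryType(MemType::CpuInput, 1)
     .OutputMemoryType(MemType::CpuOutput, 0);
  return def;
}
```

With a declaration like this available to custom ops, the runtime would know up front that input 1 and output 0 live on the host and would not need to copy them to or from the device.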
Describe scenario use case
custom beam search op