Problem:
We currently have no clean way to distinguish function calls that are intended to launch an async operation from those that are intended to run within the device thread. As a result, when we compose operations, an operation will sometimes invoke launch_op when we intended it to simply run as-is on the device.
Solution:
Introduce a TensorAsync type and remove the existing promise-like features from Tensor. With a distinct type, the same function can be overloaded: an overload that takes TensorAsync arguments is expected to invoke launch_op, whereas an overload that takes Tensor arguments is expected to simply run on the device. This distinction makes it easy to compose operations that are expected to run within a single invocation of launch_op.
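To make the overloading idea concrete, here is a minimal sketch. Everything in it is a hypothetical stand-in rather than the project's actual code: the layouts of Tensor and TensorAsync, the signature of launch_op, the launch_count counter, and the add, scale, and add_scaled ops are all assumptions chosen for illustration. The "launch" also runs synchronously here so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins: none of these definitions come from the real codebase.
static int launch_count = 0;  // illustration only: counts launch_op invocations

// Plain tensor: a value that is already usable on the device thread.
struct Tensor {
    std::vector<float> data;
};

// Distinct handle type for a tensor produced by a launched op.
// A real implementation would hold a future/promise instead of a value.
struct TensorAsync {
    Tensor value;
};

// Assumed shape of launch_op: runs `body` once "on the device".
// It is executed synchronously here to keep the sketch simple.
template <typename F>
TensorAsync launch_op(F&& body) {
    ++launch_count;
    return TensorAsync{std::forward<F>(body)()};
}

// Device-side overloads: take Tensor, run as-is, never call launch_op.
Tensor add(const Tensor& a, const Tensor& b) {
    assert(a.data.size() == b.data.size());
    Tensor out;
    out.data.resize(a.data.size());
    for (std::size_t i = 0; i < a.data.size(); ++i) {
        out.data[i] = a.data[i] + b.data[i];
    }
    return out;
}

Tensor scale(const Tensor& a, float k) {
    Tensor out = a;
    for (float& x : out.data) {
        x *= k;
    }
    return out;
}

// Host-side overload: takes TensorAsync, so it launches exactly once and
// delegates to the Tensor overload inside the launched body.
TensorAsync add(const TensorAsync& a, const TensorAsync& b) {
    return launch_op([&] { return add(a.value, b.value); });
}

// Composite op: both steps use the Tensor overloads, so the whole
// composition runs inside a single launch_op invocation.
TensorAsync add_scaled(const TensorAsync& a, const TensorAsync& b, float k) {
    return launch_op([&] { return scale(add(a.value, b.value), k); });
}
```

In this sketch, add_scaled increments launch_count only once, because the inner add and scale calls resolve to the Tensor overloads. Because the argument type selects the overload at compile time, an accidental nested launch would show up as an explicit TensorAsync argument in the call rather than as a hidden runtime branch.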
If this proposal is accepted, this issue will serve as the tracking issue and will drive one or more sub-issues per operation, each updating the underlying ops to use these overloaded functions.