Open fengyuan14 opened 3 months ago
Not an urgent case, since this usage is rare. Lowering the priority.
🚀 The feature, motivation and pitch
This is a performance requirement. The existing CUDA implementation in PyTorch supports dynamic data-type casting, so no extra kernel launch is needed to align the data types of the inputs and output.
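As a minimal illustration of the behavior described above (assuming a machine with a CUDA device), the snippet below adds a float16 tensor to a float32 tensor with a float32 output; on CUDA the dtype conversion happens inside the single element-wise kernel rather than in a separate cast kernel:

```python
import torch

# Assumes a CUDA device is available.
a = torch.randn(1024, device="cuda", dtype=torch.float16)
b = torch.randn(1024, device="cuda", dtype=torch.float32)
out = torch.empty(1024, device="cuda", dtype=torch.float32)

# Mixed input dtypes with a float32 output: the CUDA backend casts the
# float16 input to float32 element by element inside the add kernel,
# so no extra kernel launch is spent aligning dtypes beforehand.
torch.add(a, b, out=out)
```

Without in-kernel dynamic casting, a backend would first have to materialize a float32 copy of `a` via a separate cast kernel before running the add, which is the extra launch this request aims to avoid.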
Alternatives
No response
Additional context
No response