Closed: LuFinch closed this issue 1 week ago
CUDA and XPU do not use an accumulate dtype for bfloat16 and float16, while CPU does. torch-xpu-ops will align with CUDA.
cuda:

```cpp
void glu_kernel(TensorIteratorBase& iter) {
  AT_DISPATCH_FLOATING_TYPES_AND2(
      kHalf, kBFloat16, iter.dtype(), "glu_cuda", [&]() {
        using opmath_t = at::opmath_type<scalar_t>;
        // ...
```
xpu:

```cpp
template <typename scalar_t>
struct GluFunctor {
  using opmath_t = at::opmath_type<scalar_t>;
  // ...
```
cpu:

```cpp
void glu_kernel(TensorIteratorBase& iter) {
  if (at::isReducedFloatingType(iter.dtype())) {
    AT_DISPATCH_REDUCED_FLOATING_TYPES(iter.dtype(), "glu_cpu", [&]() {
      const float float_one_val(1);
      const Vectorized<float> float_one_vec(float_one_val);
      // ...
```
Closing the issue as torch-xpu-ops will align with CUDA.
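For reference, the numeric effect of the accumulate dtype can be emulated in plain Python. The sketch below is illustrative only (the shapes and values are arbitrary, and it mimics the two code paths rather than calling the kernels): it compares a GLU computed with a float32 accumulate path, as the CPU kernel does for reduced floating types, against one computed entirely in bfloat16.

```python
import torch

# Illustrative sketch only: emulate glu(x) = a * sigmoid(b), where a and b
# are the two halves of x, with and without a float32 accumulate dtype.
x = torch.randn(8, 16, dtype=torch.bfloat16)
a, b = x.chunk(2, dim=-1)

# CPU-style path: upcast to float32, compute, round back to bfloat16 once.
with_acc = (a.float() * torch.sigmoid(b.float())).to(torch.bfloat16)

# Path without an accumulate dtype: every intermediate is rounded to bfloat16.
without_acc = a * torch.sigmoid(b)

# The two results can differ in the last bits, which is the kind of gap
# a strict cpu-vs-xpu comparison observes.
print((with_acc.float() - without_acc.float()).abs().max())
```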
🐛 Describe the bug
Reproducer
Directly run

```
pytest test.py
```

and it fails with a mismatch between the xpu and cpu results.
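A minimal sketch of what such a test.py might contain, assuming the failing UT compares a bfloat16 glu forward/backward on xpu against the cpu reference; the shapes, dtype, and tolerances here are assumptions, and whether it fails depends on the tolerances the actual UT uses:

```python
# test.py (hypothetical sketch): compare bfloat16 glu forward/backward on
# xpu against the cpu reference; shapes and tolerances are assumptions.
import torch
import torch.nn.functional as F


def test_glu_bf16_xpu_matches_cpu():
    x_cpu = torch.randn(4, 8, dtype=torch.bfloat16, requires_grad=True)
    x_xpu = x_cpu.detach().to("xpu").requires_grad_(True)

    out_cpu = F.glu(x_cpu, dim=-1)
    out_xpu = F.glu(x_xpu, dim=-1)
    out_cpu.sum().backward()
    out_xpu.sum().backward()

    # Tight tolerances expose the missing float32 accumulate path on xpu.
    torch.testing.assert_close(out_xpu.cpu(), out_cpu, atol=1e-5, rtol=1.6e-6)
    torch.testing.assert_close(x_xpu.grad.cpu(), x_cpu.grad, atol=1e-5, rtol=1.6e-6)
```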
This gap between xpu and cpu causes an IPEX UT failure. In IPEX 2.5, we override this op with the IPEX implementation.
However, I found that the implementation of glu_backward in torch-xpu-ops is aligned with CUDA, and CUDA cannot pass this case either.
Not sure whether we should fix this or just skip the UT.
Versions
...