WangJialei-A opened this issue 1 day ago
The root cause of this failure is similar to #360 (memref.alloc() in a GPU kernel); however, the fix from #360 does not work for this case :(
The MLIR generated by OV (OpenVINO) for the model from the reproducer (#382) looks like this:
The main difference from the modules we have processed so far is that this one takes the bias as a single-row tensor (%arg2 : tensor<1x512>) that has to be broadcast to the full shape (512x512). linalg.broadcast is the operation that causes a memref.alloc() to be inserted into the GPU kernel:
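To make the shapes concrete, here is a plain-Python sketch of what the broadcast computes. It assumes (this is not shown in the issue text) that the tensor<1x512> bias is first collapsed to 512 elements and then broadcast along a new leading dimension; the helper names are hypothetical and no MLIR API is involved. The point is that a materialized broadcast result is a full 512x512 buffer, which is the allocation that ends up inside the kernel.

```python
# Hypothetical shape-level sketch using plain Python lists (no MLIR APIs).
# Assumption: bias 1xN is collapsed to N, then broadcast to N x N.

def collapse_unit_dim(t):
    """Collapse a 1xN nested list into a flat N-element list (1x512 -> 512)."""
    assert len(t) == 1, "expected a leading unit dimension"
    return list(t[0])

def broadcast_rows(vec, rows):
    """Materialize `rows` copies of `vec`, i.e. a full rows x len(vec) buffer.
    Eagerly building this buffer is what corresponds to the extra allocation."""
    return [list(vec) for _ in range(rows)]

N = 512
bias = [[float(j) for j in range(N)]]  # shape 1xN; values are arbitrary
full = broadcast_rows(collapse_unit_dim(bias), N)
print(len(full), len(full[0]))  # 512 512
```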
We could potentially lower linalg.broadcast to vector.broadcast in the linalg-to-xegpu pass, like this, and get rid of the unnecessary allocation:
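Why the allocation is avoidable: each output tile only needs the slice of the bias covering its own columns, so the broadcast can be applied to a small per-tile value instead of a full 512x512 buffer. The following is a minimal plain-Python sketch of that idea, not the actual linalg-to-xegpu lowering; the tile sizes TILE_M/TILE_N and both function names are hypothetical.

```python
# Hypothetical sketch: materialized broadcast vs. per-tile broadcast.
# TILE_M / TILE_N are made-up tile sizes, not real XeGPU parameters.

N = 512
TILE_M, TILE_N = 8, 16

def add_bias_materialized(x, bias_row):
    # Eager: build the whole rows x cols broadcast buffer first, then add.
    b = [list(bias_row) for _ in range(len(x))]
    return [[x[i][j] + b[i][j] for j in range(len(bias_row))] for i in range(len(x))]

def add_bias_tiled(x, bias_row):
    # Per tile: load only the TILE_N bias elements this tile needs and
    # broadcast them across the tile rows (the role vector.broadcast plays).
    rows, cols = len(x), len(bias_row)
    assert rows % TILE_M == 0 and cols % TILE_N == 0, "sketch assumes full tiles"
    out = [[0.0] * cols for _ in range(rows)]
    for i0 in range(0, rows, TILE_M):
        for j0 in range(0, cols, TILE_N):
            bias_slice = bias_row[j0:j0 + TILE_N]  # 1 x TILE_N slice
            # TILE_M references to the same slice: read-only, so no copy needed.
            tile_bias = [bias_slice] * TILE_M
            for di in range(TILE_M):
                for dj in range(TILE_N):
                    out[i0 + di][j0 + dj] = x[i0 + di][j0 + dj] + tile_bias[di][dj]
    return out

x = [[float(i * N + j) for j in range(N)] for i in range(N)]
bias = [float(j % 7) for j in range(N)]
tiled = add_bias_tiled(x, bias)
```

Both functions perform the same per-element additions, so their results are identical; the tiled version simply never holds more than one TILE_M x TILE_N view of the bias.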
However, the chain of memref.subview + memref.expand_shape cannot be lowered into xegpu.create_nd_desc + xegpu.update_nd_desc by the xegpu-fold-alias-ops pass, which eventually causes the pipeline to fail in the imex-convert-gpu-to-spirv pass (error: failed to legalize operation 'memref.subview').
To fix that, we have to get rid of the memref.collapse_shape (outside the loop) and the memref.expand_shape (inside the loop). This should probably be a separate pass, since this logic seems unrelated to the linalg-to-xegpu lowering (or maybe there is already an upstream pass that does this?):
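One way to picture such a rewrite is a peephole on a toy IR: expand_shape(subview(collapse_shape(x))) becomes a single subview on the original value x. This is a hypothetical plain-Python sketch, not MLIR's rewrite API; it assumes the collapse is 1xN -> N, the expand is M -> 1xM, strides are 1, and offsets are constants (in the real kernel they come from the loop induction variable).

```python
# Toy IR sketch (hypothetical, not the MLIR API). Folds
#   expand(subview(collapse(x), [o], [m]))  ->  subview(x, [0, o], [1, m])
# assuming collapse is 1xN -> N, expand is M -> 1xM, and unit strides.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Value:
    shape: Tuple[int, ...]
    op: str = "arg"                  # "arg", "collapse", "subview", "expand"
    src: Optional["Value"] = None    # single operand of the producing op
    offsets: Tuple[int, ...] = ()    # only used by "subview"

def collapse(v):  # 1xN -> N
    assert len(v.shape) == 2 and v.shape[0] == 1
    return Value((v.shape[1],), "collapse", v)

def subview(v, offsets, sizes):
    return Value(tuple(sizes), "subview", v, tuple(offsets))

def expand(v):  # M -> 1xM
    assert len(v.shape) == 1
    return Value((1, v.shape[0]), "expand", v)

def fold(v):
    """Rewrite expand(subview(collapse(x))) into a 2D subview of x; else return v."""
    if v.op == "expand" and v.src.op == "subview" and v.src.src.op == "collapse":
        sv = v.src
        x = sv.src.src
        return subview(x, (0, sv.offsets[0]), (1, sv.shape[0]))
    return v

arg2 = Value((1, 512))                          # the bias, tensor<1x512>
flat = collapse(arg2)                           # outside the loop
tile = expand(subview(flat, (64,), (16,)))      # inside the loop, one 16-wide slice
folded = fold(tile)
```

After the fold, the only view left is a subview of the original 1x512 value, so no reshape remains between the kernel argument and the slice.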
cc @kurapov-peter
Generate case:
Reproduce:
Error log: