When one of the last two shape dimensions is broadcast, eliminate_contiguous removes a contiguous instruction that should remain before gemm, because gemm's compute_shape does not handle zeros in the last two strides. This causes the rocBLAS implementation of gemm to fail. It is not an issue when using MLIR dot.
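To make the failure mode concrete, here is a minimal NumPy sketch (an illustration of the stride layout, not MIGraphX code): broadcasting one of the last two dimensions yields a stride of 0 in that dimension, and the tensor is no longer contiguous, which is why a contiguous copy is needed before a gemm backend that assumes nonzero leading strides.

```python
import numpy as np

# Column vector of shape (2, 1), then broadcast along the last
# dimension to shape (2, 3). No data is copied; the broadcast
# dimension simply gets a stride of 0.
a = np.arange(2, dtype=np.float32).reshape(2, 1)
b = np.broadcast_to(a, (2, 3))

print(b.strides)                  # last stride is 0 (broadcast dim)
print(b.flags["C_CONTIGUOUS"])    # False: a contiguous copy is required
                                  # before a gemm that can't handle
                                  # zero strides in the last two dims
```

If the contiguous copy before gemm is eliminated, the backend receives this zero-stride layout directly, which matches the rocBLAS failure described above.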