Transpose -> GEMM issue

When the last two axes of a shape whose last two strides are both 1 are transposed, the strides come out unchanged. Because gemm decides whether an input is transposed by checking only its strides, it does not recognize that the transposed input is transposed. This causes the rocBLAS implementation of gemm to fail. The issue does not appear when using MLIR.
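A minimal sketch of why a stride-only check cannot see this transpose, using the shapes of @4 and @9 from the IR below. `permute` mimics how a transpose reorders lens and strides, and `looks_transposed` is a hypothetical stride-only test (flag the last two axes when the second-to-last stride is smaller than the last); it is an assumed stand-in, not MIGraphX's actual code. A shape without a size-1 axis is included for contrast.

```python
def permute(values, perm):
    """Reorder lens or strides the way a transpose with `perm` does."""
    return [values[p] for p in perm]


def looks_transposed(strides):
    """Hypothetical stride-only check (not MIGraphX's real implementation):
    treat the last two axes as transposed when the second-to-last stride
    is smaller than the last stride."""
    return strides[-2] < strides[-1]


perm = [0, 2, 1]  # the permutation used by @9

# @4 from the IR: lens {2, 2, 1}, strides {2, 1, 1}
lens, strides = [2, 2, 1], [2, 1, 1]
t_lens, t_strides = permute(lens, perm), permute(strides, perm)
print(t_lens, t_strides)  # [2, 1, 2] [2, 1, 1], the same strides as before
print(looks_transposed(strides), looks_transposed(t_strides))  # False False

# Contrast: with no size-1 axis, the swapped strides do reveal the transpose.
packed = [6, 3, 1]  # strides of a packed {2, 2, 3} tensor
print(looks_transposed(packed), looks_transposed(permute(packed, perm)))  # False True
```

Since swapping two equal strides is a no-op, any check that looks only at the strides gives the same answer before and after the transpose.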

```
@4 = hip::copy_to_gpu(x2,@2) -> float_type, {2, 2, 1}, {2, 1, 1}, target_id=0
@9 = transpose[permutation={0, 2, 1}](@4) -> float_type, {2, 1, 2}, {2, 1, 1}, target_id=0
@10 = gpu::gemm[alpha=1,beta=0,compute_fp32=0,trans_batch=0,solution_idx=0](@7,@9,@8) -> float_type, {2, 2, 2}, {4, 2, 1}, target_id=0
```
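The size-1 axis is what makes the strides ambiguous: its index is always 0, so its stride never contributes to an element's offset. A small sketch that enumerates the memory offsets of @9's shape ({2, 1, 2}) under its strides from the IR ({2, 1, 1}) and under packed row-major strides for the same lens; `offsets` and `standard_strides` are helpers written for this illustration only.

```python
from itertools import product


def offsets(lens, strides):
    """Memory offset of every element, visited in row-major index order."""
    return [sum(i * s for i, s in zip(idx, strides))
            for idx in product(*(range(n) for n in lens))]


def standard_strides(lens):
    """Packed row-major strides for the given lens."""
    strides = [1] * len(lens)
    for k in range(len(lens) - 2, -1, -1):
        strides[k] = strides[k + 1] * lens[k + 1]
    return strides


lens = [2, 1, 2]                                      # output lens of @9
actual = offsets(lens, [2, 1, 1])                     # strides of @9 from the IR
packed = offsets(lens, standard_strides(lens))        # strides [2, 2, 1]
print(standard_strides(lens))  # [2, 2, 1]
print(actual)                  # [0, 1, 2, 3]
print(packed)                  # [0, 1, 2, 3]
```

Both stride vectors address exactly the same memory, so nothing in @9's strides alone records that a transpose took place; only the lens and the permutation carry that information.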

migraphx-benchmark / AMDMIGraphX, issue #183