This PR fixes #1442
Because fp8 tensors are promoted to fp16 to enable DPAS, there was a format mismatch.
It is fixed by 1) matching the format when lowering the 2D block load to LLVM and 2) enabling an 8-bit 2D block primitive that supports a different shape format.