intel / intel-graphics-compiler

Other
596 stars 155 forks source link

Possible DPAS miscompilation #275

Closed masahi closed 1 year ago

masahi commented 1 year ago

Hi, I'm working on generating a SPIRV module with the joint matrix extension using TVM, and running it with the OpenCL runtime. While I got the simplest case (m8n8k16 matmul, fp16) working, I'm getting a weird result if I extend the simplest test case to m8n16k16 and let a single subgroup do two independent JointMatrixMadINTEL.

The SPIRV module in question is attached below and its text representation is posted at https://gist.github.com/masahi/9e219607dd80939397b1cffe91a08f4e

joint_matrix_test.zip

As you can see in the text disassembly, there are two JointMatrixMadINTEL:

...
7 JointMatrixMadINTEL 41 184 175 180 43 40
7 JointMatrixMadINTEL 41 185 175 183 44 40
...

The first one is supposed to compute the left half of the 8x16 output matrix, while the second one does the right half. Here is my current situation:

Since the same A matrix is used for both multiply-add and the first one is computing the correct output, the only possible explanation of this result is that the B matrix in the second multiply-add is all zero. But I'm pretty sure that the B cannot be all-zero in my test case.

I've been trying to figure out what's going on, and one thing I noticed is that, in the generated ISA (dumped using https://github.com/intel/opencl-intercept-layer), there are three calls to dpas like the ones below, while the SPIRV module only has two JointMatrixMadINTEL. Is this expected, and if so, why three?

...
        dpas.8x8 (8|M0)          r109:f        null:f            r109:hf           r53.0:hf         {Atomic}
        dpas.8x8 (8|M0)          r109:f        null:f            r77:hf            r53.0:hf         {Atomic}
        dpas.8x8 (8|M0)          r117:f        null:f            r101:hf           r53.0:hf         {$1}
...
mbelicki commented 1 year ago

Hi @masahi, I've checked your sample, but I couldn't reproduce your result, I've got:

        dpas.8x8 (8|M0)          r109:f        null:f            r77:hf            r53.0:hf         {Atomic} // $107
        dpas.8x8 (8|M0)          r117:f        null:f            r101:hf           r53.0:hf         {$12} // $108

so, without the middle dpas instruction. Could you attach the dumps that you have gathered? They should include exact flags and platform that was targeted, this should allow me to reproduce your problem.

masahi commented 1 year ago

Hi @mbelicki, thank you for your reply. Indeed, I was using a locally-built IGC to debug other issues I had before. I confirmed that, If I use libigc.so that comes with my opencl installation, there are two dpas generated as expected.

Here is my ISA dump and disassembly having three dpas. I got this using an IGC build at the commit 4fd4b38ae. dpas_dump.zip

mnaczk commented 1 year ago

Hi, @masahi Could you provide Shader Dumps? It will be useful for debugging purposes. Instruction on how to do it can be found here: https://github.com/intel/intel-graphics-compiler/blob/master/documentation/shader_dumps_instruction.md

masahi commented 1 year ago

Oh I didn't know about that! I was using cliloader --dump-kernel-isa-binaries and iga64 -d -p=12p71 to dump the ISA before. I've just generated this one using the latest IGC build as of now, 6e94f33a9. The file OCL_asm69ea140fff61b12f_simd8_entry_0001.asm shows three dpas.

shader_dump_dpas.zip

mbelicki commented 1 year ago

Thanks for providing the dumps, this is really helpful! I've looked at them and it seems that the problem is somewhere in the VISA finalizer. On all compilation stages up until the very end there are only two dpas intrinsics/intstructions, the third one appears only in the final assebly.

I will notify the team working on the finalizer about this issue.

DianaChen commented 1 year ago

The redundant dpas you got here is introduced by a hardware workaround (The WA intended to insert the dummy dpas). Note that the WA caused some other issues so was removed. The dummy dpas should no longer being seen in latest driver.

masahi commented 1 year ago

Confirmed that the redundant dpas is gone.