Closed masahi closed 1 year ago
Hi @masahi, I've checked your sample, but I couldn't reproduce your result, I've got:
dpas.8x8 (8|M0) r109:f null:f r77:hf r53.0:hf {Atomic} // $107
dpas.8x8 (8|M0) r117:f null:f r101:hf r53.0:hf {$12} // $108
so, without the middle dpas instruction. Could you attach the dumps that you have gathered? They should include exact flags and platform that was targeted, this should allow me to reproduce your problem.
Hi @mbelicki, thank you for your reply. Indeed, I was using a locally-built IGC to debug other issues I had before. I confirmed that, If I use libigc.so
that comes with my opencl installation, there are two dpas generated as expected.
Here is my ISA dump and disassembly having three dpas. I got this using an IGC build at the commit 4fd4b38ae. dpas_dump.zip
Hi, @masahi Could you provide Shader Dumps? It will be useful for debugging purposes. Instruction on how to do it can be found here: https://github.com/intel/intel-graphics-compiler/blob/master/documentation/shader_dumps_instruction.md
Oh I didn't know about that! I was using cliloader --dump-kernel-isa-binaries
and iga64 -d -p=12p71
to dump the ISA before. I've just generated this one using the latest IGC build as of now, 6e94f33a9. The file OCL_asm69ea140fff61b12f_simd8_entry_0001.asm
shows three dpas.
Thanks for providing the dumps, this is really helpful! I've looked at them and it seems that the problem is somewhere in the VISA finalizer. On all compilation stages up until the very end there are only two dpas intrinsics/intstructions, the third one appears only in the final assebly.
I will notify the team working on the finalizer about this issue.
The redundant dpas you got here is introduced by a hardware workaround (The WA intended to insert the dummy dpas). Note that the WA caused some other issues so was removed. The dummy dpas should no longer being seen in latest driver.
Confirmed that the redundant dpas is gone.
Hi, I'm working on generating a SPIRV module with the joint matrix extension using TVM, and running it with the OpenCL runtime. While I got the simplest case (m8n8k16 matmul, fp16) working, I'm getting a weird result if I extend the simplest test case to m8n16k16 and let a single subgroup do two independent
JointMatrixMadINTEL
.The SPIRV module in question is attached below and its text representation is posted at https://gist.github.com/masahi/9e219607dd80939397b1cffe91a08f4e
joint_matrix_test.zip
As you can see in the text disassembly, there are two
JointMatrixMadINTEL
:The first one is supposed to compute the left half of the 8x16 output matrix, while the second one does the right half. Here is my current situation:
C
matrix inA x B + C
.Since the same
A
matrix is used for both multiply-add and the first one is computing the correct output, the only possible explanation of this result is that theB
matrix in the second multiply-add is all zero. But I'm pretty sure that theB
cannot be all-zero in my test case.I've been trying to figure out what's going on, and one thing I noticed is that, in the generated ISA (dumped using https://github.com/intel/opencl-intercept-layer), there are three calls to
dpas
like the ones below, while the SPIRV module only has twoJointMatrixMadINTEL
. Is this expected, and if so, why three?