[amx-reload-bug.txt](https://github.com/user-attachments/files/17157112/amx-reload-bug.txt)
I built `llc` from the main branch and compiled the attached test case with `-mcpu=sapphirerapids`. The resulting assembly contains the following instructions:
```
tilestored %tmm0, 6016(%rsp,%rax) # 1024-byte Folded Spill
tileloadd 6016(%rsp), %tmm7 # 1024-byte Folded Reload
```
Here the tile reload is missing the index register that holds the stride, so a zero stride is used and the first row of the stored tile is broadcast to every row of the reloaded tile. Also, it looks like this spill/reload pair could be avoided altogether.
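To illustrate the effect, here is a rough Python model of the row addressing, not the actual hardware or LLVM behaviour. It assumes each tile row is read from `base + row * stride`, and that the 1024-byte tile is 16 rows of 64 bytes with a 64-byte stride in the index register; the real value in `%rax` is not visible in the snippet above. The helper names `tile_store` and `tile_load` are made up for this sketch.

```python
ROWS, COLSB = 16, 64   # assumed shape: 16 rows x 64 bytes = 1024 bytes
STRIDE = 64            # assumed stride held in the index register


def tile_store(memory, base, stride, tile):
    """Write each tile row to memory at base + row * stride."""
    for r, row in enumerate(tile):
        start = base + r * stride
        memory[start:start + len(row)] = row


def tile_load(memory, base, stride, rows, colsb):
    """Read each tile row from memory at base + row * stride."""
    return [bytes(memory[base + r * stride: base + r * stride + colsb])
            for r in range(rows)]


memory = bytearray(ROWS * COLSB)
# Fill row r with the byte value r so the rows are distinguishable.
tile = [bytes([r]) * COLSB for r in range(ROWS)]

tile_store(memory, 0, STRIDE, tile)                 # spill with a stride
good = tile_load(memory, 0, STRIDE, ROWS, COLSB)    # reload with the same stride
bad = tile_load(memory, 0, 0, ROWS, COLSB)          # reload with no index register

print(good == tile)                        # reload matches the spilled tile
print(all(row == tile[0] for row in bad))  # every row is a copy of row 0
```

Under these assumptions the strided reload round-trips the tile, while the zero-stride reload returns 16 copies of the first row, which matches what I see.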