DianQK opened 2 months ago
@davemgreen @efriedma-quic Could you take a look at this issue (to see if it's what I suspect it is)? :p
It looks like this is probably true of other cores too if the umlal can start executing earlier. We have usually tried to solve issues like this in the MachineCombiner, which can take the latencies and depths of the instructions into account to re-associate the result back. Could the same thing work here?
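A toy model of that depth comparison may help. This is a minimal Python sketch, not LLVM's MachineCombiner code. The latency numbers are made up for illustration and are not measured Apple M1 figures. It also assumes that in the reassociated form acc + swap is computed first and then used as the umlal accumulator (the names acc, swap, lo and hi are hypothetical). For each association order, it sums the latencies on the loop-carried acc -> acc' path and keeps the shorter chain.

```python
# Hypothetical latencies in cycles, for illustration only
# (NOT measured Apple M1 numbers).
LATENCY = {"add": 2, "umlal": 3}

# Instructions on the loop-carried acc -> acc' path for each order.
# Original: umlal computes swap + lo*hi without reading acc, so only
# the final add depends on the previous iteration.
ORIGINAL = ["add"]
# Reassociated (assumed shape): acc + swap feeds the umlal accumulator,
# so the umlal has to wait for the add in every iteration.
REASSOCIATED = ["add", "umlal"]


def loop_carried_depth(chain):
    """Sum of latencies along the loop-carried dependency chain."""
    return sum(LATENCY[op] for op in chain)


def latency_bound_cycles(chain, iterations):
    """Lower bound on cycles imposed by the recurrence alone."""
    return iterations * loop_carried_depth(chain)


def pick_association(candidates):
    """MachineCombiner-style choice: keep the candidate whose
    loop-carried critical path is shortest."""
    return min(candidates, key=lambda name: loop_carried_depth(candidates[name]))


best = pick_association({"original": ORIGINAL, "reassociated": REASSOCIATED})
```

With these assumed numbers, the original order has a 2-cycle recurrence and the reassociated one has a 5-cycle recurrence. pick_association therefore returns "original", and a 100-iteration loop would be bounded below by 200 versus 500 cycles.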
https://github.com/llvm/llvm-project/pull/99634 mentions that the barriers were not needed in that version, but I can imagine that with slightly different code the un-reassociation would still be useful.
That sounds reasonable. It looks like we need to consider this issue in the loop. Fortunately, the number of instructions is quite small: https://rust.godbolt.org/z/4hTnqcnMz. But I know very little about the details of CPU execution, and so far, I haven't found the reason.
The following IR has had its instruction order altered after the reassociate pass:
The changes in the assembly instructions are as follows:
With the altered instruction order, performance drops significantly on the Apple M1. (I am not sure whether this is also the case for other ARM processors.) My immature guess is that the add instruction is preventing the parallel execution of umlal. Perhaps we need an llvm.aarch64.neon.umlal.* intrinsic?
Here's a real example in Rust: https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/Is.20instruction.20ordering.20something.20to.20file.20issues.20about.3F/near/453056084
C: https://github.com/Cyan4973/xxHash/blob/a57f6cce2698049863af8c25787084ae0489d849/xxhash.h#L5312-L5323
Godbolt: https://llvm.godbolt.org/z/oeKqn19ff
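Both association orders compute the same value, because 64-bit lane addition wraps and is therefore associative. That is why reassociate is allowed to make the change; the cost shows up only in the dependency chain. The sketch below models a single 64-bit lane in Python. It assumes an accumulate step of the form acc + (swap + lo * hi) with 32-bit lo and hi, and all function and variable names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # 64-bit lanes wrap around modulo 2**64


def umlal_lane(acc, a, b):
    """One lane of UMLAL: acc + zext(a) * zext(b), with a and b 32-bit."""
    assert 0 <= a < (1 << 32) and 0 <= b < (1 << 32)
    return (acc + a * b) & MASK64


def step_original(acc, swap, lo, hi):
    # acc + (swap + lo * hi): the umlal does not read acc, so it can
    # start before the previous iteration's add has finished.
    return (acc + umlal_lane(swap, lo, hi)) & MASK64


def step_reassociated(acc, swap, lo, hi):
    # (acc + swap) + lo * hi: the umlal accumulator now depends on acc,
    # so it has to wait for the add in every iteration.
    return umlal_lane((acc + swap) & MASK64, lo, hi)


def run(step, inputs, acc=0):
    """Apply one accumulate step per (swap, lo, hi) triple."""
    for swap, lo, hi in inputs:
        acc = step(acc, swap, lo, hi)
    return acc
```

For any list of (swap, lo, hi) triples, run(step_original, inputs) and run(step_reassociated, inputs) return the same result. Only the shape of the loop-carried chain differs.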