Open eduardosm opened 3 years ago
Codegen should be improved after 70289ea6f591bd39c631f1eee3e6f2622fbc1d46, but it's still not perfect.
New codegen:

    read_swap:
        lbu a1, 1(a0)
        lbu a0, 0(a0)
        slli a1, a1, 8
        or a0, a1, a0
        srli a1, a0, 8
        slli a0, a0, 8
        or a0, a0, a1
        ret
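To make the remaining overhead concrete, here is a rough register-level model of the sequence above, written in Rust. It is only a sketch: the function name and the `mem` parameter (standing in for the two bytes at `a0`) are made up for illustration, and registers are modeled as plain `u64` values.

```rust
/// Hypothetical model of the new `read_swap` sequence.
/// `mem[0]` and `mem[1]` stand in for the bytes at 0(a0) and 1(a0).
pub fn new_codegen_read_swap(mem: [u8; 2]) -> u64 {
    let mut a1 = u64::from(mem[1]); // lbu  a1, 1(a0)
    let mut a0 = u64::from(mem[0]); // lbu  a0, 0(a0)
    a1 <<= 8;                       // slli a1, a1, 8
    a0 |= a1;                       // or   a0, a1, a0  (little-endian value)
    a1 = a0 >> 8;                   // srli a1, a0, 8
    a0 <<= 8;                       // slli a0, a0, 8
    a0 |= a1;                       // or   a0, a0, a1  (swapped value in the low 16 bits)
    a0                              // ret
}
```

In this model the low 16 bits of the result hold the byte-swapped value, so the sequence is correct. The last three arithmetic instructions only undo the byte order that the first four set up, which is why the load could assemble the bytes in the swapped order directly.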
Extended Description
On RISC-V, an unaligned read followed by a bswap produces suboptimal code.
Given the following IR:
    declare i16 @llvm.bswap.i16(i16)

    define i16 @read(i16* %p) {
    start:
      %v = load i16, i16* %p, align 1
      ret i16 %v
    }

    define i16 @read_swap(i16* %p) {
    start:
      %v = load i16, i16* %p, align 1
      %v2 = tail call i16 @llvm.bswap.i16(i16 %v)
      ret i16 %v2
    }
compiled with llc -mtriple=riscv64-unknown-linux-gnu -O3
it produces the following assembly:
    read:
        lb a1, 1(a0)
        lbu a0, 0(a0)
        slli a1, a1, 8
        or a0, a0, a1
        ret

    read_swap:
        lb a1, 1(a0)
        lbu a0, 0(a0)
        slli a1, a1, 8
        or a0, a0, a1
        slli a1, a0, 40
        addi a2, zero, 255
        slli a2, a2, 48
        and a1, a1, a2
        slli a0, a0, 56
        or a0, a0, a1
        srli a0, a0, 48
        ret
The code for read is generated as expected. However, the code for read_swap can be simplified to:
    read_swap:
        lb a1, 0(a0)
        lbu a0, 1(a0)
        slli a1, a1, 8
        or a0, a0, a1
        ret
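The simplification relies on the fact that byte-swapping a little-endian 16-bit load gives the same value as assembling the two bytes in big-endian order. Below is a small Rust sketch of that identity. The function names are hypothetical, and it models only the 16-bit result, not the upper register bits that `lb` would sign-extend.

```rust
/// Models load-then-bswap: little-endian load followed by a byte swap.
pub fn load_then_swap(bytes: [u8; 2]) -> u16 {
    u16::from_le_bytes(bytes).swap_bytes()
}

/// Models the proposed sequence: byte 0 becomes the high byte,
/// byte 1 becomes the low byte (shift by 8, then or).
pub fn load_swapped_directly(bytes: [u8; 2]) -> u16 {
    (u16::from(bytes[0]) << 8) | u16::from(bytes[1])
}

/// Exhaustively compares both models over every possible byte pair.
pub fn all_inputs_agree() -> bool {
    (0..=u16::MAX).all(|n| {
        let bytes = n.to_le_bytes();
        load_then_swap(bytes) == load_swapped_directly(bytes)
    })
}
```

Because the input space is only 65536 byte pairs, `all_inputs_agree()` can check the identity exhaustively rather than by sampling.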