While debugging issue 32 (loop unrolling in combination with heavy-duty
optimization), I came upon this code (produced with -O3 + -funroll-loops,
though it's probably the same with -O0 as well; comments are mine):
ld1b $s32,0(,$s39) # $s32 <- src[0]
sll $s32,$s32,56 # $s32 <- $s32 << 56
srax $s32,$s32,56 # $s32 <- $s32 >> 56 (arithmetic shift)
st1b $s32,0(,$s38) # dst[0] <- $s32
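It seems that gcc uses the sll+srax combination to normalize the top 56 bits
of the register to/from which chars are loaded/stored (if srax is an
arithmetic shift, as the mnemonic suggests, the pair sign-extends the low
byte rather than zeroing the upper bits). This is imho redundant here:
ld1b "loads 1 byte, zero extended", so the upper bits are already well
defined, and st1b writes back only the low byte, which the two shifts leave
unchanged. The sll+srax pair should therefore not be emitted by the compiler
in this case.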
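To check the redundancy claim, here is a small C model of the four instructions above. It is only a sketch, not SX code: the helper names (ld1b, sll56, srax56, st1b, count_store_mismatches, self_check) are hypothetical, and srax is assumed to be an arithmetic right shift, as the mnemonic suggests. Under that assumption the model shows that the byte written by st1b is the same with and without the shift pair, for all 256 possible source bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ld1b: load one byte, zero-extended to 64 bits. */
static uint64_t ld1b(const unsigned char *p) { return (uint64_t)*p; }

/* Model of "sll r,r,56": logical left shift by 56. */
static uint64_t sll56(uint64_t r) { return r << 56; }

/* Model of "srax r,r,56". ASSUMPTION: arithmetic right shift.
   Written with unsigned arithmetic so the C behaviour is fully defined. */
static uint64_t srax56(uint64_t r) {
    uint64_t low = r >> 56;                      /* logical shift */
    return (r >> 63) ? (low | 0xFFFFFFFFFFFFFF00ULL) : low;
}

/* Model of st1b: only the low byte of the register reaches memory. */
static void st1b(uint64_t r, unsigned char *p) { *p = (unsigned char)(r & 0xFF); }

/* Count the source bytes for which the stored byte differs
   between "ld1b, sll, srax, st1b" and plain "ld1b, st1b". */
static int count_store_mismatches(void) {
    int mismatches = 0;
    for (int v = 0; v < 256; v++) {
        unsigned char src = (unsigned char)v;
        unsigned char with_shifts, without_shifts;
        st1b(srax56(sll56(ld1b(&src))), &with_shifts);
        st1b(ld1b(&src), &without_shifts);
        if (with_shifts != without_shifts)
            mismatches++;
    }
    return mismatches;
}

/* Sanity checks for the model itself. */
static void self_check(void) {
    /* The shift pair changes the register for bytes >= 0x80 ... */
    assert(srax56(sll56(0x80)) == 0xFFFFFFFFFFFFFF80ULL);
    /* ... but never the byte that st1b would write. */
    assert(count_store_mismatches() == 0);
}
```

Calling self_check() from a test program passes silently. The model says nothing about cases where the register is used for something other than the byte store, so it supports dropping the shifts only for a plain load/store sequence like the one shown.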
Saving 2 cycles here and 2 cycles there makes for a whole lot of cycles ... ;)
Original issue reported on code.google.com by jmoc...@gmail.com on 26 Nov 2008 at 3:14