We have some limited support for this sort of thing (see https://reviews.llvm.org/D27861), but it doesn't catch this particular case because of the zero-extend. It could probably be extended, though.
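As a rough illustration of where that zero-extend comes from: after C integer promotion, the byte-wise 16-bit load from the report below is effectively computed in 32-bit int, so both byte loads are zero-extended and the final result truncated. This expansion (and the helper name) is my own sketch, not compiler output:

#include <stdint.h>

/* Sketch only: ld16_bytes after C integer promotion. Both bytes are
   zero-extended to 32-bit int before the shift/or, and the 32-bit
   result is truncated back to 16 bits on assignment. */
uint16_t ld16_bytes_promoted(uint8_t const* p) {
  int lo = (int)p[0];         /* zero-extending byte load */
  int hi = (int)p[1] << 8;    /* zero-extending byte load, shifted */
  return (uint16_t)(lo | hi); /* 32-bit or, truncated to 16 bits */
}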
Extended Description
(also affects AArch64, but I can only tag one component)
clang --target=arm-linux-gnueabihf -march=armv7-a -O3 -S -xc -o- - << EOF
#include <stdint.h>
uint16_t ld16(uint8_t const* p) { uint16_t r; __builtin_memcpy(&r, p, sizeof(r)); return r; }
uint16_t ld16_bytes(uint8_t const* p) { uint16_t r = p[0] | (p[1] << 8); return r; }
EOF
gives:
ld16:
        ldrh    r0, [r0]
        bx      lr

ld16_bytes:
        ldrb    r1, [r0]
        ldrb    r0, [r0, #1]
        orr     r0, r1, r0, lsl #8
        bx      lr
For little endian targets I would expect both these functions to turn out the same (like they do on x86). 32-bit and 64-bit loads seem to be rewritten already; it's just 16-bit where they diverge.
Note that -march=armv7-a (or similar) is necessary to make the unaligned load option available.
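For comparison, a 32-bit byte-wise load written the same way does seem to get combined into a single word load with the invocation above; this variant is my own addition for illustration and was not part of the original reproducer:

#include <stdint.h>

/* Illustration: the analogous 32-bit byte-wise load. With the same
   clang invocation this is expected to collapse to a single ldr,
   unlike the 16-bit case above. */
uint32_t ld32_bytes(uint8_t const* p) {
  uint32_t r = (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
  return r;
}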